pitrou commented on code in PR #48203:
URL: https://github.com/apache/arrow/pull/48203#discussion_r2556419213


##########
cpp/src/parquet/decoder.cc:
##########
@@ -2306,6 +2345,17 @@ class ByteStreamSplitDecoder<FLBAType> : public 
ByteStreamSplitDecoderBase<FLBAT
     const int num_decoded = this->DecodeRaw(decode_out, max_values);
     DCHECK_EQ(num_decoded, max_values);
 
+#if !ARROW_LITTLE_ENDIAN
+    // On big-endian, ByteStreamSplitDecode (DoMergeStreams) reverses stream 
positions
+    // to produce numeric values in native byte order. For FLBA (opaque byte 
arrays),
+    // we need to undo this reversal to preserve the original byte sequence.

Review Comment:
   Hmm, doing a second byteswap to undo the first one sounds entirely wasteful. 
Perhaps the ByteStreamSplit routines should instead take an optional flag to do 
the byte-swapping?
   
   For example have something like:
   ```c++
   // If fix_endianness is `true`, output will be converted from little endian 
to native endian
   inline void ByteStreamSplitDecode(const uint8_t* data, int width, int64_t 
num_values,
                                     int64_t stride, bool fix_endianness, 
uint8_t* out) {
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to