k8ika0s commented on PR #48203:
URL: https://github.com/apache/arrow/pull/48203#issuecomment-3568369472

   @Vishwanatha-HD Hey! Really appreciate you taking on the encoder/decoder 
paths for s390x — these two files are where a lot of the subtle BE issues first 
show up.
   
   One thing I ran into on real s390x hardware is that Arrow’s array buffers 
already store their scalars in canonical little-endian format. Because of that, 
per-value swapping inside the Plain/Arrow fast paths can sometimes lead to an 
unintended double-swap, especially when mixing Arrow-originated inputs with 
non-Arrow callers (e.g., DeltaBitPack or ByteStreamSplit feeding into the same 
decode path).
   
   A couple of spots I’m curious about here:
   
   **• PlainDecoder → Arrow fast path**  
   The `#if ARROW_LITTLE_ENDIAN` branches check out for host-native buffers, 
but Arrow itself always hands you LE data. Does this path avoid re-swapping 
values that are already canonical LE from Arrow? I’ve seen that cause subtle 
mismatches on BE when DeltaBitPack or BSS push Arrow-arrays directly into the 
decoder.
   
   **• PlainEncoder (primitive path)**  
   On BE, the per-value `ToLittleEndian` write works for correctness, though 
I’ve found cases where staging to a single LE scratch buffer helps avoid 
partial/mixed-endian outputs when builders and sinks run back-to-back.
   
   **• ByteStreamSplit**  
   Here the code assumes DoSplitStreams handles endianness, but BSS usually 
expects inputs to already be in canonical LE order before the streams are 
interleaved. With native-order buffers on BE, the shuffle sometimes produces 
different stats/dictionary bytes across architectures. Curious if you’ve tested 
mixed Arrow/non-Arrow inputs through this path?
   
   None of this blocks the PR — just sharing things I hit in BE testing across 
the encode/decode → stats → page-index chain.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to