[PR] Avoid zero-filling IPC reads with typed buffer handling [arrow-rs]

via GitHub Wed, 13 May 2026 19:03:23 -0700


pchintar opened a new pull request, #9971:
URL: https://github.com/apache/arrow-rs/pull/9971


   # Which issue does this PR close?
   
   - Closes #9777.
   
   # Rationale for this change
   
   This PR follows up on the alignment concerns raised in #9778 about using 
`Vec<u8>` for IPC body reads in place of the current 
`MutableBuffer::from_len_zeroed` in the IPC reader.
   
   My [earlier approach](https://github.com/apache/arrow-rs/pull/9778/changes) 
showed that reading directly into `Vec<u8>` could substantially reduce 
redundant zero-filling in IPC reader paths. However, some decode paths still 
relied on fixed-width typed buffers, which could incur additional 
alignment-handling cost later during array construction.
   
   This PR keeps the `Vec<u8>`-based read path for IPC message and block 
bodies, while adding typed IPC buffer handling for fixed-width physical buffers 
before array construction.
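
   As a rough, standard-library-only illustration of the zero-filling this 
read path avoids (this is not the code in this PR, and the function names 
`read_body_zeroed` and `read_body_unzeroed` are hypothetical), the sketch below 
contrasts a zero-filled read with one that only reserves capacity. Whether the 
standard library initializes the spare capacity internally depends on the 
reader, so treat this as a sketch of the idea rather than a performance claim.

```rust
use std::io::{self, Read};

/// Baseline: allocate a zero-filled buffer, then overwrite every byte of it.
fn read_body_zeroed<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Alternative: reserve capacity only and let `read_to_end` append the bytes,
/// so this function never writes zeros that are about to be overwritten.
fn read_body_unzeroed<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(len);
    // `take` caps the read at `len` bytes so the rest of the stream is untouched.
    let read = reader.by_ref().take(len as u64).read_to_end(&mut buf)?;
    if read != len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "IPC body shorter than declared length",
        ));
    }
    Ok(buf)
}
```

   Both functions return the same bytes for the same input; only the way the 
destination buffer is prepared differs.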
   
   This preserves the existing alignment behavior for those fixed-width decode 
paths and avoids the extra alignment handling and copying that could otherwise 
occur later during array construction.
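
   To make the alignment cost concrete, here is a minimal sketch using only 
the standard library (the helper `as_i32_values` is hypothetical and is not 
part of arrow-rs). A byte slice can be viewed as `i32` values without copying 
only when it is suitably aligned and its length is a whole number of values; 
otherwise the values have to be copied into a fresh, aligned allocation.

```rust
use std::borrow::Cow;

/// Views `bytes` as native-endian `i32` values, borrowing when the slice can be
/// reinterpreted in place and copying otherwise.
/// Trailing bytes that do not form a full value are ignored on the copy path.
fn as_i32_values(bytes: &[u8]) -> Cow<'_, [i32]> {
    // SAFETY: every 4-byte pattern is a valid `i32`, and `align_to` only puts
    // correctly aligned elements into the middle slice.
    let (prefix, aligned, suffix) = unsafe { bytes.align_to::<i32>() };
    if prefix.is_empty() && suffix.is_empty() {
        // Zero-copy path: the whole slice is already usable as `[i32]`.
        Cow::Borrowed(aligned)
    } else {
        // Fallback path: copy each 4-byte chunk into a new aligned Vec<i32>.
        Cow::Owned(
            bytes
                .chunks_exact(4)
                .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
                .collect(),
        )
    }
}
```

   The fallback branch is the kind of later copy that handling fixed-width 
buffers up front is meant to avoid.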
   
   The typed-buffer handling now covers:
   
   * primitive and primitive-like arrays
   * binary/string offset buffers
   * list and list-view offsets/sizes
   * dictionary index buffers
   * union type id and offset buffers
   * view buffers
   
   These paths now read their physical buffers through 
`next_typed_buffer::<T>()` so the expected physical buffer lengths are derived 
from the native value type before array construction.
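
   The idea of deriving the byte length from the native value type can be 
sketched as follows. This is an assumption-laden toy, not the signature used in 
this PR: `BodyCursor` is a hypothetical type, the method only borrows the name 
`next_typed_buffer`, and the sketch ignores IPC padding, validity bitmaps, and 
the buffer offsets recorded in the message metadata.

```rust
use std::mem::size_of;

/// Hypothetical cursor over an IPC body that is already in memory.
struct BodyCursor<'a> {
    body: &'a [u8],
    offset: usize,
}

impl<'a> BodyCursor<'a> {
    fn new(body: &'a [u8]) -> Self {
        Self { body, offset: 0 }
    }

    /// Returns the next `num_values * size_of::<T>()` bytes and advances the
    /// cursor, or an error if the body is too short.
    fn next_typed_buffer<T>(&mut self, num_values: usize) -> Result<&'a [u8], String> {
        // The expected physical length comes from the native type `T`.
        let byte_len = num_values
            .checked_mul(size_of::<T>())
            .ok_or_else(|| "buffer length overflows usize".to_string())?;
        let end = self
            .offset
            .checked_add(byte_len)
            .ok_or_else(|| "buffer end overflows usize".to_string())?;
        let body: &'a [u8] = self.body;
        let buf = body.get(self.offset..end).ok_or_else(|| {
            format!(
                "expected {} bytes at offset {}, but the body has {} bytes",
                byte_len,
                self.offset,
                body.len()
            )
        })?;
        self.offset = end;
        Ok(buf)
    }
}
```

   For example, on a 16-byte body, requesting two `i32` values and then one 
`i64` value consumes 8 bytes each, and any further request fails.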
   
   Container types such as `Struct`, `FixedSizeList`, `RunEndEncoded`, and 
similar nested/container arrays were intentionally left on their existing 
decode paths because they do not directly own fixed-width value buffers at that 
level. Their child arrays continue to decode recursively through the updated 
typed-buffer paths where applicable.
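
   The split between containers and leaves can be pictured with a toy model. 
The `Layout` enum and `typed_buffer_widths` function below are hypothetical and 
far simpler than Arrow's `DataType`; they only show that containers contribute 
no fixed-width value buffer of their own while their children are still 
visited recursively.

```rust
/// Toy layout model (hypothetical, not Arrow's `DataType`).
enum Layout {
    /// Leaf that owns one fixed-width value buffer of `width`-byte elements.
    FixedWidth { width: usize },
    /// Container with child arrays and no fixed-width value buffer of its own.
    Struct(Vec<Layout>),
    /// Container wrapping a single child array.
    FixedSizeList(Box<Layout>),
}

/// Collects, depth first, the element widths of the buffers that a typed path
/// would handle. Containers add nothing themselves and defer to their children.
fn typed_buffer_widths(layout: &Layout, out: &mut Vec<usize>) {
    match layout {
        Layout::FixedWidth { width } => out.push(*width),
        Layout::Struct(children) => {
            for child in children {
                typed_buffer_widths(child, out);
            }
        }
        Layout::FixedSizeList(child) => typed_buffer_widths(child, out),
    }
}
```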
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is 
sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are these changes tested?
   
   Yes.
   
   The existing IPC reader test suite was run with:
   
   ```bash
   cargo test -p arrow-ipc --lib
   ```
   
   The IPC reader benchmark was also run with:
   
   ```bash
   cargo bench -p arrow-ipc --bench ipc_reader --features zstd
   ```
   
   The non-compressed, non-mmap IPC reader paths showed consistent improvements 
locally. Compressed and mmap-heavy paths were mostly neutral, as expected.
   
   # Are there any user-facing changes?
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

