pchintar commented on code in PR #9778:
URL: https://github.com/apache/arrow-rs/pull/9778#discussion_r3127959522


##########
arrow-ipc/src/reader.rs:
##########
@@ -875,8 +875,12 @@ fn read_block<R: Read + Seek>(mut reader: R, block: 
&Block) -> Result<Buffer, Ar
     let metadata_len = block.metaDataLength().to_usize().unwrap();
     let total_len = body_len.checked_add(metadata_len).unwrap();
 
-    let mut buf = MutableBuffer::from_len_zeroed(total_len);
-    reader.read_exact(&mut buf)?;
+    let mut buf = MutableBuffer::with_capacity(total_len);
+    // Buffer is immediately fully initialized by `read_exact` before any read 
occurs

Review Comment:
   Thanks @jhorstmann  and @alamb  — I revisited this and reworked my approach 
to avoid `unsafe` entirely.
   
   The earlier `set_len + read_exact` pattern is not sound for generic `Read`. 
A fully safe alternative must also account for alignment: Arrow buffers require 
64-byte alignment (`arrow_buffer::buffer::ALIGNMENT`), while `Vec<u8>` does not 
guarantee this.
   
   To handle both safely:
   
   * read into `Vec<u8>` via `take(...).read_to_end(...)` (works for all `Read`)
   * reuse the allocation only if it is 64-byte aligned
   * otherwise copy into `MutableBuffer::from_len_zeroed(...)` to ensure proper 
alignment
   
   To avoid overhead on small reads, my implementation retains the existing 
path for smaller buffer sizes and applies the optimized path only to larger 
buffers (8192 bytes or higher), which constitute the majority of cases in the 
benchmark.
   
   ### Benchmark (official `ipc_reader`)
   
   ```bash
   cargo bench -p arrow-ipc --bench ipc_reader --features zstd
   ```
   
   ```
   StreamReader/read_10              ~910 µs → ~772 µs   (~15% faster)
   StreamReader/no_validation        ~441 µs → ~300 µs   (~32% faster)
   
   FileReader/read_10                ~900 µs → ~773 µs   (~14% faster)
   FileReader/no_validation          ~576 µs → ~300 µs   (~48% faster)
   
   zstd cases: small but consistent improvements
   mmap: unchanged or slightly improved
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to