pchintar opened a new pull request, #9836:
URL: https://github.com/apache/arrow-rs/pull/9836

   # Which issue does this PR close?
   
   - Closes #9835.
   
   # Rationale for this change
   
   The current IPC writer path performs repeated heap allocations and buffer 
copies for every record batch, even when writing batches with identical 
structure. In particular, the use of `EncodedData` forces:
   
   * allocation of new `Vec<u8>` buffers per batch
   * copying of flatbuffer data via `to_vec()`
   * destruction of all intermediate buffers after each write
   
   This introduces avoidable per-batch latency in common scenarios such as 
streaming or iterative batch writes. The issue is purely performance-related 
and does not affect correctness.
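   
   For context, a minimal sketch of the per-batch pattern described above 
(the `EncodedData` field names follow `arrow_ipc::writer::EncodedData`; the 
encode function is a simplified, hypothetical stand-in for the real encoder):
   
   ```rust
   /// Shape of `EncodedData` (simplified): both fields are owned Vec<u8>s,
   /// so every batch allocates fresh heap buffers.
   pub struct EncodedData {
       pub ipc_message: Vec<u8>, // encoded flatbuffer message
       pub arrow_data: Vec<u8>,  // batch body buffers
   }
   
   // Hypothetical per-batch flow on the existing path: the finished
   // flatbuffer is copied into a new Vec via to_vec(), the body into
   // another, and both are dropped once the write completes.
   fn encode_batch(finished_flatbuffer: &[u8], body: &[u8]) -> EncodedData {
       EncodedData {
           ipc_message: finished_flatbuffer.to_vec(), // copy per batch
           arrow_data: body.to_vec(),                 // copy per batch
       }
   }
   ```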
   
   # What changes are included in this PR?
   
   This PR introduces a private, writer-specific fast path for non-dictionary 
batches in `StreamWriter` and `FileWriter`:
   
   * Reuses a writer-owned scratch structure across batches:
   
     * `FlatBufferBuilder`
     * `arrow_data` buffer
     * metadata vectors (`nodes`, `buffers`, `variadic_buffer_counts`)
   * Avoids copying flatbuffer data by writing directly from `finished_data()`
   * Bypasses the `EncodedData` path for non-dictionary batches while 
preserving it for dictionary cases
   * Adds a helper to detect dictionary usage and safely fall back to the 
existing implementation when needed (both pieces are sketched after this list)
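   
   A rough sketch of the scratch reuse and the dictionary gate described in 
the list above. Type and field names here are illustrative, not the PR's 
actual ones, and IPC framing (continuation marker, length prefix, padding) 
is omitted:
   
   ```rust
   use arrow_schema::{DataType, Schema};
   use flatbuffers::FlatBufferBuilder;
   
   /// Writer-owned scratch state, allocated once and reused across batches.
   /// Clearing retains each buffer's capacity, so steady-state writes stop
   /// allocating. (`nodes`/`buffers` metadata vectors omitted for brevity.)
   struct EncodeScratch<'fbb> {
       fbb: FlatBufferBuilder<'fbb>,
       arrow_data: Vec<u8>,
       variadic_buffer_counts: Vec<i64>,
   }
   
   impl EncodeScratch<'_> {
       fn clear(&mut self) {
           self.fbb.reset(); // resets the builder but keeps its buffer
           self.arrow_data.clear();
           self.variadic_buffer_counts.clear();
       }
   }
   
   /// Gate for the fast path: any dictionary field, at any nesting level,
   /// falls back to the existing `EncodedData` implementation.
   fn schema_has_dictionary(schema: &Schema) -> bool {
       fn has_dict(dt: &DataType) -> bool {
           match dt {
               DataType::Dictionary(_, _) => true,
               DataType::List(f)
               | DataType::LargeList(f)
               | DataType::FixedSizeList(f, _)
               | DataType::Map(f, _) => has_dict(f.data_type()),
               DataType::Struct(fields) => {
                   fields.iter().any(|f| has_dict(f.data_type()))
               }
               DataType::Union(fields, _) => {
                   fields.iter().any(|(_, f)| has_dict(f.data_type()))
               }
               DataType::RunEndEncoded(run_ends, values) => {
                   has_dict(run_ends.data_type()) || has_dict(values.data_type())
               }
               _ => false, // other nested types elided in this sketch
           }
       }
       schema.fields().iter().any(|f| has_dict(f.data_type()))
   }
   
   /// Hypothetical write step: bytes are written straight from the builder's
   /// `finished_data()` slice, with no intermediate `to_vec()` copy.
   fn write_message<W: std::io::Write>(
       w: &mut W,
       scratch: &EncodeScratch<'_>,
   ) -> std::io::Result<()> {
       // assumes `finish()` has already been called on the builder
       w.write_all(scratch.fbb.finished_data())?;
       w.write_all(&scratch.arrow_data)
   }
   ```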
   
   # Are these changes tested?
   
   Yes.
   
   * Existing IPC writer tests (`cargo test -p arrow-ipc --lib writer`) pass 
without modification, confirming correctness and compatibility.
   * Benchmark results (`ipc_writer.rs`) show significant improvements 
(30-45%) for non-compressed workloads; see the reproduction command after 
this list.
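   
   Assuming the benchmark file is registered as a Criterion bench target 
named `ipc_writer` in this crate's manifest (an assumption; adjust `-p` and 
the target name to match the repo), the comparison can be reproduced with:
   
   ```sh
   # run on main and on this branch, then diff the reported times
   cargo bench -p arrow-ipc --bench ipc_writer
   ```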
   
   # Are there any user-facing changes?
   
   No.

