pchintar commented on PR #9836:
URL: https://github.com/apache/arrow-rs/pull/9836#issuecomment-4330216954
Hi @alamb,
So, I took a closer look at the `ipc_writer` benchmark's zstd path, and the
main cost appears to come from repeated calls to:
```rust
compress_to_vec(buffer, ...)
```
Right now the flow is strictly serial:
```text
write_array_data
→ write_buffer
→ compress_to_vec (zstd)
```
i.e.
```text
buffer1 → compress → write
buffer2 → compress → write
...
```
Since buffers are independent, I’m considering restructuring this to:
```text
collect buffers → compress in parallel → write in order
```
Conceptually:
```text
[buffer1, buffer2, buffer3]
↓
parallel compress
↓
append results (same order)
```
This would keep the IPC format and the compressed output identical, so it
avoids any output-size tradeoff; only the scheduling changes.
Implementation-wise, something like:
```rust
// Bound the worker count (capped at 4 here) so we don't oversubscribe
// the machine; fall back to 1 if parallelism cannot be queried.
let parallelism = thread::available_parallelism()
    .map(|n| n.get())
    .unwrap_or(1)
    .min(4);
```
Then process bounded chunks:
```rust
for chunk in pending_buffers.chunks(parallelism) {
    let compressed = compress_chunk_in_parallel(chunk)?;
    append_in_original_order(compressed)?;
}
```
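Fleshing that loop out under the same assumptions (`compress` is again a placeholder for the codec call, and the helper names above are hypothetical), a bounded version could look like:

```rust
use std::thread;

// Placeholder for the real codec call, kept trivial so the sketch runs.
fn compress(buf: &[u8]) -> Vec<u8> {
    buf.iter().map(|b| b.wrapping_add(1)).collect()
}

// At most `parallelism` compressions are in flight at once; each chunk's
// results are appended together, preserving the original buffer order.
fn compress_bounded(buffers: &[Vec<u8>], parallelism: usize) -> Vec<Vec<u8>> {
    let mut out = Vec::with_capacity(buffers.len());
    for chunk in buffers.chunks(parallelism.max(1)) {
        let compressed: Vec<Vec<u8>> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|b| s.spawn(move || compress(b)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        out.extend(compressed);
    }
    out
}
```

Note that `chunks(parallelism)` bounds the number of in-flight compressions per batch, which also bounds the peak memory held in uncommitted compressed buffers.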
where each worker owns its compression context:
```rust
// Each worker owns its own context, so no locking or sharing is needed.
let mut ctx = CompressionContext::default();
codec.compress_to_vec(buffer.as_slice(), &mut out, &mut ctx)?;
```
Would this kind of bounded per-batch parallelism be acceptable in
`arrow-ipc`, or would it introduce any new hidden costs?
Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]