pchintar commented on PR #9836:
URL: https://github.com/apache/arrow-rs/pull/9836#issuecomment-4324417805

   Also, the compressed path is clearly much slower than the uncompressed one
overall, so I looked into the zstd path. In `arrow-ipc/src/compression.rs`, the
current `compress_zstd` function does:
   
   1. `compress()` → allocates a new `Vec<u8>` for the compressed bytes
   2. `extend_from_slice()` → copies those bytes into `output`
   
   That's one extra allocation plus one extra copy per buffer. The `zstd`
compressor also provides `compress_to_buffer()`, which writes directly into a
caller-provided buffer, so we can change the current implementation from:
   
   # Current `compress_zstd` (alloc -> compress -> copy)
   
   ```rust
   #[cfg(feature = "zstd")]
   fn compress_zstd(
       input: &[u8],
       output: &mut Vec<u8>,
       context: &mut CompressionContext,
   ) -> Result<(), ArrowError> {
       let result = context.zstd_compressor().compress(input)?;
       output.extend_from_slice(&result);
       Ok(())
   }
   ```
   
   ---
   
   # New `compress_zstd` (compress -> direct write)
   
   ```rust
   #[cfg(feature = "zstd")]
   fn compress_zstd(
       input: &[u8],
       output: &mut Vec<u8>,
       context: &mut CompressionContext,
   ) -> Result<(), ArrowError> {
       // `zstd` re-exports `zstd-safe`, so no direct dependency on it is needed
       use zstd::zstd_safe::compress_bound;
   
       let compressor = context.zstd_compressor();
   
       // Compute maximum compressed size
       let bound = compress_bound(input.len());
   
       // Grow by the worst-case size; `resize` zero-fills, so zstd gets an initialized slice
       let offset = output.len();
       output.resize(offset + bound, 0);
   
       // Compress directly into output buffer
       let written = compressor
           .compress_to_buffer(input, &mut output[offset..])
           .map_err(|e| ArrowError::ExternalError(Box::new(e)))?;
   
       // Truncate to actual compressed size
       output.truncate(offset + written);
   
       Ok(())
   }
   ```
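   A std-only sketch of the same two shapes, so the pattern can be tried without the `zstd` crate. Everything here is hypothetical: a toy run-length encoder stands in for zstd (it is not zstd's format), `rle_bound` plays the role of `compress_bound`, and `rle_encode_into` plays the role of `compress_to_buffer` writing into a `&mut [u8]`. `append_via_temp` mirrors the current alloc -> compress -> copy path and `append_direct` mirrors the proposed resize -> write -> truncate path; both should produce identical bytes.

   ```rust
   /// Worst-case output size of `rle_encode_into` (stand-in for zstd's
   /// `compress_bound`): every input byte may become a (count, byte) pair.
   fn rle_bound(len: usize) -> usize {
       len * 2
   }

   /// Toy run-length encoder standing in for `compress_to_buffer`. NOT the zstd
   /// format. Writes (run_length, byte) pairs into `dst`, returns bytes written.
   fn rle_encode_into(src: &[u8], dst: &mut [u8]) -> usize {
       assert!(dst.len() >= rle_bound(src.len()), "destination is smaller than the bound");
       let mut written = 0;
       let mut i = 0;
       while i < src.len() {
           let byte = src[i];
           // Extend the run, capped at 255 so the length fits in one byte.
           let mut run = 1;
           while i + run < src.len() && src[i + run] == byte && run < usize::from(u8::MAX) {
               run += 1;
           }
           dst[written] = run as u8;
           dst[written + 1] = byte;
           written += 2;
           i += run;
       }
       written
   }

   /// Old shape (alloc -> compress -> copy): encode into a temporary Vec,
   /// then copy it onto the end of `output`.
   fn append_via_temp(input: &[u8], output: &mut Vec<u8>) {
       let mut tmp = vec![0u8; rle_bound(input.len())]; // extra allocation
       let n = rle_encode_into(input, &mut tmp);
       output.extend_from_slice(&tmp[..n]); // extra copy
   }

   /// New shape (compress -> direct write): grow `output` by the bound,
   /// encode straight into the tail slice, then trim the unused bytes.
   fn append_direct(input: &[u8], output: &mut Vec<u8>) -> usize {
       let offset = output.len();
       // `resize` zero-fills the new tail, giving the encoder an initialized slice.
       output.resize(offset + rle_bound(input.len()), 0);
       let written = rle_encode_into(input, &mut output[offset..]);
       // `truncate` shortens the length but keeps the capacity.
       output.truncate(offset + written);
       written
   }
   ```

   Called once per buffer on the same `output`, the direct version never allocates a temporary, and since `truncate` keeps capacity, a later `resize` may fit in the leftover space. One trade-off carries over from the zstd version: `resize` zero-fills the full worst-case bound before every write, which is worth keeping in mind when re-benchmarking large buffers.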
   I'll make this change later today and re-test.

