Re: [I] Tuning knobs to tradeoff CPU and compression in parquet [arrow-rs]

via GitHub Tue, 16 Sep 2025 07:34:57 -0700


JigaoLuo commented on issue #8358:
URL: https://github.com/apache/arrow-rs/issues/8358#issuecomment-3299058998


   One thing I still don’t fully understand is **when, and for which data 
distributions, to apply specific compression algorithms.** 
   - For example, could it be that Snappy doesn’t compress the encoded data 
much beyond what the encoding already achieves, while other schemes in the 
search space, such as ZSTD or GZIP, offer a higher compression ratio than 
Snappy?
   
   I’ve also noticed something that seems related: when I sort my Parquet file 
by a column, Snappy does not achieve a good compression ratio.
   
   (I have a solid grasp of encodings, but my understanding of compression is 
still limited.)
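
   One possible reading of the sorting observation, sketched in plain Rust: 
sorting a low-cardinality column turns it into a few long runs, so a run-length 
style encoding may already remove most of the redundancy before a codec such as 
Snappy sees the bytes. The column contents and the `rle_encode` helper are 
hypothetical and only illustrate the idea; this is not Parquet's actual 
RLE/bit-packing implementation.

```rust
/// Run-length encode a slice into (value, run_length) pairs.
/// Illustrative helper only, not the encoder used by the parquet crate.
fn rle_encode(values: &[u32]) -> Vec<(u32, u32)> {
    let mut runs: Vec<(u32, u32)> = Vec::new();
    for &v in values {
        // Extend the current run if the value repeats.
        if let Some(last) = runs.last_mut() {
            if last.0 == v {
                last.1 += 1;
                continue;
            }
        }
        // Otherwise start a new run.
        runs.push((v, 1));
    }
    runs
}

fn main() {
    // Hypothetical column: 4 distinct values, interleaved.
    let unsorted: Vec<u32> = (0..1000).map(|i| i % 4).collect();
    let mut sorted = unsorted.clone();
    sorted.sort();

    let runs_unsorted = rle_encode(&unsorted);
    let runs_sorted = rle_encode(&sorted);

    println!("unsorted: {} values -> {} runs", unsorted.len(), runs_unsorted.len());
    println!("sorted:   {} values -> {} runs", sorted.len(), runs_sorted.len());
}
```

   In this toy case the sorted column collapses to four runs, which leaves very 
little for a general-purpose compressor to remove afterwards. Whether that is 
what happens in a real file depends on the data and on the encodings the writer 
picks.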


