JigaoLuo commented on issue #8358: URL: https://github.com/apache/arrow-rs/issues/8358#issuecomment-3299058998
One thing I still don’t fully understand is **when, and for which data distributions, to apply specific compression algorithms.**

- For example, could it be that Snappy doesn’t compress the encoded data much beyond what the encoding already achieves, while other schemes in the search space, such as ZSTD or GZIP, might offer a higher compression ratio?

I’ve also noticed something that seems related: when I sort my Parquet file by a column, Snappy does not achieve a good compression ratio.

(I have a solid grasp of encodings, but my understanding of compression is still limited.)
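To make the question concrete, here is a rough toy sketch of the effect I suspect. It is not Parquet's actual RLE/bit-packing hybrid: `rle_encode` is a hypothetical helper written only for this illustration, and `zlib` (DEFLATE, the algorithm behind GZIP) stands in for a general-purpose codec because Snappy and ZSTD are not in the Python standard library. The column contents and sizes are made up.

```python
import random
import struct
import zlib


def rle_encode(values):
    """Toy run-length encoding: each run becomes (value, run_length) as two uint32 LE."""
    out = bytearray()
    i = 0
    n = len(values)
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        out += struct.pack("<II", values[i], j - i)
        i = j
    return bytes(out)


# Hypothetical low-cardinality column, sorted as in the scenario described above.
random.seed(0)
column = sorted(random.randrange(100) for _ in range(100_000))

# "Plain" layout: every value stored as a 4-byte little-endian integer.
plain = struct.pack(f"<{len(column)}I", *column)
# "Encoded" layout: the sorted column collapses into a handful of runs.
encoded = rle_encode(column)

# Apply the same general-purpose compressor to both layouts.
plain_z = zlib.compress(plain, 6)
encoded_z = zlib.compress(encoded, 6)

plain_ratio = len(plain) / len(plain_z)
encoded_ratio = len(encoded) / len(encoded_z)

print(f"plain:   {len(plain)} -> {len(plain_z)} bytes (ratio {plain_ratio:.1f})")
print(f"encoded: {len(encoded)} -> {len(encoded_z)} bytes (ratio {encoded_ratio:.1f})")
```

If the ratio on the already-encoded bytes comes out much smaller than on the plain bytes, that would be consistent with my guess that, on sorted data, the encoding removes most of the redundancy before the codec runs. Whether Snappy, ZSTD, and GZIP differ meaningfully on what is left is exactly the part I am unsure about.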
