2010YOUY01 commented on PR #16512: URL: https://github.com/apache/datafusion/pull/16512#issuecomment-3025966862
> On my machine, avg bandwidth (throughput) for Q2 is
>
> | encoding | write | read |
> |---|---|---|
> | plain (uncompressed) | 938.4 MB/s | 1042.7 MB/s |
> | zstd | 234.6 MB/s | 276.0 MB/s |
> | lz4_frame | 521.3 MB/s | 446.9 MB/s |
>
> Considering my SSD bandwidth is around 3.2 GB/s, as you pointed out, it seems that there's room for optimization.
>
> Plus, when I ran `strace -c -e trace=write,read cargo bench --bench spill_io compression` only for plain encoding, it seems like there are too many `read` system calls compared to `write`. We may need to investigate this more or find ways to reduce the number of `read` calls.

Also, the read/write throughput ratios for lz4 and zstd don't look normal 🤔 According to the upstream benchmarks, decompression should be several times faster than compression:

- https://github.com/lz4/lz4
- https://github.com/facebook/zstd
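For reference, these are the read/write (r/w) throughput ratios implied by the Q2 numbers quoted above, rounded to two decimals:

```math
\frac{r}{w}:\quad
\text{plain} = \frac{1042.7}{938.4} \approx 1.11,\qquad
\text{zstd} = \frac{276.0}{234.6} \approx 1.18,\qquad
\text{lz4\_frame} = \frac{446.9}{521.3} \approx 0.86
```

So lz4_frame reads more slowly than it writes, and zstd reads only about 18% faster. Both are far from the several-fold decompression advantage that the upstream benchmarks report.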
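The cause of the high `read` count has not been confirmed. One common reason is that many small reads are issued directly against an unbuffered file. The minimal Rust sketch below assumes that cause. It counts how many reads reach the underlying source, first without buffering and then with `std::io::BufReader`. `CountingReader` and `count_inner_reads` are hypothetical helpers written for illustration, and an in-memory `Cursor` stands in for the spill file. Nothing here describes how the DataFusion spill reader actually works.

```rust
use std::io::{self, BufReader, Cursor, Read};

// Hypothetical wrapper (not a DataFusion or Arrow type): counts every call to
// `read` on the inner reader. If the inner reader were a `std::fs::File`,
// each such call would be one `read(2)` syscall.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

// Drains `data` in `chunk`-byte pieces, either directly or through a
// `BufReader` with capacity `buf_cap`, and returns how many reads reached
// the inner source (including the final read that returns 0 at EOF).
fn count_inner_reads(data: &[u8], chunk: usize, buf_cap: Option<usize>) -> io::Result<usize> {
    let mut counter = CountingReader { inner: Cursor::new(data), reads: 0 };
    let mut piece = vec![0u8; chunk];
    match buf_cap {
        Some(cap) => {
            // Small reads are served from the internal buffer; only refills
            // of that buffer reach the inner source.
            let mut reader = BufReader::with_capacity(cap, &mut counter);
            while reader.read(&mut piece)? > 0 {}
        }
        None => {
            // Every small read goes straight to the inner source.
            while counter.read(&mut piece)? > 0 {}
        }
    }
    Ok(counter.reads)
}
```

Consider 1 MiB read in 64-byte pieces. Without buffering, the inner source sees 16,385 reads: 16,384 data reads plus one read that returns 0 at EOF. With a 64 KiB `BufReader`, it sees 17 reads. Running `strace -c` against the real benchmark is still the way to confirm whether this pattern is what inflates the `read` count.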