2010YOUY01 commented on PR #16512:
URL: https://github.com/apache/datafusion/pull/16512#issuecomment-3025966862

   > On my machine, avg bandwidth (throughput) for Q2 is
   > 
   > - plain (uncompressed): (w) 938.4 MB/s, (r) 1042.7 MB/s
   > - zstd: (w) 234.6 MB/s, (r) 276.0 MB/s
   > - lz4_frame: (w) 521.3 MB/s, (r) 446.9 MB/s
   > 
   > Considering my SSD bandwidth is around 3.2 GB/s, as you pointed out, it seems that there's room for optimization.
   > 
   > Plus, when I ran `strace -c -e trace=write,read cargo bench --bench spill_io compression` only for plain encoding, it seems like there are too many `read` system calls compared to `write`. We may need to investigate this more or find ways to reduce the number of `read` calls.
   > 
   > [Screenshot from 2025-06-30 19-43-49]
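
To illustrate how buffering affects the number of `read` calls, here is a minimal sketch using only the Rust standard library. It is not DataFusion's spill reader: `CountingReader`, `count_unbuffered`, and `count_buffered` are hypothetical names, the chunk and buffer sizes are arbitrary, and each call that reaches the wrapped reader merely stands in for one `read` system call on a real file.

```rust
use std::io::{self, BufReader, Cursor, Read};

/// Hypothetical wrapper that counts how often `read` is called on the inner reader.
/// On a real `File`, each such call would correspond to one `read` system call.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads `data` in `chunk`-sized pieces directly from the counting reader
/// and returns how many `read` calls were issued.
fn count_unbuffered(data: &[u8], chunk: usize) -> io::Result<usize> {
    let mut reader = CountingReader { inner: Cursor::new(data), calls: 0 };
    let mut buf = vec![0u8; chunk];
    for _ in 0..data.len() / chunk {
        reader.read_exact(&mut buf)?;
    }
    Ok(reader.calls)
}

/// Same access pattern, but with a `BufReader` of the given capacity in between,
/// so many small reads are served from one larger underlying read.
fn count_buffered(data: &[u8], chunk: usize, capacity: usize) -> io::Result<usize> {
    let counting = CountingReader { inner: Cursor::new(data), calls: 0 };
    let mut reader = BufReader::with_capacity(capacity, counting);
    let mut buf = vec![0u8; chunk];
    for _ in 0..data.len() / chunk {
        reader.read_exact(&mut buf)?;
    }
    Ok(reader.into_inner().calls)
}

fn main() -> io::Result<()> {
    let data = vec![0u8; 64 * 1024]; // 64 KiB of dummy payload
    let unbuffered = count_unbuffered(&data, 64)?;
    let buffered = count_buffered(&data, 64, 8 * 1024)?;
    println!("unbuffered read calls: {unbuffered}");
    println!("buffered read calls:   {buffered}");
    Ok(())
}
```

Whether the spill read path actually issues small unbuffered reads is something the `strace` output would have to confirm; the sketch only shows the general effect of the buffer size on the call count.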
   
   Also, the read-to-write throughput ratio for lz4 and zstd does not look normal 🤔 
   Based on the published compression benchmarks, decompression should be several times faster than compression:
   https://github.com/lz4/lz4
   https://github.com/facebook/zstd
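
For reference, the read-to-write ratios implied by the quoted numbers can be computed with a small sketch like the one below. The throughput values are copied from the quoted measurements; the `Throughput` struct and `read_write_ratio` helper are hypothetical names used only for illustration.

```rust
/// Average throughput in MB/s for one spill encoding, as quoted above.
struct Throughput {
    name: &'static str,
    write_mb_s: f64,
    read_mb_s: f64,
}

/// Ratio of read (decompression) to write (compression) throughput.
/// A value well above 1.0 would match the expectation that
/// decompression is several times faster than compression.
fn read_write_ratio(t: &Throughput) -> f64 {
    t.read_mb_s / t.write_mb_s
}

fn main() {
    let results = [
        Throughput { name: "plain", write_mb_s: 938.4, read_mb_s: 1042.7 },
        Throughput { name: "zstd", write_mb_s: 234.6, read_mb_s: 276.0 },
        Throughput { name: "lz4_frame", write_mb_s: 521.3, read_mb_s: 446.9 },
    ];
    for t in &results {
        println!("{:<10} read/write = {:.2}", t.name, read_write_ratio(t));
    }
}
```

With these inputs the ratios come out at roughly 1.11 for plain, 1.18 for zstd, and 0.86 for lz4_frame, so reads are only slightly faster than writes for zstd and actually slower for lz4_frame.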


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

