adamreeve commented on PR #46972: URL: https://github.com/apache/arrow/pull/46972#issuecomment-3043538730
Right OK that makes sense. I'm testing with plain encoded float columns so hadn't noticed that but yes it might be a good idea to also change the decompression buffers then. I did some rough benchmarks with `/usr/bin/time -v` and get the following results for my test case (what's described in #46971 but reading all row groups), taking the best out of three runs: | | System allocator | mimalloc | jemalloc | | --- | --- | --- | --- | | Baseline time (s) | 6.92 | 6.89 | 6.68 | | Time with temp decrypt buffers (s) | 8.23 | 6.00 | 6.42 | | Time with temp decrypt and decompress buffers (s) | 6.50 | 6.08 | 6.59 | | | System allocator | mimalloc | jemalloc | | --- | --- | --- | --- | | Baseline max RSS (MB) | 1,556 | 1,550 | 1,128 | | Max RSS with temp decrypt buffers (MB) | 894 | 891 | 627 | | Max RSS with temp decrypt and decompress buffers (MB) | 1,823 |890 | 629 | The behaviour with mimalloc and jemalloc looks good, but the results with the system allocator are quite concerning. The max RSS decreases if using temporary decryption buffers, but actually increases quite significantly when also using temporary decompression buffers. I'm not sure why that would be, maybe this causes more memory fragmentation? (C++ heap memory management is not something I know a lot about...). There is also a noticeable slow-down in the temporary decryption buffer case with the system allocator. Maybe this is acceptable, given most users will be using mimalloc? I also tested with unencrypted data: | | System allocator | mimalloc | jemalloc | | --- | --- | --- | --- | | Baseline time (s) | 4.07 | 4.09 | 3.93 | | Time with temp decompress buffers (s) | 4.08 | 4.25 | 3.98 | | | System allocator | mimalloc | jemalloc | | --- | --- | --- | --- | | Baseline max RSS (MB) | 884 | 895 | 627 | | Max RSS with temp decompress buffers (MB) | 954 | 913 | 660 | Based on these benchmarks alone, maybe only the decryption buffers should be temporary. But I've only tested with plain float data. I'll look into testing with more data types and encodings. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
