adamreeve commented on PR #46972:
URL: https://github.com/apache/arrow/pull/46972#issuecomment-3051111325

   I benchmarked unencrypted int32 data with nulls, using a non-plain encoding (delta binary packed); otherwise the data layout is the same as in my previous tests. Making the decompression buffers temporary decreases the max RSS with the system allocator, which is a bit surprising to me, but there is a slight increase in both RSS and time taken with mimalloc and jemalloc.
   
   |  | System allocator | mimalloc | jemalloc |
   | --- | --- | --- | --- |
   | Baseline time (s) | 10.82 | 11.02 | 10.62 |
   | Time with temp decompress buffers (s) | 10.93 | 11.53 | 10.96 |
   
   | | System allocator | mimalloc | jemalloc |
   | --- | --- | --- | --- |
   | Baseline max RSS (MB) | 1,235 | 1,047 | 891 |
   | Max RSS with temp decompress buffers (MB) | 1,065 | 1,085 | 902 |
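   
   For reference, the comparison boils down to something like the sketch below. This is not the exact harness I used; the helper name and pool selection are just for illustration, and max RSS can be measured outside the process (e.g. with `/usr/bin/time -v`) or by switching the default pool via the `ARROW_DEFAULT_MEMORY_POOL` environment variable instead.
   
   ```cpp
   #include <memory>
   #include <string>
   
   #include "arrow/api.h"
   #include "arrow/io/file.h"
   #include "parquet/arrow/reader.h"
   #include "parquet/properties.h"
   
   // Read one Parquet file through a chosen allocator so that time and max RSS
   // can be compared across the system allocator, mimalloc and jemalloc.
   arrow::Status ReadWithPool(const std::string& path, const std::string& pool_name) {
     arrow::MemoryPool* pool = arrow::system_memory_pool();
     if (pool_name == "jemalloc") {
       // Returns NotImplemented if Arrow was built without jemalloc support.
       ARROW_RETURN_NOT_OK(arrow::jemalloc_memory_pool(&pool));
     } else if (pool_name == "mimalloc") {
       ARROW_RETURN_NOT_OK(arrow::mimalloc_memory_pool(&pool));
     }
   
     ARROW_ASSIGN_OR_RAISE(auto infile, arrow::io::ReadableFile::Open(path, pool));
   
     // ReaderProperties supplies the pool for Parquet-level buffers (including
     // page decompression), while memory_pool() supplies the pool used when
     // building the Arrow arrays.
     parquet::ReaderProperties parquet_props(pool);
     parquet::arrow::FileReaderBuilder builder;
     ARROW_RETURN_NOT_OK(builder.Open(infile, parquet_props));
   
     std::unique_ptr<parquet::arrow::FileReader> reader;
     ARROW_RETURN_NOT_OK(builder.memory_pool(pool)->Build(&reader));
   
     std::shared_ptr<arrow::Table> table;
     ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
     return arrow::Status::OK();
   }
   ```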
   
   I also looked at the memory allocations with massif: the peak heap size is exactly the same, and it is still dominated by the decompression buffers. My earlier comment about the decompression buffers being referenced by the record batches isn't correct, but the page buffers are still held in memory by the column reader while batches of data are decoded incrementally to Arrow arrays.
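   
   To make that last point concrete, here is a rough sketch of the incremental read path (not the benchmark code itself; the batch size is a placeholder): the FileReader keeps its column readers, and therefore their page buffers, alive for the duration of the stream, while each read only decodes the next slice of rows into Arrow arrays.
   
   ```cpp
   #include <memory>
   #include <string>
   
   #include "arrow/api.h"
   #include "arrow/io/file.h"
   #include "parquet/arrow/reader.h"
   #include "parquet/properties.h"
   
   // Minimal sketch of incremental decoding: the FileReader owns the column
   // readers (and the page buffers they hold) across the whole stream, while
   // ReadNext only decodes the next batch of rows into Arrow arrays.
   arrow::Status ReadInBatches(const std::string& path) {
     ARROW_ASSIGN_OR_RAISE(auto infile, arrow::io::ReadableFile::Open(path));
   
     parquet::arrow::FileReaderBuilder builder;
     ARROW_RETURN_NOT_OK(builder.Open(infile));
   
     parquet::ArrowReaderProperties arrow_props;
     arrow_props.set_batch_size(64 * 1024);  // rows decoded per batch (placeholder)
   
     std::unique_ptr<parquet::arrow::FileReader> reader;
     ARROW_RETURN_NOT_OK(builder.properties(arrow_props)->Build(&reader));
   
     std::unique_ptr<arrow::RecordBatchReader> batch_reader;
     ARROW_RETURN_NOT_OK(reader->GetRecordBatchReader(&batch_reader));
   
     std::shared_ptr<arrow::RecordBatch> batch;
     while (true) {
       ARROW_RETURN_NOT_OK(batch_reader->ReadNext(&batch));
       if (batch == nullptr) break;  // end of stream
       // ... consume the batch ...
     }
     return arrow::Status::OK();
   }
   ```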
   
   So I don't think there's much reason to make the decompression buffers temporary, and performance is generally a bit better if only the decryption buffers are temporary. I'm going to revert this PR to only changing the decryption buffers.

