Re: [PR] [Parquet] Reduce one copy in `SerializedPageReader` [arrow-rs]

via GitHub Sat, 01 Nov 2025 03:38:59 -0700


alamb commented on PR #8745:
URL: https://github.com/apache/arrow-rs/pull/8745#issuecomment-3476176433


   > The regression is indeed related to memory allocation (brk, page faults, 
etc.), but I'm not sure if it's due to the read buffers being held. Those 
buffers are small and freed quickly.
   
   So in my mind the most recent benchmark results show a performance 
improvement for this branch.
   
   In an ideal world, benchmark results would be 100% reproducible and free of 
noise. However, in the real world, especially on the "machine" I am using to 
benchmark (a VM), there are many sources of noise in the measurements:
   1. Random other processes (`apt-get update` for example) deciding to do 
their work during a run
   2. State of the kernel VM
   3. Hardware thermal state
   4. Other tenants running on the same machine
   5. etc
   
   So while I applaud our efforts here to be scientific, I also think this PR 
has passed the level of scrutiny needed.
   
   From my perspective, the code after this PR is clearly doing less work and 
shows improvements in the benchmarks (even if there is some noise), so it is 
a net improvement over what is on main.


