Re: [PR] fix(parquet): bound data page byte size for large variable-width values [arrow-rs]

via GitHub Thu, 04 Jun 2026 09:42:48 -0700


alamb commented on PR #9972:
URL: https://github.com/apache/arrow-rs/pull/9972#issuecomment-4624242804


   > @alamb I'm pretty convinced the benchmark regressions (e.g. [#9972 
(comment)](https://github.com/apache/arrow-rs/pull/9972#issuecomment-4617603557)
 `string/parquet_2` 1.75x) are just noise.
   > 
   > For example: `main` vs. `main` in [#9972 
(comment)](https://github.com/apache/arrow-rs/pull/9972#issuecomment-4618263660)
 `string/parquet_2` has an 18% swing. I ran a lot of testing on the exact same 
hardware as these bench runners and found that running any benchmark in 
isolation produces a <1% change on main vs. this branch. I measured instruction 
count and there was only a ~0.3% increase in instructions, not enough to cause 
a 70% performance impact. Running the entire suite I _was_ able to reproduce 
the regressions, although not consistently. This led me to investigate 
allocator impact.
   > 
   > The conclusion was that this is caused by rolling the dice on allocations, 
see #10068. These benches are all allocation heavy so any "bad luck" in 
ordering / timing of allocations (re-use vs. page fault) can result in big 
swings in results. With #10068 applied I see no regressions here even running 
the full suite.
   > 
   > My suggestion is that we don't over-index on this and move forward with 
this change if everything else looks good. If you are happy with #10068 we can 
also merge that and re-run the benches, but I'm afraid that would delay the 
release and I do think it'd be nice to include this change in the release.
   
   I agree with the plan to move forward -- I think you have done enough 
diligence to show this doesn't slow things down. Giving it a final check now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix(parquet): bound data page byte size for large variable-width values [arrow-rs]

Reply via email to