sahuagin commented on PR #9787:
URL: https://github.com/apache/arrow-rs/pull/9787#issuecomment-4299583607

   Here are the updated benchmark results on the simplified terminal-skip 
branch against the `upstream` baseline. Summary first, details below.
   
   ### Summary
   
   The targeted paths — bw=0 miniblocks and terminal page skips — show 
consistent improvements on the 8/16/32-bit integer types. Results on 64-bit 
types are mixed, likely dominated by measurement variance (see note at the end).
   
   ### Wins on the bw=0 / terminal-skip paths (8/16/32-bit)
   
   `binary packed skip single value` (all-same → bw=0 hot path):
   
   | Type | Change |
   |---|---|
   | Int8Array | **−11.3%** |
   | UInt8Array | **−8.2%** |
   | Int16Array | **−6.9%** |
   | UInt16Array | **−9.9%** |
   | Int32Array | **−9.8%** |
   | UInt32Array | **−7.1%** |
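
For readers unfamiliar with why the all-same case is cheap to skip: in Parquet's DELTA_BINARY_PACKED encoding each value is reconstructed as `previous + min_delta + packed_delta`, and a miniblock with bit width 0 stores no packed bits, so every packed delta is 0. Skipping `n` values in such a miniblock can therefore be done with one multiply-add instead of `n` decode steps. The sketch below illustrates only that arithmetic; the function names `skip_bw0` and `skip_by_decoding` are hypothetical and are not the code in this PR or in arrow-rs.

```rust
/// Hypothetical helper: skip `n` values of a bit-width-0 miniblock in O(1).
/// With bw=0 every packed delta is 0, so each step adds exactly `min_delta`.
/// Wrapping arithmetic is an assumption here, chosen so the shortcut matches
/// the step-by-step reference below even on overflow.
fn skip_bw0(last_value: i64, min_delta: i64, n: usize) -> i64 {
    last_value.wrapping_add(min_delta.wrapping_mul(n as i64))
}

/// Reference path for comparison: advance one value at a time, as a generic
/// skip that decodes every delta would.
fn skip_by_decoding(last_value: i64, min_delta: i64, n: usize) -> i64 {
    let mut value = last_value;
    for _ in 0..n {
        // delta = min_delta + packed_delta, and packed_delta == 0 when bw=0
        value = value.wrapping_add(min_delta);
    }
    value
}
```

For an all-same column `min_delta` is 0, so the skip leaves `last_value` untouched and only the remaining-value count needs updating.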
   
   `binary packed skip stepped increasing value`:
   
   | Type | Change |
   |---|---|
   | Int8Array | **−8.7%** |
   | UInt8Array | **−5.7%** |
   | Int16Array | **−6.9%** |
   | UInt16Array | **−6.8%** |
   | Int32Array | **−5.5%** |
   | UInt32Array | **−4.8%** |
   | INT32/Decimal128Array | **−4.7%** |
   
   `binary packed skip, mandatory, no NULLs` also improves on Int8Array (−2.1%),
Int16Array (−5.2%), and UInt16Array (−3.1%).
   
   ### Noise on unchanged paths
   
   The `optional, half NULLs` and `optional, no NULLs` variants don't exercise
the paths touched by this change: they go through the non-terminal decode path,
which is unchanged. They show ±3–8% variance in both directions across types:
UInt32Array regresses 4–8%, Int8Array improves 2–4%, and Int16Array is
essentially flat. That is consistent with the measurement noise on the
non-isolated machine I flagged in the prior run.
   
   ### Note on 64-bit variance
   
   Int64Array/UInt64Array results on this run were noisier than the smaller 
types, including +6.3% / +2.1% on `single value` and +5.1% / +7.2% on 
`increasing value`. I ran the #9786 (bw=0 only) branch against the same
baseline on the same hardware, under similar measurement conditions, and saw
the expected **−18% to −21% wins on 64-bit** for exactly those benchmarks.
That's good evidence the 64-bit numbers on this branch are measurement noise
rather than a real regression: the code paths that the bw=0 branch exercises
are also exercised here, and the bw=0 branch's numbers are clean.
   
   Happy to rerun on more controlled hardware if that'd be useful.
   
   ### Measurement conditions
   
   Same disclaimer as the prior run: non-isolated machine, no CPU frequency 
pinning, browser tabs and assorted background processes. For the paths I 
targeted the signal is large enough to read through the noise; for paths I 
didn't touch I'd ignore the numbers on principle.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
