[PR] GH-3522: Optimize delta binary packing with batch unpack32/pack32 and cached packers (+13-37% decode) [parquet-java]

via GitHub Fri, 01 May 2026 16:08:39 -0700


iemejia opened a new pull request, #3534:
URL: https://github.com/apache/parquet-java/pull/3534


   ## Summary
   
   - Rewrite `DeltaBinaryPackingValuesReader.unpackMiniBlock` to call `unpack32Values(byte[])` instead of calling `unpack8Values(ByteBuffer)` once per 8-value group, and eliminate the per-miniblock `ByteBuffer.slice()` allocations
   - Switch delta writers (int and long) to `pack32Values` for miniblock packing
   - Cache `BytePacker`/`BytePackerForLong` instances in per-instance arrays 
indexed by bit width
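
The summary bullets can be pictured with a minimal, self-contained sketch. It does not use parquet-java classes: `MiniBlockSketch`, `BitPacker`, `packerFor`, `packMiniBlock` and `unpackMiniBlock` are hypothetical stand-ins, and the bit order is assumed to be LSB-first, as in Parquet's bit-packed encoding. It shows why no slicing is needed: 32 values of `bitWidth` bits always occupy exactly `4 * bitWidth` bytes, so each group can be addressed by an offset into a single `byte[]`. It also shows the packer for a given width being looked up once per instance and then reused.

```java
/**
 * Hypothetical sketch, not the parquet-java API: BitPacker stands in for
 * BytePacker. Bit order is assumed LSB-first, as in Parquet's bit packing.
 */
class MiniBlockSketch {

  /** Stand-in for a bit-width-specific packer (bitWidth 0..32). */
  static final class BitPacker {
    final int bitWidth;
    private final long mask;

    BitPacker(int bitWidth) {
      this.bitWidth = bitWidth;
      this.mask = (1L << bitWidth) - 1;
    }

    /** Packs in[inPos..inPos+31] into exactly 4 * bitWidth bytes at out[outPos]. */
    void pack32Values(int[] in, int inPos, byte[] out, int outPos) {
      long buffer = 0; // pending bits, lowest bits are emitted first
      int bits = 0;
      for (int i = 0; i < 32; i++) {
        buffer |= (in[inPos + i] & mask) << bits;
        bits += bitWidth;
        while (bits >= 8) { // emit each completed byte
          out[outPos++] = (byte) buffer;
          buffer >>>= 8;
          bits -= 8;
        }
      }
    }

    /** Reads exactly 4 * bitWidth bytes at in[inPos] into out[outPos..outPos+31]. */
    void unpack32Values(byte[] in, int inPos, int[] out, int outPos) {
      long buffer = 0;
      int bits = 0;
      for (int i = 0; i < 32; i++) {
        while (bits < bitWidth) { // pull in bytes until one value is available
          buffer |= (in[inPos++] & 0xFFL) << bits;
          bits += 8;
        }
        out[outPos + i] = (int) (buffer & mask);
        buffer >>>= bitWidth;
        bits -= bitWidth;
      }
    }
  }

  // Per-instance cache indexed by bit width (0..32), filled on first use.
  private final BitPacker[] packers = new BitPacker[33];

  BitPacker packerFor(int bitWidth) {
    BitPacker p = packers[bitWidth];
    if (p == null) {
      p = new BitPacker(bitWidth);
      packers[bitWidth] = p;
    }
    return p;
  }

  /** Writer side: count is a multiple of 32; returns the offset after the miniblock. */
  int packMiniBlock(int[] values, int inPos, int bitWidth, byte[] out, int offset, int count) {
    BitPacker p = packerFor(bitWidth);
    for (int i = 0; i < count; i += 32) {
      p.pack32Values(values, inPos + i, out, offset);
      offset += 4 * bitWidth; // 32 values * bitWidth bits = 4 * bitWidth bytes
    }
    return offset;
  }

  /** Reader side: decodes by offset into the page array, with no per-miniblock slice. */
  int unpackMiniBlock(byte[] page, int offset, int bitWidth, int[] out, int outPos, int count) {
    BitPacker p = packerFor(bitWidth);
    for (int i = 0; i < count; i += 32) {
      p.unpack32Values(page, offset, out, outPos + i);
      offset += 4 * bitWidth;
    }
    return offset;
  }
}
```

For example, `unpackMiniBlock(page, offset, 7, out, 0, 64)` decodes two 32-value groups from 56 bytes and returns `offset + 56`. parquet-java's packers are generated per bit width; the generic loop here only illustrates the data layout and the caching pattern.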
   
   ## Benchmark
   
   IntEncodingBenchmark (100k INT32 values, JMH `-wi 3 -i 5 -f 1`: 3 warmup iterations, 5 measurement iterations, 1 fork):
   
   ```
   Benchmark     Pattern           Before (ops/s)   After (ops/s)   Improvement
   decodeDelta   SEQUENTIAL          903,892,292   1,096,895,285    +21% (1.21x)
   decodeDelta   RANDOM              364,659,977     410,632,530    +13% (1.13x)
   decodeDelta   LOW_CARDINALITY     581,649,861     676,449,008    +16% (1.16x)
   decodeDelta   HIGH_CARDINALITY    370,718,831     506,116,434    +37% (1.37x)
   encodeDelta   SEQUENTIAL          556,155,088     558,868,426    flat
   encodeDelta   RANDOM              360,327,834     376,594,239    +5%
   encodeDelta   LOW_CARDINALITY     412,396,181     434,569,306    +5%
   encodeDelta   HIGH_CARDINALITY    335,702,852     345,528,410    +3%
   ```
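
The Improvement column is the after/before throughput ratio, expressed as a percentage change. The small check below recomputes it from the table's figures. The class and method names are hypothetical.

```java
/** Hypothetical helper: recomputes the "Improvement" column from ops/s figures. */
class SpeedupCheck {
  /** After/before throughput ratio; 1.21 means 21% more operations per second. */
  static double speedup(long beforeOpsPerSec, long afterOpsPerSec) {
    return (double) afterOpsPerSec / beforeOpsPerSec;
  }

  /** Throughput change in percent, rounded to the nearest whole percent. */
  static long percentGain(long beforeOpsPerSec, long afterOpsPerSec) {
    return Math.round((speedup(beforeOpsPerSec, afterOpsPerSec) - 1.0) * 100.0);
  }
}
```

`percentGain(556_155_088L, 558_868_426L)` rounds to 0 (about +0.5%), which matches the `encodeDelta SEQUENTIAL` row reported as flat.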
   
   The decode path gains more because it no longer allocates a `ByteBuffer.slice()` per miniblock and uses the faster `byte[]` unpack path. Encode gains are modest because, in the current implementation, `pack32Values` is structurally similar to four consecutive `pack8Values` calls.
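
The encode-side explanation follows from the layout: 8 values of `bitWidth` bits fill exactly `bitWidth` bytes, so packing 32 values in one call produces the same bytes as four 8-value calls placed end to end. The sketch below uses a hypothetical generic packer (not the generated parquet-java `BytePacker` code), assumes LSB-first bit order, and demonstrates this equivalence.

```java
/**
 * Hypothetical sketch, not the parquet-java API: generic LSB-first bit packing,
 * used to compare one 32-value call with four 8-value calls.
 */
class PackGroupsSketch {

  /** Packs count values (count a multiple of 8) of bitWidth bits (0..32) into count * bitWidth / 8 bytes. */
  static void pack(int[] in, int inPos, int count, int bitWidth, byte[] out, int outPos) {
    long mask = (1L << bitWidth) - 1;
    long buffer = 0; // pending bits, lowest bits are written first
    int bitsInBuffer = 0;
    for (int i = 0; i < count; i++) {
      buffer |= (in[inPos + i] & mask) << bitsInBuffer;
      bitsInBuffer += bitWidth;
      while (bitsInBuffer >= 8) { // flush every complete byte
        out[outPos++] = (byte) buffer;
        buffer >>>= 8;
        bitsInBuffer -= 8;
      }
    }
  }

  /** One 32-value call, the shape of pack32Values. */
  static byte[] packAs32(int[] values, int bitWidth) {
    byte[] out = new byte[4 * bitWidth];
    pack(values, 0, 32, bitWidth, out, 0);
    return out;
  }

  /** Four 8-value calls, each writing exactly bitWidth bytes, the shape of 4x pack8Values. */
  static byte[] packAs4x8(int[] values, int bitWidth) {
    byte[] out = new byte[4 * bitWidth];
    for (int group = 0; group < 4; group++) {
      pack(values, group * 8, 8, bitWidth, out, group * bitWidth);
    }
    return out;
  }
}
```

Because both paths produce byte-for-byte identical output from the same shifts and masks, a single 32-value call can mainly save per-call overhead on the encode side. This is consistent with the small encode gains in the table.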
   
   All 576 parquet-column tests pass.

