iemejia opened a new pull request, #3534:
URL: https://github.com/apache/parquet-java/pull/3534
## Summary

- Rewrite `DeltaBinaryPackingValuesReader.unpackMiniBlock` to use `unpack32Values(byte[])` instead of per-group `unpack8Values(ByteBuffer)` calls, and eliminate the per-miniblock `ByteBuffer.slice()` allocations.
- Switch the delta writers (int and long) to `pack32Values` for miniblock packing.
- Cache `BytePacker`/`BytePackerForLong` instances in per-instance arrays indexed by bit width.

## Benchmark

IntEncodingBenchmark (100k INT32 values, JMH `-wi 3 -i 5 -f 1`):

```
Benchmark    Pattern           Before (ops/s)  After (ops/s)   Improvement
decodeDelta  SEQUENTIAL           903,892,292  1,096,895,285   +21% (1.21x)
decodeDelta  RANDOM               364,659,977    410,632,530   +13% (1.13x)
decodeDelta  LOW_CARDINALITY      581,649,861    676,449,008   +16% (1.16x)
decodeDelta  HIGH_CARDINALITY     370,718,831    506,116,434   +37% (1.37x)
encodeDelta  SEQUENTIAL           556,155,088    558,868,426   flat (+0.5%)
encodeDelta  RANDOM               360,327,834    376,594,239   +5%
encodeDelta  LOW_CARDINALITY      412,396,181    434,569,306   +5%
encodeDelta  HIGH_CARDINALITY     335,702,852    345,528,410   +3%
```

The decode path gains more for two reasons. It no longer allocates a `ByteBuffer.slice()` per miniblock, and it uses the faster `byte[]` unpack path. Encode gains are modest because, at this optimization level, `pack32Values` is structurally similar to four `pack8Values` calls.

All 576 parquet-column tests pass.
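The following self-contained Java sketch illustrates the two decode-side ideas from the summary. It does not use parquet-java. `SimplePacker`, `PackerCache`, and `unpackMiniBlock` are hypothetical stand-ins for `BytePacker`, the per-instance packer cache, and the reader method.

The bit loops are deliberately naive rather than the generated, unrolled code. The LSB-first layout is assumed to mirror Parquet's little-endian bit packing.

The sketch shows two things:

- Packers are fetched from an array indexed by bit width, instead of being created or looked up each time.
- A miniblock is decoded 32 values at a time directly from a `byte[]` at an advancing offset, so no `ByteBuffer.slice()` is allocated per miniblock.

```java
import java.util.Arrays;

public final class MiniBlockSketch {

  // Hypothetical stand-in for a generated bit packer (NOT parquet-java's BytePacker).
  // Packs/unpacks groups of 32 values, LSB-first, into 4 * bitWidth bytes.
  static final class SimplePacker {
    final int bitWidth;

    SimplePacker(int bitWidth) {
      this.bitWidth = bitWidth;
    }

    void pack32(int[] in, int inPos, byte[] out, int outPos) {
      Arrays.fill(out, outPos, outPos + 4 * bitWidth, (byte) 0);
      for (int i = 0; i < 32; i++) {
        int v = in[inPos + i];
        for (int b = 0; b < bitWidth; b++) {
          if (((v >>> b) & 1) != 0) {
            int bit = i * bitWidth + b; // absolute bit index within the group
            out[outPos + (bit >>> 3)] |= (byte) (1 << (bit & 7));
          }
        }
      }
    }

    void unpack32(byte[] in, int inPos, int[] out, int outPos) {
      for (int i = 0; i < 32; i++) {
        int v = 0;
        for (int b = 0; b < bitWidth; b++) {
          int bit = i * bitWidth + b;
          if (((in[inPos + (bit >>> 3)] >>> (bit & 7)) & 1) != 0) {
            v |= 1 << b;
          }
        }
        out[outPos + i] = v;
      }
    }
  }

  // Hypothetical per-instance cache: one slot per bit width 0..32, filled lazily.
  static final class PackerCache {
    private final SimplePacker[] packers = new SimplePacker[33];

    SimplePacker forBitWidth(int bitWidth) {
      SimplePacker p = packers[bitWidth];
      if (p == null) {
        p = new SimplePacker(bitWidth);
        packers[bitWidth] = p;
      }
      return p;
    }
  }

  // Decodes valuesInMiniBlock values (assumed to be a multiple of 32) starting at `offset`
  // in `page`. Each call reads 32 values straight from the byte[], so no per-miniblock slice
  // is allocated. Returns the offset just past the miniblock.
  static int unpackMiniBlock(PackerCache cache, byte[] page, int offset, int bitWidth,
                             int[] out, int outPos, int valuesInMiniBlock) {
    SimplePacker packer = cache.forBitWidth(bitWidth);
    for (int i = 0; i < valuesInMiniBlock; i += 32) {
      packer.unpack32(page, offset, out, outPos + i);
      offset += 4 * bitWidth; // 32 values * bitWidth bits = 4 * bitWidth bytes
    }
    return offset;
  }

  public static void main(String[] args) {
    PackerCache cache = new PackerCache();
    int bitWidth = 5;
    int[] values = new int[64];
    for (int i = 0; i < values.length; i++) {
      values[i] = (i * 7) % 32; // every value fits in 5 bits
    }
    // Pack two 32-value groups back to back after a 3-byte "header".
    byte[] page = new byte[3 + 2 * 4 * bitWidth];
    SimplePacker packer = cache.forBitWidth(bitWidth);
    packer.pack32(values, 0, page, 3);
    packer.pack32(values, 32, page, 3 + 4 * bitWidth);

    int[] decoded = new int[64];
    int end = unpackMiniBlock(cache, page, 3, bitWidth, decoded, 0, 64);
    System.out.println(Arrays.equals(values, decoded) + " " + end); // prints "true 43"
  }
}
```

The decode loop only advances an `int` offset, and the cache returns the same packer instance on every call for a given bit width. As a result, decoding a page in this sketch does no per-miniblock allocation. The summary attributes most of the decode speedup to the same effect.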
