iemejia opened a new pull request, #3568:
URL: https://github.com/apache/parquet-java/pull/3568

   Part of #3530 — Apache Parquet Java Performance Improvements
   
   ## Summary
   
   Optimize the scalar hot path of the RLE/Bit-Packing hybrid codec used for repetition levels, definition levels, and dictionary-index pages.
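
For readers less familiar with the format, here is a minimal, unoptimized sketch of decoding the hybrid encoding from a `ByteBuffer`. It follows the layout in the Parquet format specification: each run starts with a varint header whose low bit selects an RLE run or a bit-packed run, and bit-packed values are laid out LSB-first in groups of 8. This is not the code in this PR. The names `HybridDecodeSketch` and `decode` are hypothetical, and any length prefix preceding the runs is assumed to have been consumed already.

```java
import java.nio.ByteBuffer;

// Illustrative only: a straightforward decoder, not the optimized one in this PR.
class HybridDecodeSketch {

    // Reads an unsigned LEB128 varint (used for run headers).
    static int readUnsignedVarInt(ByteBuffer in) {
        int value = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    // Decodes exactly valueCount values of the given bit width (1..32).
    static int[] decode(ByteBuffer in, int bitWidth, int valueCount) {
        int[] out = new int[valueCount];
        int n = 0;
        long mask = (1L << bitWidth) - 1;
        int bytesPerValue = (bitWidth + 7) / 8; // RLE value width, rounded up to bytes
        while (n < valueCount) {
            int header = readUnsignedVarInt(in);
            int count = header >>> 1;
            if ((header & 1) == 0) {
                // RLE run: `count` repetitions of one little-endian value.
                int value = 0;
                for (int i = 0; i < bytesPerValue; i++) {
                    value |= (in.get() & 0xFF) << (8 * i);
                }
                for (int i = 0; i < count && n < valueCount; i++) {
                    out[n++] = value;
                }
            } else {
                // Bit-packed run: `count` groups of 8 values, packed LSB-first.
                long acc = 0;
                int bits = 0;
                for (int i = 0; i < count * 8; i++) {
                    while (bits < bitWidth) {
                        acc |= (long) (in.get() & 0xFF) << bits;
                        bits += 8;
                    }
                    int value = (int) (acc & mask);
                    acc >>>= bitWidth;
                    bits -= bitWidth;
                    // Trailing padding values in the last group are read but dropped.
                    if (n < valueCount) {
                        out[n++] = value;
                    }
                }
            }
        }
        return out;
    }
}
```

With bit width 3, the bytes `0A 03 03 88 C6 FA` decode to five 3s (an RLE run) followed by 0 through 7 (one bit-packed group).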
   
   **Decoder**: migrated from `InputStream` to `ByteBuffer`; reuses a lazily grown buffer with zero-copy unpacking; adds an `unpack32Values` fast path.
   
   **Encoder**: `pack32Values` fast path, deduplicated flush logic.
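
To illustrate the idea behind handling 32 values per call, here is a small sketch that packs and unpacks 32 values LSB-first with a single `long` accumulator. At any bit width, 32 values occupy exactly `4 * bitWidth` bytes, so each call starts and ends on a byte boundary. The class `Pack32Sketch` and its generic loops are assumptions for illustration, not the `pack32Values`/`unpack32Values` implementations touched by this PR.

```java
// Illustrative only: generic loops, not the packers used by parquet-java.
class Pack32Sketch {

    // Packs 32 values of `bitWidth` bits (1..32) into 4 * bitWidth bytes.
    static void pack32(int[] in, int inPos, byte[] out, int outPos, int bitWidth) {
        long mask = (1L << bitWidth) - 1;
        long acc = 0;
        int bits = 0;
        for (int i = 0; i < 32; i++) {
            acc |= (in[inPos + i] & mask) << bits;
            bits += bitWidth;
            // Flush every complete byte, lowest byte first.
            while (bits >= 8) {
                out[outPos++] = (byte) acc;
                acc >>>= 8;
                bits -= 8;
            }
        }
    }

    // Inverse of pack32: reads 4 * bitWidth bytes and produces 32 values.
    static void unpack32(byte[] in, int inPos, int[] out, int outPos, int bitWidth) {
        long mask = (1L << bitWidth) - 1;
        long acc = 0;
        int bits = 0;
        for (int i = 0; i < 32; i++) {
            while (bits < bitWidth) {
                acc |= (long) (in[inPos++] & 0xFF) << bits;
                bits += 8;
            }
            out[outPos + i] = (int) (acc & mask);
            acc >>>= bitWidth;
            bits -= bitWidth;
        }
    }
}
```

A caller would batch a run in chunks of 32 and fall back to a smaller step (8 values in the hybrid format) for the tail.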
   
   **Adapted consumers**: `DictionaryValuesReader`, `ColumnReaderBase`, and `RunLengthBitPackingHybridValuesReader` updated for the `ByteBuffer`-based API.
   
   JMH benchmarks: `RleEncodingBenchmark`, `RleDecodingBenchmark`, `RleDictionaryIndexDecodingBenchmark` (5 bit widths x 4 patterns).
   
   ## Benchmark results
   
   **Environment**: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.
   
   Dictionary index decoding (direct decoder, 5 bit widths x 3 packed patterns):
   
   | Category | Avg Improvement | Range |
   |---|---:|---|
   | Direct decoder (packed data) | **+30.4%** | +20.5% to +47.2% |
   | Via ValuesReader wrapper | **+11.8%** | +4.5% to +21.4% |
   | Encoder (packed data) | **+4.9%** | -1.3% to +15.0% |
   | Boolean decode (packed patterns) | **+13.6%** | +6.4% to +25.9% |
   
   Selected data points:
   
   | bitWidth | Pattern | Master (M ops/s) | Branch (M ops/s) | Delta |
   |---|---|---:|---:|---|
   | 4 | SEQUENTIAL | 605 | 848 | **+40.1%** |
   | 16 | SEQUENTIAL | 510 | 750 | **+47.2%** |
   | 8 | RANDOM | 616 | 812 | **+31.7%** |
   | 1 (bool) | ALTERNATING | 630 | 793 | **+25.9%** |
   
   Encoder improvements are modest (+4.9% avg) because `pack32Values` helps most at mid-range bit widths; at bitWidth=8, pack/unpack reduces to byte copies.
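
The bitWidth=8 remark can be checked with a short sketch: a generic LSB-first packer and a plain per-value byte cast produce identical output at that width, so a wider fast path has no shift or mask work left to save. `Width8Sketch` and both methods are hypothetical names used only for this illustration.

```java
// Illustrative only: shows that 8-bit packing degenerates to a byte copy.
class Width8Sketch {

    // Generic LSB-first packer; values.length is assumed to be a multiple of 8.
    static byte[] packGeneric(int[] values, int bitWidth) {
        byte[] out = new byte[values.length * bitWidth / 8];
        long mask = (1L << bitWidth) - 1;
        long acc = 0;
        int bits = 0;
        int pos = 0;
        for (int v : values) {
            acc |= (v & mask) << bits;
            bits += bitWidth;
            while (bits >= 8) {
                out[pos++] = (byte) acc;
                acc >>>= 8;
                bits -= 8;
            }
        }
        return out;
    }

    // At bit width 8 every value is exactly one byte: no shifting needed.
    static byte[] packWidth8(int[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }
}
```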
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

