iemejia opened a new pull request, #3568: URL: https://github.com/apache/parquet-java/pull/3568
Part of #3530: Apache Parquet Java Performance Improvements

## Summary

Optimize the scalar hot path of the RLE/bit-packing hybrid codec used for repetition levels, definition levels, and dictionary-index pages.

- **Decoder**: `InputStream` to `ByteBuffer` migration, lazy-grow buffer reuse with zero-copy unpacking, `unpack32Values` fast path.
- **Encoder**: `pack32Values` fast path, deduplicated flush logic.
- **Adapted consumers**: `DictionaryValuesReader`, `ColumnReaderBase`, and `RunLengthBitPackingHybridValuesReader` updated for the `ByteBuffer`-based API.

JMH benchmarks: `RleEncodingBenchmark`, `RleDecodingBenchmark`, `RleDictionaryIndexDecodingBenchmark` (5 bit widths x 4 patterns).

## Benchmark results

**Environment**: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.

Dictionary index decoding (direct decoder, 5 bit widths x 3 packed patterns):

| Category | Avg Improvement | Range |
|---|---:|---|
| Direct decoder (packed data) | **+30.4%** | +20.5% to +47.2% |
| Via ValuesReader wrapper | **+11.8%** | +4.5% to +21.4% |
| Encoder (packed data) | **+4.9%** | -1.3% to +15.0% |
| Boolean decode (packed patterns) | **+13.6%** | +6.4% to +25.9% |

Selected data points:

| bitWidth | Pattern | Master (M ops/s) | Branch (M ops/s) | Delta |
|---|---|---:|---:|---|
| 4 | SEQUENTIAL | 605 | 848 | **+40.1%** |
| 16 | SEQUENTIAL | 510 | 750 | **+47.2%** |
| 8 | RANDOM | 616 | 812 | **+31.7%** |
| 1 (bool) | ALTERNATING | 630 | 793 | **+25.9%** |

Encoder improvements are modest (about +5% on average) because `pack32Values` helps most at mid-range bit widths; at `bitWidth=8`, packing and unpacking already reduce to plain byte copies.
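For context on what the decoder reads, here is a minimal sketch of decoding an RLE/bit-packing hybrid stream directly from a `ByteBuffer`. It follows the run layout in the Parquet encoding specification: a ULEB128 run header whose low bit selects RLE (0) or bit-packed (1), RLE values stored in `ceil(bitWidth / 8)` little-endian bytes, and bit-packed values stored LSB first in groups of 8. `HybridDecoderSketch` and its methods are hypothetical names. This is not the PR's implementation, and it assumes the caller has already consumed any length or bit-width prefix that precedes the runs on a page.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical sketch of an RLE/bit-packing hybrid decoder reading from a ByteBuffer
 * (not parquet-java's API). Assumed layout, per the Parquet encoding spec: each run
 * starts with a ULEB128 header; low bit 0 means (header >>> 1) copies of one value
 * stored in ceil(bitWidth / 8) little-endian bytes; low bit 1 means (header >>> 1)
 * groups of 8 bit-packed values, least significant bit first.
 */
final class HybridDecoderSketch {

    /** Reads an unsigned LEB128 varint and advances the buffer position. */
    static int readUnsignedVarInt(ByteBuffer in) {
        int value = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    /** Decodes {@code count} values of width {@code bitWidth} (0..32) from {@code in}. */
    static int[] decode(ByteBuffer in, int bitWidth, int count) {
        int[] out = new int[count];
        int pos = 0;
        int bytesPerRleValue = (bitWidth + 7) / 8;
        long mask = (1L << bitWidth) - 1;
        while (pos < count) {
            int header = readUnsignedVarInt(in);
            if ((header & 1) == 0) {
                // RLE run: a single little-endian value repeated (header >>> 1) times.
                int runLength = header >>> 1;
                int value = 0;
                for (int i = 0; i < bytesPerRleValue; i++) {
                    value |= (in.get() & 0xFF) << (8 * i);
                }
                int n = Math.min(runLength, count - pos);
                Arrays.fill(out, pos, pos + n, value);
                pos += n;
            } else {
                // Bit-packed run: (header >>> 1) groups of 8 values, LSB first.
                int numValues = (header >>> 1) * 8;
                long bitBuffer = 0;
                int bitsInBuffer = 0;
                for (int i = 0; i < numValues; i++) {
                    // Pull in bytes only when fewer than bitWidth bits are pending.
                    while (bitsInBuffer < bitWidth) {
                        bitBuffer |= (long) (in.get() & 0xFF) << bitsInBuffer;
                        bitsInBuffer += 8;
                    }
                    int v = (int) (bitBuffer & mask);
                    bitBuffer >>>= bitWidth;
                    bitsInBuffer -= bitWidth;
                    // The last group may be padded past count; padding values are dropped.
                    if (pos < count) {
                        out[pos++] = v;
                    }
                }
            }
        }
        return out;
    }
}
```

For example, the bytes `08 05 03 88 C6 FA` with `bitWidth = 3` decode to four 5s (one RLE run) followed by 0 through 7 (one bit-packed group of 8).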
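The `pack32Values`/`unpack32Values` fast paths work on blocks of 32 values. One useful property: 32 values at any bit width occupy exactly `4 * bitWidth` bytes, so each call starts and ends on a byte boundary and no partial-byte state carries over between calls. The generic loop below is a sketch that illustrates this property. `Pack32Sketch`, `pack32`, and `unpack32` are hypothetical names, and the PR's code may be specialized per bit width rather than looping like this. The unpack side reads with absolute `ByteBuffer.get(int)`, which we assume is close in spirit to the "zero-copy unpacking" mentioned above: it reads in place without moving the buffer's position or copying into a temporary array.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical generic sketch of 32-values-per-call bit packing in the LSB-first
 * layout used by the RLE/bit-packing hybrid encoding (not the PR's code).
 * 32 values of bitWidth bits fill exactly 4 * bitWidth bytes.
 */
final class Pack32Sketch {

    /** Packs in[inPos .. inPos + 32) into out[outPos .. outPos + 4 * bitWidth). */
    static void pack32(int[] in, int inPos, byte[] out, int outPos, int bitWidth) {
        long mask = (1L << bitWidth) - 1;
        long bitBuffer = 0;
        int bits = 0;
        int o = outPos;
        for (int i = 0; i < 32; i++) {
            bitBuffer |= (in[inPos + i] & mask) << bits;
            bits += bitWidth;
            // Emit every completed byte; at most 7 bits stay pending between values.
            while (bits >= 8) {
                out[o++] = (byte) bitBuffer;
                bitBuffer >>>= 8;
                bits -= 8;
            }
        }
    }

    /** Unpacks 32 values starting at absolute index inPos; the buffer position is unchanged. */
    static void unpack32(ByteBuffer in, int inPos, int[] out, int outPos, int bitWidth) {
        long mask = (1L << bitWidth) - 1;
        long bitBuffer = 0;
        int bits = 0;
        int p = inPos;
        for (int i = 0; i < 32; i++) {
            while (bits < bitWidth) {
                bitBuffer |= (long) (in.get(p++) & 0xFF) << bits;
                bits += 8;
            }
            out[outPos + i] = (int) (bitBuffer & mask);
            bitBuffer >>>= bitWidth;
            bits -= bitWidth;
        }
    }
}
```

At `bitWidth = 8`, each value maps to exactly one output byte. This is consistent with the observation above that packing and unpacking reduce to byte copies at that width, which leaves a batched fast path little to gain there.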
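Lazy-grow buffer reuse can be sketched as a holder that has no per-instance storage until the first page. After that, it reuses one `int[]` across pages and reallocates only when a page needs more room. `ReusableIntBuffer` is a hypothetical name, and doubling on growth is an assumed policy, not necessarily what the PR does.

```java
/**
 * Hypothetical lazily grown, reusable decode buffer (not parquet-java's API).
 * Starts with a shared empty array; afterwards the same array is returned
 * as long as it is large enough.
 */
final class ReusableIntBuffer {
    private static final int[] EMPTY = new int[0];
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
    private int[] values = EMPTY;

    /** Returns an array with at least {@code required} slots; contents are unspecified. */
    int[] ensureCapacity(int required) {
        if (required < 0 || required > MAX_ARRAY_SIZE) {
            throw new IllegalArgumentException("bad capacity: " + required);
        }
        if (values.length < required) {
            // Assumed policy: at least double, so slowly growing pages reallocate rarely.
            long doubled = 2L * values.length;
            int newCapacity = (int) Math.max(required, Math.min(doubled, MAX_ARRAY_SIZE));
            // Old contents are not copied: the decoder overwrites the buffer for each page.
            values = new int[newCapacity];
        }
        return values;
    }

    int capacity() {
        return values.length;
    }
}
```

Skipping the copy on growth is the main design choice here. A decode buffer is fully rewritten for every page, so preserving stale values would only add work.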
