iemejia opened a new pull request, #3566:
URL: https://github.com/apache/parquet-java/pull/3566

   Part of #3530 — Apache Parquet Java Performance Improvements
   
   ## Summary
   
   Optimize dictionary encoding and decoding data structures.
   
   **Encoding**:
   - Replace `LinkedOpenHashMap` with `OpenHashMap` + `ArrayList` for all 
`DictionaryValuesWriter` subclasses, eliminating insertion-order linked-list 
overhead and enabling O(1) indexed access for dictionary page serialization and 
fallback.
   - Make `IntList.size()` O(1) by tracking `totalSize` incrementally instead 
of summing across slab arrays.
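
The two encoding changes above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SketchDictionary` and `SketchIntList` are hypothetical names, and `java.util.HashMap` stands in for the primitive-specialized `OpenHashMap` (so, unlike the real change, this sketch still boxes the ids). The point is that an unordered map handles value-to-id lookups while a separate list preserves insertion order and gives O(1) id-to-value access, and that a running counter makes `size()` constant-time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: unordered map for value -> id, list for id -> value. */
final class SketchDictionary<T> {
    private final Map<T, Integer> idsByValue = new HashMap<>(); // stand-in for OpenHashMap
    private final List<T> valuesById = new ArrayList<>();       // preserves insertion order

    /** Returns the id already assigned to value, or assigns the next free id. */
    int idOf(T value) {
        Integer id = idsByValue.get(value);
        if (id == null) {
            id = valuesById.size();
            idsByValue.put(value, id);
            valuesById.add(value);
        }
        return id;
    }

    /** O(1) indexed access, e.g. for dictionary page serialization or fallback. */
    T valueAt(int id) {
        return valuesById.get(id);
    }

    int size() {
        return valuesById.size();
    }
}

/** Hypothetical slab-backed int list whose size() does not walk the slabs. */
final class SketchIntList {
    private static final int SLAB_SIZE = 4; // tiny on purpose so the demo spans slabs
    private final List<int[]> slabs = new ArrayList<>();
    private int[] current;
    private int posInCurrent;
    private int totalSize; // incremented on every add

    void add(int v) {
        if (current == null || posInCurrent == SLAB_SIZE) {
            current = new int[SLAB_SIZE];
            slabs.add(current);
            posInCurrent = 0;
        }
        current[posInCurrent++] = v;
        totalSize++;
    }

    int get(int index) {
        return slabs.get(index / SLAB_SIZE)[index % SLAB_SIZE];
    }

    int size() {
        return totalSize; // O(1): no summing across slabs
    }
}

final class DictionaryEncodingSketch {
    public static void main(String[] args) {
        SketchDictionary<String> dict = new SketchDictionary<>();
        SketchIntList ids = new SketchIntList();
        for (String s : new String[] {"a", "b", "a", "c", "b", "a"}) {
            ids.add(dict.idOf(s));
        }
        System.out.println(dict.size() + " distinct values, " + ids.size() + " ids written");
    }
}
```

Keeping the ordered view in a plain list means the map itself no longer has to maintain per-entry ordering links.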
   
   **Decoding**:
   - Convert `PlainValuesDictionary` numeric constructors (INT32, INT64, FLOAT, 
DOUBLE) from `InputStream`-based per-byte reads to direct 
`ByteBuffer.getInt`/`getLong`/`getFloat`/`getDouble`.
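
As an illustration of the decoding change, the sketch below reads an INT32 dictionary two ways: assembling each value from four single-byte reads, and calling `ByteBuffer.getInt` on a little-endian view (Parquet PLAIN encoding stores fixed-width values little-endian). The class and method names are hypothetical, and the byte-wise variant only approximates the `InputStream`-based path described above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PlainIntDictionarySketch {
    /** Direct path: one getInt call per value on a little-endian view. */
    static int[] readInts(ByteBuffer page, int count) {
        // duplicate() leaves the caller's position untouched; the duplicate
        // starts out big-endian, so the byte order is set explicitly.
        ByteBuffer le = page.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int[] dict = new int[count];
        for (int i = 0; i < count; i++) {
            dict[i] = le.getInt();
        }
        return dict;
    }

    /** Per-byte path, for comparison: four reads and shifts per value. */
    static int[] readIntsBytewise(ByteBuffer page, int count) {
        ByteBuffer b = page.duplicate();
        int[] dict = new int[count];
        for (int i = 0; i < count; i++) {
            int b0 = b.get() & 0xFF;
            int b1 = b.get() & 0xFF;
            int b2 = b.get() & 0xFF;
            int b3 = b.get() & 0xFF;
            dict[i] = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
        }
        return dict;
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        page.putInt(1).putInt(-2).putInt(0x01020304);
        page.flip();
        int[] fast = readInts(page, 3);
        int[] slow = readIntsBytewise(page, 3);
        System.out.println(java.util.Arrays.equals(fast, slow));
    }
}
```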
   
   **Binary hashCode caching**:
   - Cache `hashCode()` for `Binary` instances not backed by reusable byte 
arrays, avoiding redundant recomputation during dictionary hash-map probes.
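
A minimal sketch of the caching idea, assuming a simple wrapper around a `byte[]`. `CachedHashBytes`, its `backingArrayReused` flag, and the `hashComputations` counter are hypothetical and exist only for this illustration; they are not the `Binary` API. The hash is memoized only when the backing array is not reused, since a reused array may be overwritten later and a cached hash would then be stale.

```java
import java.util.Arrays;

final class CachedHashBytes {
    private final byte[] bytes;
    private final boolean backingArrayReused; // true if the caller may overwrite bytes later
    private int cachedHash;
    private boolean hashComputed;
    int hashComputations; // demo-only counter of full passes over the bytes

    CachedHashBytes(byte[] bytes, boolean backingArrayReused) {
        this.bytes = bytes;
        this.backingArrayReused = backingArrayReused;
    }

    @Override
    public int hashCode() {
        if (backingArrayReused) {
            return computeHash(); // contents may have changed: never cache
        }
        if (!hashComputed) {
            cachedHash = computeHash();
            hashComputed = true;
        }
        return cachedHash;
    }

    private int computeHash() {
        hashComputations++;
        return Arrays.hashCode(bytes); // O(length) pass over the key bytes
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashBytes
                && Arrays.equals(bytes, ((CachedHashBytes) o).bytes);
    }

    public static void main(String[] args) {
        CachedHashBytes stable = new CachedHashBytes(new byte[1000], false);
        stable.hashCode();
        stable.hashCode();
        stable.hashCode();
        System.out.println("full hash passes: " + stable.hashCode() + " / " + stable.hashComputations);
    }
}
```

The saving scales with key length, because each avoided call skips a full pass over the bytes.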
   
   JMH benchmarks: `DictionaryEncodingBenchmark`, `DictionaryDecodingBenchmark` 
with `TestDataFactory` and `BenchmarkEncodingUtils`.
   
   ## Benchmark results
   
   **Environment**: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, 
Linux x86_64.
   
   Encoding (100K values per iteration, average of 2 runs):
   
   | Benchmark | Baseline (M ops/s) | Optimized (M ops/s) | Speedup |
   |---|---:|---:|---:|
   | encodeInt HIGH_CARD | 14.9 | 23.5 | **1.58x** |
   | encodeLong HIGH_CARD | 12.0 | 19.2 | **1.60x** |
   | encodeFloat HIGH_CARD | 14.4 | 21.9 | **1.52x** |
   | encodeDouble HIGH_CARD | 11.7 | 17.9 | **1.53x** |
   | encodeBinary LOW len=10 | 75.6 | 125.6 | **1.66x** |
   | encodeBinary LOW len=100 | 13.2 | 107.8 | **8.2x** |
   | encodeBinary LOW len=1000 | 1.5 | 148.3 | **~100x** |
   | encodeBinary HIGH len=10 | 6.4 | 13.2 | **2.1x** |
   | encodeFlba HIGH len=12 | 6.3 | 15.4 | **2.4x** |
   | encodeFlba HIGH len=16 | 6.1 | 14.6 | **2.4x** |
   | Numeric LOW_CARD (all types) | ~120 | ~120 | ~1.0x |
   
   The extreme Binary LOW_CARD speedup (up to ~100x for len=1000) comes from 
eliminating `LinkedOpenHashMap` per-entry linked-list overhead, autoboxing, and 
repeated `Binary.hashCode()` computation. With only ~100 distinct values in the 
hash map, the old code spent most of its time recomputing `hashCode()` over the 
full key bytes on every probe, which is why the baseline throughput drops 
sharply as the value length grows.
   
   Decoding: ~1.0x across all types. The `ByteBuffer` constructor optimization 
runs only once per row group, and per-value decoding is an array index lookup 
that was not changed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

