iemejia opened a new pull request, #3566: URL: https://github.com/apache/parquet-java/pull/3566

Part of #3530: Apache Parquet Java Performance Improvements

## Summary

Optimize dictionary encoding and decoding data structures.

**Encoding**:
- Replace `LinkedOpenHashMap` with `OpenHashMap` + `ArrayList` for all `DictionaryValuesWriter` subclasses, eliminating insertion-order linked-list overhead and enabling O(1) indexed access for dictionary page serialization and fallback.
- Make `IntList.size()` O(1) by tracking `totalSize` incrementally instead of summing across slab arrays.

**Decoding**:
- Convert `PlainValuesDictionary` numeric constructors (INT32, INT64, FLOAT, DOUBLE) from `InputStream`-based per-byte reads to direct `ByteBuffer.getInt`/`getLong`/`getFloat`/`getDouble`.

**Binary hashCode caching**:
- Cache `hashCode()` for `Binary` instances not backed by reusable byte arrays, avoiding redundant recomputation during dictionary hash-map probes.

JMH benchmarks: `DictionaryEncodingBenchmark`, `DictionaryDecodingBenchmark` with `TestDataFactory` and `BenchmarkEncodingUtils`.

## Benchmark results

**Environment**: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.

Encoding (100K values/iteration, 2 averaged runs):

| Benchmark | Baseline (M ops/s) | Optimized (M ops/s) | Speedup |
|---|---:|---:|---:|
| encodeInt HIGH_CARD | 14.9 | 23.5 | **1.58x** |
| encodeLong HIGH_CARD | 12.0 | 19.2 | **1.60x** |
| encodeFloat HIGH_CARD | 14.4 | 21.9 | **1.52x** |
| encodeDouble HIGH_CARD | 11.7 | 17.9 | **1.53x** |
| encodeBinary LOW len=10 | 75.6 | 125.6 | **1.66x** |
| encodeBinary LOW len=100 | 13.2 | 107.8 | **8.2x** |
| encodeBinary LOW len=1000 | 1.5 | 148.3 | **~100x** |
| encodeBinary HIGH len=10 | 6.4 | 13.2 | **2.1x** |
| encodeFlba HIGH len=12 | 6.3 | 15.4 | **2.4x** |
| encodeFlba HIGH len=16 | 6.1 | 14.6 | **2.4x** |
| Numeric LOW_CARD (all types) | ~120 | ~120 | ~1.0x |

The extreme Binary LOW_CARD speedup (up to ~100x for len=1000) is due to eliminating `LinkedOpenHashMap` per-entry linked-list overhead, autoboxing, and `Binary.hashCode()` recomputation. With only ~100 distinct values in the hash map, the old code spent most time on `hashCode()` over the full key bytes at every probe.

Decoding: ~1.0x across all types (the `ByteBuffer` constructor optimization is once per row group; per-value decode is an array index lookup and was not changed).
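
To make the encoding-side idea concrete, here is a minimal sketch of the two pieces the summary describes: an unordered hash map for value-to-id lookup paired with a list for id-to-value access, and a key that computes its hash once. It is not the code from this PR. `DictionarySketch`, `CachedKey`, and `Dictionary` are hypothetical names, and `java.util.HashMap` stands in for the fastutil open-addressing map so the example needs only the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionarySketch {

    /** Hypothetical immutable byte-string key that computes its hash at most once. */
    static final class CachedKey {
        private final byte[] bytes;
        private int hash;
        private boolean hashed; // separate flag, because 0 is a legitimate hash value

        CachedKey(byte[] bytes) {
            // Defensive copy: the cached hash is only valid if the bytes never change.
            this.bytes = bytes.clone();
        }

        static CachedKey of(String s) {
            return new CachedKey(s.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public int hashCode() {
            if (!hashed) {
                hash = Arrays.hashCode(bytes); // full pass over the bytes, done once
                hashed = true;
            }
            return hash;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CachedKey && Arrays.equals(bytes, ((CachedKey) o).bytes);
        }
    }

    /** Unordered map for value -> id, plus a list for O(1) id -> value access. */
    static final class Dictionary {
        private final Map<CachedKey, Integer> ids = new HashMap<>();
        private final List<CachedKey> values = new ArrayList<>();

        /** Returns the existing id for key, or assigns the next id in insertion order. */
        int idFor(CachedKey key) {
            Integer id = ids.get(key);
            if (id == null) {
                id = values.size();
                ids.put(key, id);
                values.add(key);
            }
            return id;
        }

        CachedKey valueAt(int id) {
            return values.get(id);
        }

        int size() {
            return values.size();
        }
    }

    public static void main(String[] args) {
        Dictionary dict = new Dictionary();
        for (String s : new String[] {"apple", "banana", "apple"}) {
            System.out.println(s + " -> id " + dict.idFor(CachedKey.of(s)));
        }
    }
}
```

The list preserves insertion order, so the map no longer has to, and writing the dictionary page becomes a walk over `values`. The defensive copy in `CachedKey` mirrors the restriction in the summary: caching is only safe when the backing bytes cannot be reused or mutated after the hash is computed.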
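
The decoding change can be illustrated the same way. The sketch below contrasts assembling each INT32 from four single-byte stream reads with reading it directly through `ByteBuffer.getInt` on a little-endian view (Parquet PLAIN encoding stores fixed-width numbers little-endian). `PlainIntDictionarySketch` and both method names are hypothetical, and the per-byte variant takes a `ByteArrayInputStream` only to keep the example free of checked exceptions; it is a stand-in for the older `InputStream` path, not a copy of it.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PlainIntDictionarySketch {

    /** Direct path: one getInt call per value on a little-endian view of the page. */
    static int[] decodeInts(ByteBuffer page, int count) {
        // duplicate() shares the bytes but has its own position, so the
        // caller's buffer position is left untouched.
        ByteBuffer le = page.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = le.getInt();
        }
        return out;
    }

    /** Per-byte path: four stream reads per value, assembled by hand. */
    static int[] decodeIntsPerByte(ByteArrayInputStream in, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            // Assumes the stream holds at least 4 * count bytes; read() returns 0..255.
            int b0 = in.read();
            int b1 = in.read();
            int b2 = in.read();
            int b3 = in.read();
            out[i] = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        page.putInt(1).putInt(-2).putInt(300);
        page.flip();

        int[] direct = decodeInts(page, 3);
        int[] perByte = decodeIntsPerByte(new ByteArrayInputStream(page.array()), 3);
        System.out.println(java.util.Arrays.toString(direct));
        System.out.println(java.util.Arrays.equals(direct, perByte));
    }
}
```

Both paths produce the same array; the difference is only in how many calls it takes to build it. Because this work happens once when the dictionary is constructed, a flat decoding benchmark is consistent with the note above that per-value decode was left unchanged.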
