iemejia opened a new pull request, #3569: URL: https://github.com/apache/parquet-java/pull/3569
Part of #3530: Apache Parquet Java Performance Improvements

## Summary

Optimize scalar encode/decode for the BYTE_STREAM_SPLIT encoding.

**Reader**: Specialized transpose loops for element sizes 2/4/8/12/16 bytes plus generic fallback. Bulk array access when backing array is available.

**Writer**: Batched scatter buffers (`int[]`/`long[]` batches of 64) replacing per-value `scatterBytes()` which allocated temp `byte[]` and issued N single-byte writes.

Includes unit tests for transpose specializations, batch-boundary crossing, `getBufferedSize` with partial batches, direct ByteBuffer decode paths, and close/reset with pending unflushed batches.

JMH benchmarks: `BssEncodingBenchmark`, `BssDecodingBenchmark` covering FLOAT, DOUBLE, INT32, INT64, and FIXED_LEN_BYTE_ARRAY.

## Benchmark results

**Environment**: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.

Decoding:

| Benchmark | Baseline (M ops/s) | Optimized (M ops/s) | Speedup |
|---|---:|---:|---:|
| decodeInt | 203 | 1,034 | **5.1x** |
| decodeFloat | 263 | 1,032 | **3.9x** |
| decodeDouble | 132 | 363 | **2.8x** |
| decodeLong | 133 | 365 | **2.7x** |
| decodeFlba(2) | 286 | 491 | **1.7x** |
| decodeFlba(12) | 95 | 179 | **1.9x** |
| decodeFlba(16) | 78 | 142 | **1.8x** |

Encoding:

| Benchmark | Baseline (M ops/s) | Optimized (M ops/s) | Speedup |
|---|---:|---:|---:|
| encodeDouble | 53 | 365 | **6.9x** |
| encodeLong | 52 | 356 | **6.9x** |
| encodeInt | 99 | 515 | **5.2x** |
| encodeFloat | 101 | 499 | **5.0x** |
| encodeFlba(16) | 32 | 95 | **3.0x** |
| encodeFlba(12) | 41 | 114 | **2.8x** |
| encodeFlba(7) | 69 | 166 | **2.4x** |
| encodeFlba(2) | 192 | 314 | **1.6x** |

Every benchmark shows clear improvement with no regressions. 8-byte types benefit most from the batched scatter (6.9x) since the baseline scattered 8 bytes per value into 8 separate streams.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
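For readers unfamiliar with the technique, the batched writer and the specialized 4-byte reader described in the summary might look roughly like the sketch below. This is an illustration only, not the code in the PR: the class `BssSketch` and the methods `writeInt`, `flush`, `toByteArray`, and `decodeInts` are hypothetical names, and plain `ByteArrayOutputStream` instances stand in for whatever per-stream buffers parquet-java actually uses. The only details taken from the description are the batch size of 64 and the idea of replacing per-value single-byte writes with one bulk write per stream.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of BYTE_STREAM_SPLIT for 4-byte values; not the parquet-java classes. */
final class BssSketch {
    static final int BATCH = 64; // batch size mentioned in the PR description
    static final int WIDTH = 4;  // bytes per INT32/FLOAT value

    private final ByteArrayOutputStream[] streams = new ByteArrayOutputStream[WIDTH];
    private final int[] batch = new int[BATCH];      // values waiting to be scattered
    private final byte[] scratch = new byte[BATCH];  // reused, so no per-value allocation
    private int pending;

    BssSketch() {
        for (int k = 0; k < WIDTH; k++) {
            streams[k] = new ByteArrayOutputStream();
        }
    }

    /** Buffers one value; the scatter happens only when the batch fills up. */
    void writeInt(int value) {
        batch[pending++] = value;
        if (pending == BATCH) {
            flush();
        }
    }

    /** Scatters the pending values: byte k of every value goes to stream k. */
    private void flush() {
        for (int k = 0; k < WIDTH; k++) {
            int shift = 8 * k; // little-endian: stream 0 holds the least significant byte
            for (int i = 0; i < pending; i++) {
                scratch[i] = (byte) (batch[i] >>> shift);
            }
            streams[k].write(scratch, 0, pending); // one bulk write per stream
        }
        pending = 0;
    }

    /** Flushes any partial batch and concatenates the streams in order. */
    byte[] toByteArray() {
        flush();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (ByteArrayOutputStream s : streams) {
            out.write(s.toByteArray(), 0, s.size());
        }
        return out.toByteArray();
    }

    /** Transpose loop specialized for 4-byte elements, reading the backing array directly. */
    static int[] decodeInts(byte[] encoded) {
        int n = encoded.length / WIDTH;
        int s1 = n, s2 = 2 * n, s3 = 3 * n; // start offsets of streams 1..3
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = (encoded[i] & 0xFF)
                    | (encoded[s1 + i] & 0xFF) << 8
                    | (encoded[s2 + i] & 0xFF) << 16
                    | (encoded[s3 + i] & 0xFF) << 24;
        }
        return values;
    }
}
```

With this sketch, writing `0x04030201` and then `0x08070605` produces the bytes `01 05 02 06 03 07 04 08`: the first bytes of both values, then the second bytes, and so on. Because `toByteArray()` flushes the partial batch first, input lengths that are not a multiple of 64 still round-trip through `decodeInts`.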
