iemejia opened a new pull request, #3565: URL: https://github.com/apache/parquet-java/pull/3565
Part of #3530: Apache Parquet Java Performance Improvements

## Summary

Replace `ByteBufferInputStream` and `LittleEndianDataInputStream` wrappers with direct `ByteBuffer` access for all PLAIN value readers and writers.

**Readers** (`PlainValuesReader`, `BooleanPlainValuesReader`, `BinaryPlainValuesReader`, `FixedLenByteArrayPlainValuesReader`): hold a little-endian `ByteBuffer` from `initFromPage()` and call `getInt`/`getLong`/`getFloat`/`getDouble` directly, eliminating per-value stream overhead.

**Writers** (`PlainValuesWriter`, `BooleanPlainValuesWriter`, `FixedLenByteArrayPlainValuesWriter`): write through `CapacityByteArrayOutputStream`'s new `writeInt`/`writeLong` methods, which put values directly into the NIO slab buffer in little-endian order, avoiding temporary byte-array allocation.

**Supporting changes**:

- `CapacityByteArrayOutputStream`: allocate slabs with `ByteOrder.LITTLE_ENDIAN`, add `writeInt(int)` and `writeLong(long)` for single-value NIO writes.
- `BytesInput`: add zero-copy `writeTo(ByteBuffer)` and `toByteArray()` using bulk `ByteBuffer.get()` instead of stream copy.
- `LittleEndianDataOutputStream`: batch single-byte writes into single `write(buf, 0, N)` calls for `writeShort`/`writeInt`.

Includes JMH benchmarks (`PlainEncodingBenchmark`, `PlainDecodingBenchmark`) covering all 7 primitive types for both encoding and decoding.

## Benchmark results

**Environment**: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.
Decoding (100K values/iteration, 3 forks x 5 iterations, throughput mode):

| Benchmark | Master (M ops/s) | Branch (M ops/s) | Speedup |
|---|---:|---:|---:|
| decodeInt | 425 | 5,427 | **12.8x** |
| decodeFloat | 416 | 5,440 | **13.1x** |
| decodeLong | 119 | 4,720 | **39.5x** (\*) |
| decodeDouble | 116 | 6,026 | **51.8x** (\*) |
| decodeBoolean | 639 | 1,642 | **2.6x** |
| decodeFlba (len=2,12,16) | 188 | 680 | **3.6x** |
| decodeBinary (len=10,100,1000) | 142 | 225-230 | **1.6x** |

Encoding:

| Benchmark | Master (M ops/s) | Branch (M ops/s) | Speedup |
|---|---:|---:|---:|
| encodeInt | 148 | 559 | **3.8x** |
| encodeFloat | 150 | 532 | **3.5x** |
| encodeLong | 193 | 478 | **2.5x** |
| encodeDouble | 179 | 439 | **2.4x** |
| encodeBoolean | 850 | 1,692 | **2.0x** |
| encodeBinary (len=10) | 76 | 150 | **2.0x** |
| encodeFlba (len=2-16) | 156-184 | 178-224 | **1.1-1.2x** |

(\*) decodeLong/Double show JIT variance across forks (error bars >20%); true steady-state likely ~13x, consistent with INT32/FLOAT.
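
For readers unfamiliar with the approach described in the Summary, here is a minimal sketch of little-endian PLAIN encoding and decoding of INT32 values done directly on a `ByteBuffer`. It is not the code from this PR: the class `PlainLittleEndianSketch` and its helpers `encodeInts`/`decodeInts` are hypothetical names used only for illustration, and it relies solely on standard `java.nio` calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration class; not part of parquet-java.
public class PlainLittleEndianSketch {

    // Write each int as 4 little-endian bytes straight into the buffer,
    // without building a temporary byte[] per value.
    static ByteBuffer encodeInts(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip(); // switch from writing to reading
        return buf;
    }

    // Read ints with getInt() on a little-endian view of the page bytes.
    // duplicate() resets the byte order to big-endian, so it is set again here.
    static int[] decodeInts(ByteBuffer page, int count) {
        ByteBuffer le = page.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = le.getInt();
        }
        return out;
    }

    public static void main(String[] args) {
        int[] input = {1, -2, Integer.MAX_VALUE};
        ByteBuffer page = encodeInts(input);
        int[] decoded = decodeInts(page, input.length);
        System.out.println(Arrays.equals(input, decoded));
    }
}
```

Setting the byte order once on the buffer lets each `getInt`/`putInt` call handle the 4 bytes in a single operation, which is the kind of per-value stream overhead the PR description says it removes.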
