iemejia opened a new pull request, #3506:
URL: https://github.com/apache/parquet-java/pull/3506
### Rationale for this change
`ByteStreamSplitValuesReader` is the symmetric reader for
`BYTE_STREAM_SPLIT`-encoded `FLOAT`, `DOUBLE`, `INT32`, and `INT64` columns. On
`initFromPage` it eagerly transposes the entire page from stream-split layout
(`elementSizeInBytes` separate streams of `valuesCount` bytes each) back to
interleaved layout. The current loop is:
```java
private byte[] decodeData(ByteBuffer encoded, int valuesCount) {
  byte[] decoded = new byte[encoded.limit()];
  int destByteIndex = 0;
  for (int srcValueIndex = 0; srcValueIndex < valuesCount; ++srcValueIndex) {
    for (int stream = 0; stream < elementSizeInBytes; ++stream, ++destByteIndex) {
      decoded[destByteIndex] = encoded.get(srcValueIndex + stream * valuesCount);
    }
  }
  return decoded;
}
```
Two issues on the hot path:
1. Every read goes through `ByteBuffer.get(int)` (per-call bounds checks +
virtual dispatch through `HeapByteBuffer`/`DirectByteBuffer`).
2. The inner stream offset (`stream * valuesCount`) is recomputed on every
iteration even though it depends only on the outer loop.
For a 100k-value `FLOAT` page that means 400k `ByteBuffer.get(int)` calls; for
`DOUBLE`/`INT64` pages, 800k.
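To make the layout concrete, here is a tiny standalone illustration (not Parquet code; class and method names are made up) of the stream-split transpose that `decodeData` undoes:

```java
// Illustration only: round-trips a small interleaved buffer through
// stream-split layout and back.
public class StreamSplitDemo {
  // Encode: interleaved -> elementSize streams of valuesCount bytes each.
  static byte[] split(byte[] interleaved, int elementSize) {
    int valuesCount = interleaved.length / elementSize;
    byte[] out = new byte[interleaved.length];
    for (int v = 0; v < valuesCount; ++v) {
      for (int s = 0; s < elementSize; ++s) {
        out[s * valuesCount + v] = interleaved[v * elementSize + s];
      }
    }
    return out;
  }

  // Decode: streams -> interleaved (same loop shape as decodeData above).
  static byte[] merge(byte[] encoded, int elementSize) {
    int valuesCount = encoded.length / elementSize;
    byte[] out = new byte[encoded.length];
    int dest = 0;
    for (int v = 0; v < valuesCount; ++v) {
      for (int s = 0; s < elementSize; ++s) {
        out[dest++] = encoded[s * valuesCount + v];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] interleaved = {1, 2, 3, 4, 5, 6, 7, 8}; // two 4-byte values
    byte[] encoded = split(interleaved, 4);
    // Stream 0 holds the first byte of each value, stream 1 the second, etc.
    System.out.println(java.util.Arrays.toString(encoded)); // [1, 5, 2, 6, 3, 7, 4, 8]
    System.out.println(java.util.Arrays.equals(interleaved, merge(encoded, 4))); // true
  }
}
```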
### What changes are included in this PR?
Rewrite `decodeData` in three steps:
1. **Drop down to a `byte[]` view** of the encoded buffer. When
`encoded.hasArray()` is true (the typical case), use the backing array directly
with the correct base offset; otherwise copy once with a single `get(byte[])`
call. Eliminates the per-byte `ByteBuffer.get(int)` bounds check and virtual
dispatch.
2. **Specialize loops for the common element sizes (4 and 8)**. Hoist all
`stream * valuesCount` offsets into local ints (`s0..s3` for floats/ints,
`s0..s7` for doubles/longs) and write each output slot exactly once in a single
sequential pass. Reads come from `elementSizeInBytes` concurrent sequential
streams, which modern hardware prefetchers handle well.
3. **Generic fallback** for arbitrary element sizes (`FIXED_LEN_BYTE_ARRAY`
of any width).
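The three steps above can be sketched as follows (a standalone class with a hypothetical signature; in the PR the method lives inside `ByteStreamSplitValuesReader` and may differ in detail; the size-8 specialization follows the same pattern and is omitted for brevity):

```java
import java.nio.ByteBuffer;

public class DecodeSketch {
  static byte[] decodeData(ByteBuffer encoded, int valuesCount, int elementSizeInBytes) {
    final byte[] src;
    final int base;
    if (encoded.hasArray()) {
      // Step 1: read through the backing array, skipping per-byte
      // ByteBuffer bounds checks and virtual dispatch.
      src = encoded.array();
      base = encoded.arrayOffset() + encoded.position();
    } else {
      // Direct buffer: one bulk copy instead of N single-byte gets.
      src = new byte[encoded.remaining()];
      encoded.duplicate().get(src);
      base = 0;
    }
    byte[] decoded = new byte[elementSizeInBytes * valuesCount];
    if (elementSizeInBytes == 4) {
      // Step 2: hoist the four stream offsets out of the loop and fill
      // the output in one sequential pass.
      final int s0 = base;
      final int s1 = base + valuesCount;
      final int s2 = base + 2 * valuesCount;
      final int s3 = base + 3 * valuesCount;
      for (int v = 0, d = 0; v < valuesCount; ++v) {
        decoded[d++] = src[s0 + v];
        decoded[d++] = src[s1 + v];
        decoded[d++] = src[s2 + v];
        decoded[d++] = src[s3 + v];
      }
    } else {
      // Step 3: generic fallback for arbitrary element widths.
      for (int v = 0, d = 0; v < valuesCount; ++v) {
        for (int s = 0; s < elementSizeInBytes; ++s) {
          decoded[d++] = src[base + s * valuesCount + v];
        }
      }
    }
    return decoded;
  }
}
```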
### Benchmark
New `ByteStreamSplitDecodingBenchmark` (100k values per invocation, JDK 18,
JMH `-wi 5 -i 10 -f 3`, 30 samples per row):
| Type | Before | After | Δ |
|--------|--------:|---------:|---------------:|
| Float | 47.80M | 162.29M | **+240% (3.40x)** |
| Double | 26.32M | 66.00M | **+151% (2.51x)** |
| Int | 47.07M | 162.18M | **+245% (3.45x)** |
| Long | 26.80M | 66.00M | **+146% (2.46x)** |
Decoded output is byte-identical to before; per-op heap allocation is
unchanged.
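The table's numbers come from `ByteStreamSplitDecodingBenchmark` under JMH; as a rough stdlib-only stand-in (all names here are hypothetical, and a timing loop like this is no substitute for JMH), the two strategies and the byte-identical-output claim can be sanity-checked like so:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class DecodeSanityCheck {
  // Baseline: per-byte ByteBuffer.get(int), as in the old loop.
  static byte[] decodeViaByteBuffer(ByteBuffer encoded, int valuesCount, int elementSize) {
    byte[] decoded = new byte[elementSize * valuesCount];
    int d = 0;
    for (int v = 0; v < valuesCount; ++v) {
      for (int s = 0; s < elementSize; ++s) {
        decoded[d++] = encoded.get(v + s * valuesCount);
      }
    }
    return decoded;
  }

  // Optimized shape: single bulk view, then plain array indexing.
  static byte[] decodeViaArray(ByteBuffer encoded, int valuesCount, int elementSize) {
    byte[] src;
    int base;
    if (encoded.hasArray()) {
      src = encoded.array();
      base = encoded.arrayOffset() + encoded.position();
    } else {
      src = new byte[encoded.remaining()];
      encoded.duplicate().get(src);
      base = 0;
    }
    byte[] decoded = new byte[elementSize * valuesCount];
    int d = 0;
    for (int v = 0; v < valuesCount; ++v) {
      for (int s = 0; s < elementSize; ++s) {
        decoded[d++] = src[base + s * valuesCount + v];
      }
    }
    return decoded;
  }

  public static void main(String[] args) {
    int valuesCount = 100_000, elementSize = 4;
    byte[] raw = new byte[valuesCount * elementSize];
    for (int i = 0; i < raw.length; ++i) raw[i] = (byte) i;
    ByteBuffer encoded = ByteBuffer.wrap(raw);
    // Outputs must be byte-identical regardless of strategy.
    System.out.println(Arrays.equals(
        decodeViaByteBuffer(encoded, valuesCount, elementSize),
        decodeViaArray(encoded, valuesCount, elementSize))); // true
    long t0 = System.nanoTime();
    for (int i = 0; i < 50; ++i) decodeViaByteBuffer(encoded, valuesCount, elementSize);
    long t1 = System.nanoTime();
    for (int i = 0; i < 50; ++i) decodeViaArray(encoded, valuesCount, elementSize);
    long t2 = System.nanoTime();
    System.out.printf("ByteBuffer: %d us, array: %d us%n",
        (t1 - t0) / 1_000, (t2 - t1) / 1_000);
  }
}
```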
### Are these changes tested?
Yes. All 573 `parquet-column` tests pass; 51 BSS-specific tests pass (`mvn
test -pl parquet-column -Dtest='*ByteStreamSplit*'`). No new test was added
because the decoded bytes are unchanged (covered by existing round-trip and
`ByteStreamSplitValuesReaderTest` tests).
### Are there any user-facing changes?
No. Only an internal reader optimization. No public API, file format, or
configuration change.
### Closes #3505
Symmetric companion to #3504 (writer-side BSS optimization). Part of a small
series of focused performance PRs from work in
[parquet-perf](https://github.com/iemejia/parquet-perf). Previous: #3494,
#3496, #3500, #3504.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]