iemejia opened a new issue, #3505:
URL: https://github.com/apache/parquet-java/issues/3505

   ### Describe the enhancement requested
   
   `ByteStreamSplitValuesReader` is the reader-side counterpart of the writer for `BYTE_STREAM_SPLIT`-encoded `FLOAT`, `DOUBLE`, `INT32`, and `INT64` columns. In `initFromPage` it eagerly transposes the entire page from stream-split layout (`elementSizeInBytes` separate streams of `valuesCount` bytes each) back to interleaved layout (`valuesCount` elements of `elementSizeInBytes` bytes each). The current loop is:
   
   ```java
   private byte[] decodeData(ByteBuffer encoded, int valuesCount) {
     byte[] decoded = new byte[encoded.limit()];
     int destByteIndex = 0;
     for (int srcValueIndex = 0; srcValueIndex < valuesCount; ++srcValueIndex) {
       for (int stream = 0; stream < elementSizeInBytes; ++stream, ++destByteIndex) {
         decoded[destByteIndex] = encoded.get(srcValueIndex + stream * valuesCount);
       }
     }
     return decoded;
   }
   ```
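
To make the two layouts concrete, here is a small stand-alone sketch. The class `BssLayoutDemo` and its method names are hypothetical and not part of parquet-java; it assumes little-endian element bytes and uses the same `stream * valuesCount + valueIndex` indexing as the loop quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class for illustration only; not part of parquet-java.
final class BssLayoutDemo {

  // Interleaved -> stream-split: byte s of value v goes to stream s, slot v.
  static byte[] toStreamSplit(byte[] interleaved, int elementSize) {
    int valuesCount = interleaved.length / elementSize;
    byte[] split = new byte[interleaved.length];
    for (int v = 0; v < valuesCount; ++v) {
      for (int s = 0; s < elementSize; ++s) {
        split[s * valuesCount + v] = interleaved[v * elementSize + s];
      }
    }
    return split;
  }

  // Stream-split -> interleaved: the inverse transpose.
  static byte[] toInterleaved(byte[] split, int elementSize) {
    int valuesCount = split.length / elementSize;
    byte[] interleaved = new byte[split.length];
    for (int v = 0; v < valuesCount; ++v) {
      for (int s = 0; s < elementSize; ++s) {
        interleaved[v * elementSize + s] = split[s * valuesCount + v];
      }
    }
    return interleaved;
  }

  public static void main(String[] args) {
    // Two little-endian floats: 1.0f = 00 00 80 3F, 2.0f = 00 00 00 40 (hex).
    byte[] interleaved = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
        .putFloat(1.0f).putFloat(2.0f).array();
    byte[] split = toStreamSplit(interleaved, Float.BYTES);
    // Four streams of two bytes each (hex): [00 00] [00 00] [80 00] [3F 40]
    System.out.println(Arrays.toString(split));
    System.out.println(Arrays.equals(interleaved, toInterleaved(split, Float.BYTES)));
  }
}
```

Each stream collects the bytes at one position within the element across all values, so decoding is a plain matrix transpose of an `elementSizeInBytes` x `valuesCount` byte matrix.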
   
   Two issues on the hot path:
   
   1. Every read goes through `ByteBuffer.get(int)`, which does per-call bounds 
checks and dispatches through `HeapByteBuffer`/`DirectByteBuffer` virtual 
methods.
   2. The stream offset (`stream * valuesCount`) is recomputed for every byte even though it depends only on `stream` and takes just `elementSizeInBytes` distinct values for the whole page.
   
   For a 100k-value `FLOAT` page that is 400k `ByteBuffer.get(int)` calls; for a `DOUBLE`/`INT64` page it is 800k.
   
   JMH (new `ByteStreamSplitDecodingBenchmark`, 100k values per invocation, JDK 
18, `-wi 5 -i 10 -f 3`, 30 samples) on master:
   
   | Type   | ops/s   |
   |--------|--------:|
   | Float  | 47.80M  |
   | Double | 26.32M  |
   | Int    | 47.07M  |
   | Long   | 26.80M  |
   
   ### Proposal
   
   Restructure `decodeData` in `ByteStreamSplitValuesReader`:
   
   1. **Drop down to a `byte[]` view** of the encoded buffer. When 
`encoded.hasArray()` is true (the typical case), use the backing array directly 
with the correct base offset; otherwise copy once with a single `get(byte[])` 
call. This eliminates the per-byte `ByteBuffer.get(int)` bounds check and 
virtual dispatch.
   
   2. **Specialize loops for the common element sizes (4 and 8)**. Hoist all 
`stream * valuesCount` offsets out of the inner loop into local ints (`s0..s3` 
for floats/ints, `s0..s7` for doubles/longs), and write each output slot 
exactly once in a single sequential pass. The reads come from 
`elementSizeInBytes` concurrent sequential streams, which modern hardware 
prefetchers handle well (typically 8–16 tracked streams per core).
   
   3. **Generic fallback** for arbitrary element sizes (`FIXED_LEN_BYTE_ARRAY` 
of any width).
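
The three steps above can be sketched roughly as follows. This is an illustrative stand-alone version, not the actual patch: the class name `BssDecodeSketch` is hypothetical, `elementSizeInBytes` is passed as a parameter instead of being read from a field, the encoded data is assumed to start at the buffer's `position()` and span `valuesCount * elementSizeInBytes` bytes, and the 8-byte specialization is omitted for brevity (it would follow the 4-byte pattern with `s0..s7`).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-alone sketch of the proposed restructuring; not the actual patch.
final class BssDecodeSketch {

  static byte[] decodeData(ByteBuffer encoded, int valuesCount, int elementSizeInBytes) {
    final int totalBytes = valuesCount * elementSizeInBytes;
    final byte[] src;
    final int base;
    if (encoded.hasArray()) {
      // Heap buffer: read the backing array in place, honoring arrayOffset and position.
      src = encoded.array();
      base = encoded.arrayOffset() + encoded.position();
    } else {
      // Direct or read-only buffer: one bulk copy; duplicate() keeps the caller's position intact.
      src = new byte[totalBytes];
      encoded.duplicate().get(src);
      base = 0;
    }

    final byte[] decoded = new byte[totalBytes];
    if (elementSizeInBytes == 4) {
      // Stream start offsets hoisted out of the loop.
      final int s0 = base;
      final int s1 = s0 + valuesCount;
      final int s2 = s1 + valuesCount;
      final int s3 = s2 + valuesCount;
      // Single sequential pass over the output; each slot is written exactly once.
      for (int i = 0, d = 0; i < valuesCount; ++i, d += 4) {
        decoded[d] = src[s0 + i];
        decoded[d + 1] = src[s1 + i];
        decoded[d + 2] = src[s2 + i];
        decoded[d + 3] = src[s3 + i];
      }
    } else {
      // Generic fallback: step through the streams by adding valuesCount, no multiplication.
      for (int i = 0, d = 0; i < valuesCount; ++i) {
        for (int s = 0, off = base + i; s < elementSizeInBytes; ++s, off += valuesCount) {
          decoded[d++] = src[off];
        }
      }
    }
    return decoded;
  }

  public static void main(String[] args) {
    // Three 4-byte values in stream-split layout: four streams of three bytes each.
    byte[] split = {1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12};
    System.out.println(Arrays.toString(decodeData(ByteBuffer.wrap(split), 3, 4)));
  }
}
```

Computing `base` from both `arrayOffset()` and `position()` matters when the page buffer is a slice of a larger array; reading from index 0 of the backing array would silently decode the wrong bytes.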
   
   Expected speedup (same JMH config):
   
   | Type   | Before  | After    | Δ              |
   |--------|--------:|---------:|---------------:|
   | Float  | 47.80M  | 162.29M  | **+240% (3.4x)** |
   | Double | 26.32M  | 66.00M   | **+151% (2.5x)** |
   | Int    | 47.07M  | 162.18M  | **+245% (3.4x)** |
   | Long   | 26.80M  | 66.00M   | **+146% (2.5x)** |
   
   ### Scope
   
   - Single file change to 
`parquet-column/src/main/java/org/apache/parquet/column/values/bytestreamsplit/ByteStreamSplitValuesReader.java`.
   - No public-API change; only the `private decodeData` helper is rewritten.
   - All 573 `parquet-column` tests pass; 51 BSS-specific tests pass.
   
   ### Relation
   
   Symmetric companion to #3504 (writer-side BSS optimization). Part of a small 
series of focused performance PRs from work in 
[parquet-perf](https://github.com/iemejia/parquet-perf). Previous: #3494, 
#3496, #3500, #3504.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

