iemejia opened a new issue, #3493:
URL: https://github.com/apache/parquet-java/issues/3493

   ### Describe the enhancement requested
   
   `PlainValuesReader` (used for PLAIN-encoded INT32, INT64, FLOAT, and DOUBLE 
columns) currently
   reads each value through a `LittleEndianDataInputStream` wrapper around a 
`ByteBufferInputStream`.
   Per value, `readInt()` performs 4 separate virtual `in.read()` calls and 
manually reassembles the
   result with bit shifts:
   
   ```java
   public final int readInt() throws IOException {
       int ch4 = in.read();
       int ch3 = in.read();
       int ch2 = in.read();
       int ch1 = in.read();
       if ((ch1 | ch2 | ch3 | ch4) < 0) throw new EOFException();
       return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
   }
   ```
   
   `LittleEndianDataInputStream.readInt()` itself carries a years-old TODO comment
   suggesting exactly this kind of replacement (a `ByteBuffer`-backed implementation).
   
   ### Proposal
   
   In `PlainValuesReader.initFromPage()`, obtain the page data as a single 
contiguous `ByteBuffer`
   via `stream.slice(stream.available())` with `ByteOrder.LITTLE_ENDIAN`, and 
call the corresponding
   `ByteBuffer` accessor directly per value:
   
   - `readInteger()` → `buffer.getInt()`
   - `readLong()`    → `buffer.getLong()`
   - `readFloat()`   → `buffer.getFloat()`
   - `readDouble()`  → `buffer.getDouble()`
   - `skip(n)`       → `buffer.position(buffer.position() + n * typeSize)`
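   The dispatch above can be sketched as follows. This is a minimal, standalone illustration, not the actual patch: the class and constructor are hypothetical, and the real reader would obtain `buffer` from `stream.slice(stream.available())` inside `initFromPage()` rather than wrapping a byte array.
   
   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;
   
   // Illustrative sketch of a ByteBuffer-backed plain values reader.
   public class PlainBufferReaderSketch {
       private final ByteBuffer buffer;
   
       PlainBufferReaderSketch(byte[] pageData) {
           // LITTLE_ENDIAN makes getInt()/getLong()/... decode the byte order
           // that Parquet's PLAIN encoding uses.
           this.buffer = ByteBuffer.wrap(pageData).order(ByteOrder.LITTLE_ENDIAN);
       }
   
       int readInteger()   { return buffer.getInt(); }
       long readLong()     { return buffer.getLong(); }
       float readFloat()   { return buffer.getFloat(); }
       double readDouble() { return buffer.getDouble(); }
   
       // Skip n values of typeSize bytes each by advancing the position.
       void skip(int n, int typeSize) {
           buffer.position(buffer.position() + n * typeSize);
       }
   
       public static void main(String[] args) {
           // Two little-endian INT32 values: 1 and 258 (0x0102).
           byte[] page = {1, 0, 0, 0, 2, 1, 0, 0};
           PlainBufferReaderSketch r = new PlainBufferReaderSketch(page);
           System.out.println(r.readInteger()); // 1
           System.out.println(r.readInteger()); // 258
       }
   }
   ```
   
   Out-of-bounds reads surface as `BufferUnderflowException` rather than `EOFException`, which is one behavioral difference the patch would need to account for.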
   
   `ByteBuffer.getInt()` with the appropriate byte order is a HotSpot intrinsic 
that compiles to a
   single unaligned load instruction on x86/ARM — no virtual dispatch, no 
per-byte assembly, no
   checked `IOException` on the per-value path.
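   The equivalence between the two decode paths is easy to sanity-check in isolation. The snippet below (standalone, not Parquet code) compares the per-byte reassembly used by `LittleEndianDataInputStream.readInt()` against a little-endian `ByteBuffer.getInt()` on the same bytes:
   
   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;
   
   public class LittleEndianEquivalence {
       // Per-byte reassembly, mirroring LittleEndianDataInputStream.readInt().
       static int manualReadInt(byte[] b, int off) {
           int ch4 = b[off] & 0xFF;
           int ch3 = b[off + 1] & 0xFF;
           int ch2 = b[off + 2] & 0xFF;
           int ch1 = b[off + 3] & 0xFF;
           return (ch1 << 24) + (ch2 << 16) + (ch3 << 8) + ch4;
       }
   
       public static void main(String[] args) {
           // 1234567890 == 0x499602D2, stored little-endian.
           byte[] data = {(byte) 0xD2, 0x02, (byte) 0x96, 0x49};
           int viaBuffer = ByteBuffer.wrap(data)
                   .order(ByteOrder.LITTLE_ENDIAN)
                   .getInt();
           int viaBytes = manualReadInt(data, 0);
           System.out.println(viaBuffer + " " + viaBytes); // 1234567890 1234567890
       }
   }
   ```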
   
   `ByteBufferInputStream.slice()` already handles both single-buffer 
(zero-copy view) and
   multi-buffer (single contiguous copy) cases transparently. In practice page 
data is almost
   always a single contiguous buffer.
   
   ### Benchmark results
   
   `IntEncodingBenchmark.decodePlain` (100,000 INT32 values per invocation, JMH 
`-wi 3 -i 5 -f 1`):
   
   | Pattern          | Before (ops/s) | After (ops/s)   | Speedup |
   |------------------|---------------:|----------------:|--------:|
   | SEQUENTIAL       |     92,918,297 |   1,143,149,235 | **12.3x** |
   | RANDOM           |     92,126,888 |   1,147,547,093 | **12.5x** |
   | LOW_CARDINALITY  |     93,005,451 |   1,142,666,760 | **12.3x** |
   | HIGH_CARDINALITY |     93,312,596 |   1,144,681,876 | **12.3x** |
   
   The speedup is consistent across data patterns because the bottleneck is 
entirely in the
   per-value dispatch overhead, not the data itself. All four numeric plain 
reader types
   (int, long, float, double) benefit equally.
   
   End-to-end on `ReadBenchmarks.readFile` (6-column schema, 100k rows), this change
   contributes to an overall read-path improvement of 5–10% when combined with related
   decode optimizations.
   
   ### Side effects
   
   After this change, `LittleEndianDataInputStream` has no remaining production 
usages in the
   codebase — only one test reference (`TestColumnChunkPageWriteStore`) and two 
TODO comments.
   A follow-up could remove or deprecate the class.
   
   ### Validation
   
   All 573 `parquet-column` tests pass with the change applied.
   
   ### Component(s)
   
   Core


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

