iemejia opened a new issue, #3493:
URL: https://github.com/apache/parquet-java/issues/3493
### Describe the enhancement requested
`PlainValuesReader` (used for PLAIN-encoded INT32, INT64, FLOAT, and DOUBLE
columns) currently
reads each value through a `LittleEndianDataInputStream` wrapper around a
`ByteBufferInputStream`.
Per value, `readInt()` performs four separate virtual `in.read()` calls and
manually reassembles the result with bit shifts:
```java
public final int readInt() throws IOException {
  int ch4 = in.read();
  int ch3 = in.read();
  int ch2 = in.read();
  int ch1 = in.read();
  if ((ch1 | ch2 | ch3 | ch4) < 0) {
    throw new EOFException();
  }
  return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}
```
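For illustration (not from the patch): the per-byte reassembly above is byte-for-byte equivalent to a little-endian `ByteBuffer.getInt()`, which is what makes the proposed replacement a drop-in. A minimal standalone check:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianEquivalence {
  // Reassemble four bytes exactly as LittleEndianDataInputStream.readInt() does:
  // the first byte read from the stream (ch4) ends up in the low-order position.
  static int manualReadInt(byte[] b, int off) {
    int ch4 = b[off] & 0xFF;
    int ch3 = b[off + 1] & 0xFF;
    int ch2 = b[off + 2] & 0xFF;
    int ch1 = b[off + 3] & 0xFF;
    return (ch1 << 24) + (ch2 << 16) + (ch3 << 8) + ch4;
  }

  public static void main(String[] args) {
    byte[] page = {0x78, 0x56, 0x34, 0x12}; // little-endian encoding of 0x12345678
    int manual = manualReadInt(page, 0);
    int viaBuffer = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN).getInt();
    System.out.println(manual == viaBuffer && manual == 0x12345678); // true
  }
}
```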
`LittleEndianDataInputStream.readInt()` itself carries a years-old TODO comment
suggesting exactly this kind of replacement: a `ByteBuffer`-backed
implementation.
### Proposal
In `PlainValuesReader.initFromPage()`, obtain the page data as a single
contiguous `ByteBuffer`
via `stream.slice(stream.available())` with `ByteOrder.LITTLE_ENDIAN`, and
call the corresponding
`ByteBuffer` accessor directly per value:
- `readInteger()` → `buffer.getInt()`
- `readLong()` → `buffer.getLong()`
- `readFloat()` → `buffer.getFloat()`
- `readDouble()` → `buffer.getDouble()`
- `skip(n)` → `buffer.position(buffer.position() + n * typeSize)`
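A minimal, self-contained sketch of the mapping above, using a plain wrapped `ByteBuffer` to stand in for the slice obtained in `initFromPage()` (the class and constructor here are hypothetical; the real change would live in `PlainValuesReader`):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PlainBufferReaderSketch {
  private final ByteBuffer buffer;

  public PlainBufferReaderSketch(byte[] pageBytes) {
    // In the actual proposal the buffer would come from
    // stream.slice(stream.available()); wrapping a byte[] stands in for that here.
    this.buffer = ByteBuffer.wrap(pageBytes).order(ByteOrder.LITTLE_ENDIAN);
  }

  public int readInteger()   { return buffer.getInt(); }
  public long readLong()     { return buffer.getLong(); }
  public float readFloat()   { return buffer.getFloat(); }
  public double readDouble() { return buffer.getDouble(); }

  // skip(n) for a fixed-width type just advances the position, no reads.
  public void skip(int n, int typeSize) {
    buffer.position(buffer.position() + n * typeSize);
  }

  public static void main(String[] args) {
    // Build a little-endian "page": two ints followed by a long.
    ByteBuffer page = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
    page.putInt(7).putInt(42).putLong(123456789L);
    PlainBufferReaderSketch reader = new PlainBufferReaderSketch(page.array());
    reader.skip(1, Integer.BYTES);            // skip the first int
    System.out.println(reader.readInteger()); // 42
    System.out.println(reader.readLong());    // 123456789
  }
}
```

Note that the per-value path contains no `throws IOException` at all; running off the end of the page surfaces as an unchecked `BufferUnderflowException` instead.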
`ByteBuffer.getInt()` with the appropriate byte order is a HotSpot intrinsic
that compiles to a
single unaligned load instruction on x86/ARM — no virtual dispatch, no
per-byte assembly, no
checked `IOException` on the per-value path.
`ByteBufferInputStream.slice()` already handles both single-buffer
(zero-copy view) and
multi-buffer (single contiguous copy) cases transparently. In practice page
data is almost
always a single contiguous buffer.
### Benchmark results
`IntEncodingBenchmark.decodePlain` (100,000 INT32 values per invocation, JMH
`-wi 3 -i 5 -f 1`):
| Pattern | Before (ops/s) | After (ops/s) | Speedup |
|------------------|---------------:|----------------:|--------:|
| SEQUENTIAL | 92,918,297 | 1,143,149,235 | **12.3x** |
| RANDOM | 92,126,888 | 1,147,547,093 | **12.5x** |
| LOW_CARDINALITY | 93,005,451 | 1,142,666,760 | **12.3x** |
| HIGH_CARDINALITY | 93,312,596 | 1,144,681,876 | **12.3x** |
The speedup is consistent across data patterns because the bottleneck is
entirely in the
per-value dispatch overhead, not the data itself. All four numeric plain
reader types
(int, long, float, double) benefit equally.
End-to-end on `ReadBenchmarks.readFile` (6-column schema, 100k rows), this
change contributes to the overall 5–10% read-path improvement measured when
combined with related decode optimizations.
### Side effects
After this change, `LittleEndianDataInputStream` has no remaining production
usages in the
codebase — only one test reference (`TestColumnChunkPageWriteStore`) and two
TODO comments.
A follow-up could remove or deprecate the class.
### Validation
All 573 `parquet-column` tests pass with the change applied.
### Component(s)
Core
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]