iemejia opened a new pull request, #3560:
URL: https://github.com/apache/parquet-java/pull/3560

   ## Summary
   
   - Replace `LittleEndianDataInputStream` wrapper with direct `ByteBuffer` 
reads using `LITTLE_ENDIAN` byte order in `PlainValuesReader`, eliminating 
per-value virtual dispatch overhead (4 `in.read()` calls + manual bit shifts → 
single `ByteBuffer.get*()` JVM intrinsic).
   - Add batch read methods (`readIntegers`, `readFloats`, `readLongs`, 
`readDoubles`) that use bulk typed-buffer view reads (e.g. 
`buffer.asIntBuffer().get(dest, offset, count)`) to bypass per-value bounds 
checks and position updates.
   - Page data is obtained as a single contiguous `ByteBuffer` via 
`ByteBufferInputStream.slice(available)`, which handles both single-buffer 
(zero-copy view) and multi-buffer (copy into contiguous buffer) cases 
transparently.
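
As a rough illustration of the three read styles described above (not the actual `PlainValuesReader` code), here is a minimal, self-contained sketch. The class and method names (`PlainLittleEndianSketch`, `readIntManual`, `readPerValue`, `readBatch`) are hypothetical and exist only for this example; it assumes the page is already available as one contiguous `ByteBuffer` and uses only standard `java.nio` calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch class; not part of parquet-java.
final class PlainLittleEndianSketch {

    // Stream-wrapper style: four single-byte reads plus manual shifts per value.
    static int readIntManual(byte[] page, int pos) {
        return (page[pos] & 0xFF)
            | (page[pos + 1] & 0xFF) << 8
            | (page[pos + 2] & 0xFF) << 16
            | (page[pos + 3] & 0xFF) << 24;
    }

    // Per-value style: one ByteBuffer.getInt() per value in LITTLE_ENDIAN order.
    static int[] readPerValue(ByteBuffer page, int count) {
        // duplicate() resets the order to BIG_ENDIAN, so set it afterwards.
        ByteBuffer le = page.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = le.getInt();
        }
        return out;
    }

    // Batch style: a typed view copies all values with a single bulk get().
    static int[] readBatch(ByteBuffer page, int count) {
        ByteBuffer le = page.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[count];
        // The IntBuffer view inherits the byte order set on `le`.
        le.asIntBuffer().get(out, 0, count);
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {1, 0, 0, 0, 2, 0, 0, 0};
        ByteBuffer page = ByteBuffer.wrap(raw);
        System.out.println(readIntManual(raw, 4));
        System.out.println(Arrays.toString(readPerValue(page, 2)));
        System.out.println(Arrays.toString(readBatch(page, 2)));
    }
}
```

All three methods decode the same bytes to the same values; they differ only in how many calls and bounds checks are made per value, which is the cost the benchmarks below measure.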
   
   ## Benchmark Results
   
   ### Per-value read optimization (100K INT32 values, JMH):
   
   | Pattern          | Before (ops/s) | After (ops/s)  | Speedup |
   |------------------|-----------------|----------------|---------|
   | SEQUENTIAL       | 427,630,411     | 5,397,298,681  | 12.6x   |
   | RANDOM           | 431,052,072     | 5,437,926,758  | 12.6x   |
   | LOW_CARDINALITY  | 423,443,685     | 5,477,810,011  | 12.9x   |
   | HIGH_CARDINALITY | 426,405,891     | 5,485,493,740  | 12.9x   |
   
   ### Batch read methods (PlainDecodingBenchmark, 100K values, pre-allocated 
arrays):
   
   | Type   | Per-value (ops/s) | Batch (ops/s) | Speedup |
   |--------|-------------------|---------------|---------|
   | INT32  | 5,454M            | 28,256M       | +418%   |
   | FLOAT  | 5,407M            | 25,798M       | +377%   |
   | INT64  | 5,408M            | 8,088M        | +50%    |
   | DOUBLE | 7,404M            | 7,965M        | +8%     |
   
   All 573 parquet-column tests pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

