[PR] GH-3522: Add batch read APIs to ValuesReader hierarchy [parquet-java]

via GitHub Fri, 01 May 2026 15:21:02 -0700


iemejia opened a new pull request, #3535:
URL: https://github.com/apache/parquet-java/pull/3535


   ## Summary
   
   - Add `readIntegers()`, `readLongs()`, `readFloats()`, `readDoubles()` batch methods to `ValuesReader` with default loop-based implementations
   - Override these methods in specialized readers to amortize per-value overhead across batches
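
A minimal sketch of how a loop-based default and a bulk override could fit together. The class names `SketchValuesReader` and `BufferedSketchReader` and the `(dest, offset, length)` signature are assumptions for illustration; the actual signatures in this PR may differ.

```java
// Hypothetical stand-in for ValuesReader; names and signatures are assumptions.
abstract class SketchValuesReader {
    /** Per-value read, mirroring a single-value API. */
    abstract int readInteger();

    /** Default batch read: falls back to one virtual call per value. */
    void readIntegers(int[] dest, int offset, int length) {
        for (int i = 0; i < length; i++) {
            dest[offset + i] = readInteger();
        }
    }
}

// Hypothetical reader over an already-decoded buffer that overrides the batch path.
class BufferedSketchReader extends SketchValuesReader {
    private final int[] decoded;
    private int position;

    BufferedSketchReader(int[] decoded) {
        this.decoded = decoded;
    }

    @Override
    int readInteger() {
        return decoded[position++];
    }

    @Override
    void readIntegers(int[] dest, int offset, int length) {
        // One bulk copy instead of `length` separate calls.
        System.arraycopy(decoded, position, dest, offset, length);
        position += length;
    }
}
```

Readers that do not override the batch method keep working through the default loop, so callers can switch to the batch call without checking the concrete reader type.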
   
   ## Overrides
   
   - **RunLengthBitPackingHybridDecoder.readInts()**: batch across RLE runs and packed groups using `Arrays.fill`/`System.arraycopy`
   - **DictionaryValuesReader**: batch-decode dictionary IDs first, then batch-lookup values (eliminates per-value IOException try/catch)
   - **DeltaBinaryPackingValuesReader**: `System.arraycopy` from pre-decoded buffer
   - **PlainValuesReader** (all types): loop over LittleEndianDataInputStream
   - **ByteStreamSplitValuesReader** (all types): indexed ByteBuffer bulk read
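
The first two overrides can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `SketchRun`, `SketchHybridDecoder`, and `SketchDictionaryReader` are hypothetical names, and packed groups are modeled as already-unpacked `int[]` literals rather than bit-packed bytes.

```java
import java.util.Arrays;

// Hypothetical run: either one repeated value (RLE) or literal values (an unpacked group).
final class SketchRun {
    final boolean repeated;
    final int value;      // used when repeated
    final int[] literals; // used when not repeated
    final int count;

    private SketchRun(boolean repeated, int value, int[] literals, int count) {
        this.repeated = repeated;
        this.value = value;
        this.literals = literals;
        this.count = count;
    }

    static SketchRun rle(int value, int count) {
        return new SketchRun(true, value, null, count);
    }

    static SketchRun packed(int... literals) {
        return new SketchRun(false, 0, literals, literals.length);
    }
}

// Hypothetical decoder that fills a destination array across run boundaries.
final class SketchHybridDecoder {
    private final SketchRun[] runs;
    private int runIndex;
    private int consumedInRun;

    SketchHybridDecoder(SketchRun... runs) {
        this.runs = runs;
    }

    void readInts(int[] dest, int offset, int length) {
        int remaining = length;
        while (remaining > 0) {
            if (runIndex >= runs.length) {
                throw new IllegalStateException("not enough values");
            }
            SketchRun run = runs[runIndex];
            int n = Math.min(remaining, run.count - consumedInRun);
            if (run.repeated) {
                // RLE run: one fill call covers n values.
                Arrays.fill(dest, offset, offset + n, run.value);
            } else {
                // Packed group: one bulk copy covers n values.
                System.arraycopy(run.literals, consumedInRun, dest, offset, n);
            }
            offset += n;
            remaining -= n;
            consumedInRun += n;
            if (consumedInRun == run.count) {
                runIndex++;
                consumedInRun = 0;
            }
        }
    }
}

// Hypothetical dictionary reader: decode all IDs first, then look values up in a tight loop.
final class SketchDictionaryReader {
    private final SketchHybridDecoder ids;
    private final long[] dictionary;

    SketchDictionaryReader(SketchHybridDecoder ids, long[] dictionary) {
        this.ids = ids;
        this.dictionary = dictionary;
    }

    void readLongs(long[] dest, int offset, int length) {
        int[] idBuffer = new int[length];
        ids.readInts(idBuffer, 0, length);
        for (int i = 0; i < length; i++) {
            dest[offset + i] = dictionary[idBuffer[i]];
        }
    }
}
```

Splitting ID decoding from value lookup keeps the decoder's mode checks out of the lookup loop, so any error handling around decoding happens once per batch rather than once per value.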
   
   ## Rationale
   
   These APIs enable callers to amortize per-value overhead (virtual dispatch, bounds checks, mode switches) across batches. Combined with other optimizations in this series (ByteBuffer-based RLE decoder, etc.), batch reads yield significant throughput improvements over per-value loops.
   
   All 576 parquet-column tests pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

