iemejia opened a new pull request, #3536:
URL: https://github.com/apache/parquet-java/pull/3536

   ## Summary
   
   - Eliminate full-page `toByteArray()` copy in `BAOSBytesInput.writeInto(ByteBuffer)` by streaming through a `ByteBufferBackedOutputStream` adapter
   - Eliminate full-page `toByteArray()` copy in CRC32 checksum computation by streaming through a `CRC32OutputStream` adapter
   
   ## Details
   
   This PR removes two sources of unnecessary full-page-size `byte[]` allocations:
   
   1. `BAOSBytesInput.writeInto(ByteBuffer)` called `toByteArray()`, which copies the entire internal buffer. It now calls `writeTo()` with a thin `ByteBufferBackedOutputStream` adapter, so bytes are written directly from the internal `buf[]` without an intermediate allocation.
   
   2. The CRC32 checksum computation in `ColumnChunkPageWriteStore` called `toByteArray()` on each `BytesInput` and passed the result to `crc.update(byte[])`. It now calls `writeAllTo(CRC32OutputStream)`, which streams bytes directly to `crc.update()` without intermediate copies.
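
The two adapters described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the adapter class names come from the description, but their bodies are assumptions, `java.io.ByteArrayOutputStream` stands in for the Parquet `BytesInput` classes, and `StreamingAdapters`, `checksumByCopy`, `checksumByStreaming`, and `copyInto` are hypothetical names used only for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical holder class; only the two adapter names come from the PR text.
final class StreamingAdapters {

  /** Assumed shape of the adapter: forwards every write into a ByteBuffer. */
  static final class ByteBufferBackedOutputStream extends OutputStream {
    private final ByteBuffer target;

    ByteBufferBackedOutputStream(ByteBuffer target) {
      this.target = target;
    }

    @Override
    public void write(int b) {
      target.put((byte) b);
    }

    @Override
    public void write(byte[] src, int off, int len) {
      // Bulk put: copies straight from the caller's array into the buffer.
      target.put(src, off, len);
    }
  }

  /** Assumed shape of the adapter: forwards every write into a CRC32. */
  static final class CRC32OutputStream extends OutputStream {
    private final CRC32 crc;

    CRC32OutputStream(CRC32 crc) {
      this.crc = crc;
    }

    @Override
    public void write(int b) {
      crc.update(b);
    }

    @Override
    public void write(byte[] src, int off, int len) {
      crc.update(src, off, len);
    }
  }

  /** Old pattern: toByteArray() allocates a full copy of the page first. */
  static long checksumByCopy(ByteArrayOutputStream page) {
    CRC32 crc = new CRC32();
    crc.update(page.toByteArray());
    return crc.getValue();
  }

  /** New pattern: writeTo() hands the internal array to the adapter directly. */
  static long checksumByStreaming(ByteArrayOutputStream page) {
    CRC32 crc = new CRC32();
    try {
      page.writeTo(new CRC32OutputStream(crc));
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected: the adapter never throws
    }
    return crc.getValue();
  }

  /** Copies the page into the target buffer without an intermediate byte[]. */
  static void copyInto(ByteArrayOutputStream page, ByteBuffer target) {
    try {
      page.writeTo(new ByteBufferBackedOutputStream(target));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch both paths produce the same checksum and the same buffer contents; the streaming variants simply skip the temporary array that `toByteArray()` would allocate.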
   
   For a typical 1 MB page, this eliminates 1-3 full-page allocations per page written (one per CRC computation plus one for the ByteBuffer assembly). The benefit is reduced GC pressure and lower peak memory rather than higher steady-state throughput.
   
   All `TestBytesInput` tests pass, and the code compiles cleanly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

