iemejia opened a new pull request, #3536: URL: https://github.com/apache/parquet-java/pull/3536
## Summary

- Eliminate full-page `toByteArray()` copy in `BAOSBytesInput.writeInto(ByteBuffer)` by streaming through a `ByteBufferBackedOutputStream` adapter
- Eliminate full-page `toByteArray()` copy in CRC32 checksum computation by streaming through a `CRC32OutputStream` adapter

## Details

Two sources of unnecessary full-page-size `byte[]` allocations:

1. `BAOSBytesInput.writeInto(ByteBuffer)` called `toByteArray()`, which copies the entire internal buffer. Replace with `writeTo()` using a thin `ByteBufferBackedOutputStream` adapter that writes directly from the internal `buf[]` without allocation.
2. CRC32 checksum computation in `ColumnChunkPageWriteStore` called `toByteArray()` on each `BytesInput` to pass to `crc.update(byte[])`. Replace with `writeAllTo(CRC32OutputStream)`, which streams bytes directly to `crc.update()` without intermediate copies.

For a typical 1MB page, this eliminates 1-3 full-page allocations per page during write (one per CRC computation + one for the ByteBuffer assembly). The benefit is reduced GC pressure and peak memory rather than steady-state throughput.

All TestBytesInput tests pass. Compiles clean.
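For readers unfamiliar with the pattern, here is a minimal sketch of the two adapters described above. It is not the code from this PR. The adapter class names are taken from the description, but their bodies are assumptions, and `java.io.ByteArrayOutputStream` stands in for Parquet's `BytesInput` types. `StreamingAdapters`, `copyInto`, and `checksum` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical holder class; not part of parquet-java.
final class StreamingAdapters {

    /** Forwards every write into a target ByteBuffer, advancing its position. */
    static final class ByteBufferBackedOutputStream extends OutputStream {
        private final ByteBuffer target;

        ByteBufferBackedOutputStream(ByteBuffer target) {
            this.target = target;
        }

        @Override
        public void write(int b) {
            target.put((byte) b);
        }

        @Override
        public void write(byte[] src, int off, int len) {
            // Bulk put straight from the caller's array; no intermediate byte[].
            target.put(src, off, len);
        }
    }

    /** Feeds every write into a CRC32 instead of storing the bytes. */
    static final class CRC32OutputStream extends OutputStream {
        private final CRC32 crc;

        CRC32OutputStream(CRC32 crc) {
            this.crc = crc;
        }

        @Override
        public void write(int b) {
            crc.update(b);
        }

        @Override
        public void write(byte[] src, int off, int len) {
            crc.update(src, off, len);
        }
    }

    /** Copies the buffered page into dest without calling toByteArray(). */
    static void copyInto(ByteArrayOutputStream page, ByteBuffer dest) throws IOException {
        // writeTo() hands the stream's internal buffer to the adapter directly.
        page.writeTo(new ByteBufferBackedOutputStream(dest));
    }

    /** Computes one CRC32 over several buffered parts without copying them. */
    static long checksum(ByteArrayOutputStream... parts) throws IOException {
        CRC32 crc = new CRC32();
        OutputStream sink = new CRC32OutputStream(crc);
        for (ByteArrayOutputStream part : parts) {
            part.writeTo(sink);
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};
        ByteArrayOutputStream page = new ByteArrayOutputStream();
        page.write(data, 0, data.length);

        ByteBuffer dest = ByteBuffer.allocate(8);
        copyInto(page, dest);
        System.out.println("bytes written: " + dest.position());

        CRC32 direct = new CRC32();
        direct.update(data, 0, data.length);
        System.out.println("crc matches: " + (checksum(page) == direct.getValue()));
    }
}
```

Both adapters rely on `ByteArrayOutputStream.writeTo(OutputStream)`, which passes the internal buffer to `write(byte[], int, int)` in a single call. Overriding that bulk method is what avoids the per-byte path and the full-page copy.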
