arouel opened a new pull request, #3485: URL: https://github.com/apache/parquet-java/pull/3485
### Rationale for this change

`ColumnChunkPageWriter.writePage()` and `writePageV2()` call `BytesInput.toByteArray()` to feed compressed page data into `CRC32.update(byte[])`. When the writer uses a direct `ByteBufferAllocator`, this forces a full heap copy of every compressed page solely for checksumming. Since page write checksums are enabled by default (`DEFAULT_PAGE_WRITE_CHECKSUM_ENABLED = true`), this allocation occurs on every page write and negates part of the benefit of using a direct allocator.

### What changes are included in this PR?

Replace `crc.update(x.toByteArray())` with `crc.update(x.toByteBuffer(releaser))`:

- `writePage()` (V1): 1 call, for `compressedBytes`
- `writePageV2()`: 3 calls, for `repetitionLevels`, `definitionLevels`, and `compressedData`

`CRC32.update(ByteBuffer)` has been available since Java 9 and operates directly on the buffer's memory without copying. The `releaser` field already exists on `ColumnChunkPageWriter` and provides the allocator-aware `ByteBuffer` lifecycle management needed. When the allocator is heap-based, `toByteBuffer(releaser)` returns a heap buffer and behavior is functionally equivalent to the previous code.

### Are these changes tested?

The existing `TestColumnChunkPageWriteStore` covers both heap and direct allocator paths (`test` and `testWithDirectBuffers`) and exercises `writePageV2` with checksums enabled by default. `TestDataPageChecksums` covers V1 and V2 pages with checksums on and off.

### Are there any user-facing changes?

No. This is an internal optimization. CRC32 checksums are computed identically; only the intermediate memory representation changes.

Closes #3484
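To illustrate the equivalence claim, here is a minimal standalone sketch (not the actual Parquet code; the page bytes and buffer setup are invented for the demo) showing that `CRC32.update(ByteBuffer)` over a direct buffer produces the same checksum as `CRC32.update(byte[])` over the equivalent heap array, without an intermediate copy:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class CrcByteBufferDemo {
    public static void main(String[] args) {
        // Stand-in for a compressed page; contents are arbitrary.
        byte[] page = "compressed page bytes".getBytes();

        // Old path: a heap byte[] fed to CRC32.update(byte[]).
        CRC32 heapCrc = new CRC32();
        heapCrc.update(page);

        // New path: CRC32.update(ByteBuffer), available since Java 9,
        // reads the buffer's memory directly, so a direct (off-heap)
        // buffer never needs to be copied onto the heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(page.length);
        direct.put(page);
        direct.flip();
        CRC32 bufferCrc = new CRC32();
        bufferCrc.update(direct);

        // Checksums are identical regardless of memory representation.
        System.out.println(heapCrc.getValue() == bufferCrc.getValue());
    }
}
```

Note that `CRC32.update(ByteBuffer)` consumes the buffer (position advances to the limit), which is why the PR routes the buffer through the allocator-aware `releaser` for lifecycle management rather than reusing it directly.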
