arouel opened a new pull request, #3488: URL: https://github.com/apache/parquet-java/pull/3488
### Rationale for this change

`ParquetFileReader` never closes the `ColumnChunkPageReadStore` it returns from `readNextRowGroup()`. When a subsequent call replaces `currentRowGroup`, the previous instance's `ByteBufferReleaser` is abandoned without releasing the compressed I/O buffers and any off-heap decompressed page buffers it holds. With the default `HeapByteBufferAllocator` this is masked by GC, but with a direct `ByteBufferAllocator` it becomes a hard native memory leak that grows with every row group read.

`InternalParquetRecordReader` works around this by manually closing the `PageReadStore` before each read and in its own `close()`, but any direct caller of `ParquetFileReader` that does not replicate this pattern will leak buffers.

### What changes are included in this PR?

A private `closeCurrentRowGroup()` method is added to `ParquetFileReader` that null-safely closes the `currentRowGroup` field and sets it to null. It is called in `readNextRowGroup()` and `readNextFilteredRowGroup()` before the new row group is assigned, and `currentRowGroup` is included in the `AutoCloseables.uncheckedClose()` chain in `close()`. This moves buffer lifecycle management into `ParquetFileReader` itself, so all callers benefit automatically.

### Are these changes tested?

The existing test suites in parquet-hadoop continue to pass. Additional tests were added to verify that `PageReadStore` buffers are properly released.

### Are there any user-facing changes?

No API changes. Callers that already close the `PageReadStore` themselves (like `InternalParquetRecordReader`) will see a harmless double-close, since `ColumnChunkPageReadStore.close()` is idempotent via `ByteBufferReleaser`. Callers that did not close the `PageReadStore` will now have their buffers released automatically, reducing memory usage.

Closes #3487
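
The close-before-replace pattern described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR and does not use any parquet-java classes: `TrackedRowGroup` and `SketchReader` are hypothetical stand-ins for `ColumnChunkPageReadStore` and `ParquetFileReader`, and the release counter exists only to make the behavior observable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a page store that owns releasable buffers. */
final class TrackedRowGroup implements AutoCloseable {
    private boolean released = false;
    private int releaseCount = 0;

    @Override
    public void close() {
        if (released) {
            return; // idempotent: a second close() is a no-op
        }
        released = true;
        releaseCount++; // a real store would hand its buffers back to the allocator here
    }

    boolean isReleased() { return released; }

    int releaseCount() { return releaseCount; }
}

/** Hypothetical stand-in for a reader that owns the row group it last returned. */
final class SketchReader implements AutoCloseable {
    private final int totalRowGroups;
    private int nextIndex = 0;
    private TrackedRowGroup currentRowGroup;

    SketchReader(int totalRowGroups) {
        this.totalRowGroups = totalRowGroups;
    }

    /** Releases the previous row group before handing out the next one. */
    TrackedRowGroup readNextRowGroup() {
        closeCurrentRowGroup();
        if (nextIndex >= totalRowGroups) {
            return null; // no more row groups
        }
        nextIndex++;
        currentRowGroup = new TrackedRowGroup();
        return currentRowGroup;
    }

    /** Null-safe close that also clears the field. */
    private void closeCurrentRowGroup() {
        if (currentRowGroup != null) {
            currentRowGroup.close();
            currentRowGroup = null;
        }
    }

    @Override
    public void close() {
        closeCurrentRowGroup();
    }
}

final class RowGroupLifecycleSketch {
    public static void main(String[] args) {
        List<TrackedRowGroup> seen = new ArrayList<>();
        try (SketchReader reader = new SketchReader(3)) {
            TrackedRowGroup rowGroup;
            while ((rowGroup = reader.readNextRowGroup()) != null) {
                seen.add(rowGroup); // the caller never closes these itself
            }
        }
        for (TrackedRowGroup rowGroup : seen) {
            System.out.println("released=" + rowGroup.isReleased());
        }
    }
}
```

In this sketch a caller that never closes a row group still has every one released, and a caller that does close its row group only triggers a no-op on the second close, which mirrors the double-close argument made in the PR description.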
