arouel opened a new issue, #3487: URL: https://github.com/apache/parquet-java/issues/3487
### Describe the bug, including details regarding any error messages, version, and platform.

`ParquetFileReader` stores a reference to the last returned `ColumnChunkPageReadStore` in `currentRowGroup`, but never closes it:

1. `readNextRowGroup()` (line 1153) overwrites `this.currentRowGroup = rowGroup` without closing the previous instance.
2. `readNextFilteredRowGroup()` (line 1409) does the same.
3. `close()` (lines 1816-1827) does not close `currentRowGroup` at all; it only closes the input stream, dictionary reader, and codec factory.

`ColumnChunkPageReadStore.close()` releases the `ByteBufferReleaser` that holds both the compressed file I/O buffers (from `ConsecutivePartList.readAll()`) and any off-heap decompressed page buffers (from the `useOffHeapDecryptBuffer` path). Since `ParquetFileReader` never calls `ColumnChunkPageReadStore.close()`, these buffers are never released.

With the default `HeapByteBufferAllocator` this is masked by GC, because `HeapByteBufferAllocator.release()` is a no-op. With a direct `ByteBufferAllocator`, it becomes a hard native memory leak that grows with every row group read.

Note that `InternalParquetRecordReader` works around this by manually calling `currentRowGroup.close()` before each `readNextRowGroup()` (lines 134-135) and in its own `close()` (lines 164-167). Other direct callers of `ParquetFileReader` (e.g., `ParquetRewriter`, `ColumnIndexValidator`) also close the `PageReadStore` themselves. However, any caller that does not manually close the returned `PageReadStore` will leak buffers.

## Expected behavior

`ParquetFileReader` should close the previous `currentRowGroup` before assigning a new one in `readNextRowGroup()` / `readNextFilteredRowGroup()`, and close the final `currentRowGroup` in its own `close()` method. This matches the lifecycle that `InternalParquetRecordReader` implements manually.

## Error messages

No error is thrown. The buffers silently leak.
With a direct allocator, the native memory grows unboundedly until the process is killed or the allocator is explicitly closed.

## Version

1.17.0 (older versions are also affected)

### Component(s)

Core
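
## Illustrative sketch

The lifecycle requested under "Expected behavior" can be modelled without parquet-java at all. The sketch below is only an illustration under stated assumptions: `CountingAllocator`, `RowGroupStore`, and `Reader` are hypothetical stand-ins (not parquet-java classes), each row group is assumed to hold exactly one buffer, and `closePrevious` is an invented flag that switches between the current behavior and the proposed one.

```java
/** Hypothetical model of the row group lifecycle; no type here is a parquet-java class. */
final class RowGroupLifecycleSketch {

  /** Counts buffers handed out and not yet released, as a direct allocator would see them. */
  static final class CountingAllocator {
    private int outstanding;

    void allocate() { outstanding++; }
    void release() { outstanding--; }
    int outstanding() { return outstanding; }
  }

  /** Stand-in for a page read store: owns one buffer until it is closed. */
  static final class RowGroupStore implements AutoCloseable {
    private final CountingAllocator allocator;
    private boolean closed;

    RowGroupStore(CountingAllocator allocator) {
      this.allocator = allocator;
      allocator.allocate();
    }

    /** Idempotent in this sketch, so a caller that also closes does not release twice. */
    @Override
    public void close() {
      if (!closed) {
        closed = true;
        allocator.release();
      }
    }
  }

  /** Stand-in for the file reader; closePrevious selects current vs. proposed behavior. */
  static final class Reader implements AutoCloseable {
    private final CountingAllocator allocator;
    private final boolean closePrevious;
    private int remaining;
    private RowGroupStore currentRowGroup;

    Reader(CountingAllocator allocator, int rowGroups, boolean closePrevious) {
      this.allocator = allocator;
      this.remaining = rowGroups;
      this.closePrevious = closePrevious;
    }

    RowGroupStore readNextRowGroup() {
      // Proposed behavior: release the previous row group before replacing it.
      if (closePrevious && currentRowGroup != null) {
        currentRowGroup.close();
      }
      if (remaining == 0) {
        currentRowGroup = null;
        return null;
      }
      remaining--;
      currentRowGroup = new RowGroupStore(allocator);
      return currentRowGroup;
    }

    @Override
    public void close() {
      // Proposed behavior: release the final row group when the reader is closed.
      if (closePrevious && currentRowGroup != null) {
        currentRowGroup.close();
        currentRowGroup = null;
      }
    }
  }

  /** Reads every row group as a caller that never closes the returned store. */
  static int leakedAfterReadingAll(boolean closePrevious, int rowGroups) {
    CountingAllocator allocator = new CountingAllocator();
    try (Reader reader = new Reader(allocator, rowGroups, closePrevious)) {
      while (reader.readNextRowGroup() != null) {
        // Intentionally empty: the caller does not close the row group.
      }
    }
    return allocator.outstanding();
  }

  public static void main(String[] args) {
    System.out.println("without fix: " + leakedAfterReadingAll(false, 3));
    System.out.println("with fix: " + leakedAfterReadingAll(true, 3));
  }
}
```

Run with three row groups, the model reports `without fix: 3` and `with fix: 0`. Making `RowGroupStore.close()` idempotent is a design choice of this sketch, made so that callers which already close the store themselves (as `InternalParquetRecordReader` is described as doing) would not release a buffer twice. Whether `ColumnChunkPageReadStore.close()` tolerates repeated calls is not stated in this report and would need to be checked.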
