arouel opened a new issue, #3487:
URL: https://github.com/apache/parquet-java/issues/3487

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   `ParquetFileReader` stores a reference to the last returned 
`ColumnChunkPageReadStore` in `currentRowGroup`, but never closes it:
   1. `readNextRowGroup()` (line 1153) overwrites `this.currentRowGroup = 
rowGroup` without closing the previous instance.
   2. `readNextFilteredRowGroup()` (line 1409) does the same.
   3. `close()` (lines 1816-1827) does not close `currentRowGroup` at all; it 
only closes the input stream, dictionary reader, and codec factory.
   
   `ColumnChunkPageReadStore.close()` releases the `ByteBufferReleaser` that 
holds both the compressed file I/O buffers (from 
`ConsecutivePartList.readAll()`) and any off-heap decompressed page buffers 
(from the `useOffHeapDecryptBuffer` path). Since `close()` is never called, 
these buffers are never released.
   With the default `HeapByteBufferAllocator` this is masked by GC because 
`HeapByteBufferAllocator.release()` is a no-op. With a direct 
`ByteBufferAllocator`, this becomes a hard native memory leak that grows with 
every row group read.
   Note that `InternalParquetRecordReader` works around this by manually 
calling `currentRowGroup.close()` before each `readNextRowGroup()` (lines 
134-135) and in its own `close()` (lines 164-167). Other direct callers of 
`ParquetFileReader` (e.g., `ParquetRewriter`, `ColumnIndexValidator`) also 
close the `PageReadStore` themselves. However, any caller that does not 
manually close the returned `PageReadStore` will leak buffers.
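
The caller-side workaround described above can be sketched with stand-in types. Everything in this block (`LeakyReaderSketch`, `Store`, `Reader`, the `live` counter) is hypothetical and only models the behavior reported in this issue; none of it is parquet-java's actual code. The counter stands in for buffers held by a direct allocator.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the reported behavior; not parquet-java classes.
final class LeakyReaderSketch {

    // Stand-in for a page read store that holds buffers until closed.
    static final class Store implements AutoCloseable {
        private final AtomicInteger live;
        private boolean closed;

        Store(AtomicInteger live) {
            this.live = live;
            live.incrementAndGet(); // "allocate" the buffers
        }

        @Override
        public void close() {
            if (!closed) { // idempotent release
                closed = true;
                live.decrementAndGet();
            }
        }
    }

    // Stand-in for a reader that hands out stores but never closes them.
    static final class Reader {
        final AtomicInteger live = new AtomicInteger();
        private int remaining;

        Reader(int rowGroups) {
            this.remaining = rowGroups;
        }

        Store readNextRowGroup() {
            if (remaining == 0) {
                return null; // no more row groups
            }
            remaining--;
            return new Store(live);
        }
    }

    // Caller ignores close(): every store stays live.
    static int readWithoutClosing(int rowGroups) {
        Reader reader = new Reader(rowGroups);
        while (reader.readNextRowGroup() != null) {
            // pages would be consumed here
        }
        return reader.live.get();
    }

    // Caller closes each store before asking for the next one.
    static int readAndClose(int rowGroups) {
        Reader reader = new Reader(rowGroups);
        Store next;
        while ((next = reader.readNextRowGroup()) != null) {
            try (Store current = next) {
                // pages would be consumed here
            }
        }
        return reader.live.get();
    }
}
```

In this model, `readWithoutClosing(4)` leaves four stores live, while `readAndClose(4)` leaves none. That matches the difference between a caller that drops the returned `PageReadStore` and one that closes it.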
   
   ## Expected behavior
   
   `ParquetFileReader` should close the previous `currentRowGroup` before 
assigning a new one in `readNextRowGroup()` / `readNextFilteredRowGroup()`, and 
close the final `currentRowGroup` in its own `close()` method. This matches the 
lifecycle that `InternalParquetRecordReader` implements manually.
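
A minimal sketch of the requested lifecycle, again with stand-in types rather than the real `ParquetFileReader`: the reader closes the previous row group before handing out the next one, and closes the last one in its own `close()`. The names `OwningReaderSketch`, `RowGroup`, `Reader`, and `readThenClose` are assumptions made for illustration, and an actual patch may look different.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the proposed behavior; not parquet-java classes.
final class OwningReaderSketch {

    // Stand-in for a row group whose buffers are released on close().
    static final class RowGroup implements AutoCloseable {
        private final AtomicInteger live;
        private boolean closed;

        RowGroup(AtomicInteger live) {
            this.live = live;
            live.incrementAndGet();
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                live.decrementAndGet();
            }
        }
    }

    // Stand-in reader that owns the last row group it returned.
    static final class Reader implements AutoCloseable {
        private final AtomicInteger live;
        private int remaining;
        private RowGroup currentRowGroup;

        Reader(int rowGroups, AtomicInteger live) {
            this.remaining = rowGroups;
            this.live = live;
        }

        RowGroup readNextRowGroup() {
            closeCurrent(); // release the previous row group first
            if (remaining == 0) {
                return null;
            }
            remaining--;
            currentRowGroup = new RowGroup(live);
            return currentRowGroup;
        }

        @Override
        public void close() {
            closeCurrent(); // release the final row group
        }

        private void closeCurrent() {
            if (currentRowGroup != null) {
                currentRowGroup.close();
                currentRowGroup = null;
            }
        }
    }

    // Reads up to toRead row groups, then closes the reader.
    // Returns {peak live row groups, live row groups after close}.
    static int[] readThenClose(int total, int toRead) {
        AtomicInteger live = new AtomicInteger();
        int peak = 0;
        try (Reader reader = new Reader(total, live)) {
            for (int i = 0; i < toRead && reader.readNextRowGroup() != null; i++) {
                peak = Math.max(peak, live.get());
            }
        }
        return new int[] {peak, live.get()};
    }
}
```

In this model at most one row group is live at a time, and none remain after the reader is closed, even when the caller stops partway through the file. One consequence of this design is that a row group returned by the reader becomes invalid once the next one is requested, so callers that keep older row groups around would need to change.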
   
   ## Error messages
   
   No error is thrown. The buffers leak silently. With a direct allocator, 
native memory grows without bound until the process is killed or the allocator 
is explicitly closed.
   
   ## Version
   
   1.17.0 (older versions are also affected)
   
   ### Component(s)
   
   Core


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

