iemejia opened a new pull request, #3537: URL: https://github.com/apache/parquet-java/pull/3537
## Summary

- Add `writeAllToAndRelease()` to `ConcatenatingByteBufferCollector` for progressive slab-by-slab memory release during write
- Make `close()` idempotent (safe to call after eager release or multiple times)
- Call `pageWriter.close()` after each column in `flushToFileWriter()` to release buffers immediately

## Rationale

During `flushToFileWriter()`, each column's compressed page buffers were held in memory until the entire row group flush completed. For a schema with N columns, peak flush memory was ~N columns' worth of compressed pages.

With this change, each column's buffers are released immediately after being written to disk, reducing peak flush memory from ~N columns' worth to ~1 column's worth. For wide schemas (20+ columns) with large row groups, this can reduce peak memory by an order of magnitude during the flush phase.

## Tests

- `TestConcatenatingByteBufferCollector`: tests for eager release, double-close safety, and output equivalence
- All existing tests pass
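
The slab-by-slab release and idempotent `close()` described above can be illustrated with a small sketch. This is not the code in this PR: `SlabCollector`, its `collect()` method, and the `releasedCount()` / `heldCount()` counters are hypothetical stand-ins, and releasing a slab is modeled as incrementing a counter rather than returning memory to an allocator.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a slab collector; not the parquet-java class. */
class SlabCollector implements AutoCloseable {
    private final Deque<ByteBuffer> slabs = new ArrayDeque<>();
    private int released = 0; // illustration only: counts slabs handed back

    /** Stores a copy of the bytes as one slab. */
    void collect(byte[] bytes) {
        slabs.addLast(ByteBuffer.wrap(bytes.clone()));
    }

    /** Writes slabs in order, releasing each one before touching the next. */
    void writeAllToAndRelease(OutputStream out) throws IOException {
        WritableByteChannel channel = Channels.newChannel(out);
        ByteBuffer slab;
        while ((slab = slabs.pollFirst()) != null) {
            ByteBuffer view = slab.duplicate(); // leave the slab's position untouched
            while (view.hasRemaining()) {
                channel.write(view);
            }
            release(slab); // at most one written-but-unreleased slab at a time
        }
    }

    /** Assumed release hook; a real collector would return the buffer to its allocator. */
    private void release(ByteBuffer slab) {
        released++;
    }

    int releasedCount() {
        return released;
    }

    int heldCount() {
        return slabs.size();
    }

    /** Idempotent: releases whatever is still held, then has nothing left to do. */
    @Override
    public void close() {
        ByteBuffer slab;
        while ((slab = slabs.pollFirst()) != null) {
            release(slab);
        }
    }
}
```

Because `close()` only drains what is still queued, calling it after `writeAllToAndRelease()` or calling it twice releases nothing a second time, which is the property the double-close test in this PR is described as covering.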
