iemejia opened a new pull request, #3537:
URL: https://github.com/apache/parquet-java/pull/3537

   ## Summary
   
   - Add `writeAllToAndRelease()` to `ConcatenatingByteBufferCollector` for progressive, slab-by-slab memory release during write
   - Make `close()` idempotent (safe to call after an eager release, and safe to call more than once)
   - Call `pageWriter.close()` after each column in `flushToFileWriter()` to release that column's buffers immediately
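The eager-release idea behind `writeAllToAndRelease()` and the idempotent `close()` can be sketched with a self-contained collector. This is not the parquet-java code. `SlabCollector`, `collect`, and `retainedBytes` are hypothetical names. The slabs are plain `byte[]` rather than allocator-managed `ByteBuffer`s, and "release" here only means dropping the collector's reference so the GC can reclaim the slab.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for ConcatenatingByteBufferCollector: it buffers a
// sequence of byte slabs (e.g. compressed pages) in insertion order.
final class SlabCollector implements AutoCloseable {
  private final Deque<byte[]> slabs = new ArrayDeque<>();
  private long retainedBytes = 0;
  private boolean closed = false;

  // Appends a defensive copy of one slab.
  void collect(byte[] slab) {
    if (closed) {
      throw new IllegalStateException("collector is closed");
    }
    slabs.addLast(slab.clone());
    retainedBytes += slab.length;
  }

  // Bytes currently held by this collector.
  long retainedBytes() {
    return retainedBytes;
  }

  // Writes every slab but keeps them all, so memory stays at the full size.
  void writeAllTo(OutputStream out) throws IOException {
    for (byte[] slab : slabs) {
      out.write(slab);
    }
  }

  // Writes the slabs in order and drops each one as soon as it is written,
  // so retained memory shrinks slab by slab. If a write fails, the unwritten
  // slabs (including the failing one) stay queued for close() to release.
  void writeAllToAndRelease(OutputStream out) throws IOException {
    while (!slabs.isEmpty()) {
      byte[] slab = slabs.peekFirst();
      out.write(slab);
      slabs.removeFirst(); // the collector no longer references this slab
      retainedBytes -= slab.length;
    }
  }

  // Idempotent: a no-op after the first call, and safe after an eager release.
  @Override
  public void close() {
    if (closed) {
      return;
    }
    slabs.clear();
    retainedBytes = 0;
    closed = true;
  }
}
```

In this sketch, calling `writeAllToAndRelease(out)` and then `close()` (once or several times) leaves the collector empty. The same pattern keeps try-with-resources safe, because any later `close()` is a no-op.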
   
   ## Rationale
   
   During `flushToFileWriter()`, each column's compressed page buffers were held in memory until the entire row group flush completed. For a schema with N columns, peak flush memory was ~N columns' worth of compressed pages.
   
   With this change, each column's buffers are released immediately after being written to disk. This reduces peak flush memory from ~N columns' worth to ~1 column's worth (roughly the largest column). Because the saving scales with N, wide schemas (20+ columns) with large row groups can see peak memory during the flush phase drop by an order of magnitude.
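The difference between the two orderings in `flushToFileWriter()` can be illustrated with a toy model. This is an assumption-laden sketch, not a measurement. `FlushPeakModel` and `peakRetained` are hypothetical names, and the column sizes are made-up inputs. The model counts only the compressed-page bytes a column keeps from the start of its write until its page writer is closed. Columns that have not been written yet are deliberately left out.

```java
// Toy model (assumption): tracks only bytes held by columns whose write has
// started and whose page writer has not yet been closed.
final class FlushPeakModel {
  static long peakRetained(long[] columnBytes, boolean closeAfterEachColumn) {
    long retained = 0;
    long peak = 0;
    for (long bytes : columnBytes) {
      retained += bytes; // this column's pages are being written to the file
      peak = Math.max(peak, retained);
      if (closeAfterEachColumn) {
        retained -= bytes; // pageWriter.close() right after the column is written
      }
    }
    // Without the per-column close, everything is released only after the loop,
    // so the peak equals the sum of all columns; with it, the largest column.
    return peak;
  }
}
```

With 20 columns of 8 MiB each (made-up sizes), the model gives 160 MiB for the deferred order and 8 MiB for the per-column close. That is a factor of 20, matching the column count.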
   
   ## Tests
   
   - `TestConcatenatingByteBufferCollector`: tests for eager release, double-close safety, and output equivalence
   - All existing tests pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
