wiedld opened a new issue, #11344:
URL: https://github.com/apache/datafusion/issues/11344

   ### Is your feature request related to a problem or challenge?
   
   The encoding of parquet requires a [non-trivial amount of memory buffering](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html#memory-limiting).
   During the execution of datafusion physical plans, parquet may be encoded using [ParquetSink](https://github.com/apache/datafusion/blob/782df390078b1aee157d999898424c23530c3eca/datafusion/core/src/datasource/file_format/parquet.rs#L591) (e.g. `COPY TO` queries which output parquet).
   Currently we do not track ParquetSink's memory usage in the task context's memory pool.
   
   ### Describe the solution you'd like
   
   Start tracking the memory used during parquet encoding.
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   Recently, we [extended several parquet](https://github.com/apache/arrow-rs/pull/5967) interfaces to provide better estimates of the `memory_usage` during encoding.
   These memory usage estimates should be used to determine the appropriate memory reservations.
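
   To make the idea concrete, here is a minimal, stdlib-only sketch of how a sink could resize a reservation to match the encoder's reported memory usage after each write. Everything in it (`SketchPool`, `SketchReservation`, `SketchEncoder`, `write_batches`, and the `flush_every` parameter) is a hypothetical stand-in invented for illustration. It is not the DataFusion or parquet API, and the real encoder's estimate will not be a simple byte counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the task context's memory pool (fixed byte limit).
pub struct SketchPool {
    limit: usize,
    used: AtomicUsize,
}

impl SketchPool {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { limit, used: AtomicUsize::new(0) })
    }

    /// Bytes currently reserved across all reservations.
    pub fn used(&self) -> usize {
        self.used.load(Ordering::SeqCst)
    }
}

/// Hypothetical reservation held by a single sink.
pub struct SketchReservation {
    pool: Arc<SketchPool>,
    size: usize,
}

impl SketchReservation {
    pub fn new(pool: Arc<SketchPool>) -> Self {
        Self { pool, size: 0 }
    }

    pub fn size(&self) -> usize {
        self.size
    }

    /// Grow or shrink to `new_size`; fails if growing would exceed the pool limit.
    pub fn try_resize(&mut self, new_size: usize) -> Result<(), String> {
        if new_size > self.size {
            let extra = new_size - self.size;
            let limit = self.pool.limit;
            self.pool
                .used
                .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |used| {
                    used.checked_add(extra).filter(|total| *total <= limit)
                })
                .map_err(|used| {
                    format!("cannot reserve {extra} more bytes: {used} of {limit} in use")
                })?;
        } else {
            self.pool.used.fetch_sub(self.size - new_size, Ordering::SeqCst);
        }
        self.size = new_size;
        Ok(())
    }
}

impl Drop for SketchReservation {
    /// Return any remaining bytes to the pool when the sink finishes or fails.
    fn drop(&mut self) {
        self.pool.used.fetch_sub(self.size, Ordering::SeqCst);
    }
}

/// Hypothetical encoder: buffers bytes until flushed and reports its usage.
#[derive(Default)]
pub struct SketchEncoder {
    buffered: usize,
}

impl SketchEncoder {
    pub fn write(&mut self, bytes: usize) {
        self.buffered += bytes;
    }

    pub fn memory_usage(&self) -> usize {
        self.buffered
    }

    pub fn flush(&mut self) {
        self.buffered = 0;
    }
}

/// Writes each batch, then resizes the reservation to the encoder's estimate.
/// Returns the peak reservation, or an error once the pool limit is exceeded.
pub fn write_batches(
    pool: &Arc<SketchPool>,
    batch_sizes: &[usize],
    flush_every: usize,
) -> Result<usize, String> {
    let flush_every = flush_every.max(1); // avoid a modulo-by-zero panic
    let mut reservation = SketchReservation::new(Arc::clone(pool));
    let mut encoder = SketchEncoder::default();
    let mut peak = 0;
    for (i, &bytes) in batch_sizes.iter().enumerate() {
        encoder.write(bytes);
        reservation.try_resize(encoder.memory_usage())?;
        peak = peak.max(reservation.size());
        if (i + 1) % flush_every == 0 {
            encoder.flush();
            reservation.try_resize(encoder.memory_usage())?;
        }
    }
    Ok(peak)
}
```

   Resizing to the current estimate, rather than only growing, lets the pool reclaim memory after each flush. Releasing the reservation on drop means a failed write does not leave bytes stranded in the pool.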


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

