richox opened a new issue, #4477: URL: https://github.com/apache/arrow-rs/issues/4477
**Describe the bug**

When writing large Parquet files with `AsyncArrowWriter`, we found that memory usage is unexpectedly high, and sometimes the process runs out of memory. The bug is likely in the following code: it tries to trigger a flush once the buffer's length reaches half of its capacity. However, as data is written into the buffer, the capacity grows along with the length, so this condition does not fire as expected.

https://github.com/apache/arrow-rs/blob/aac3aa99398c4f4fe59c60d1839d3a8ab60d00f3/parquet/src/arrow/async_writer/mod.rs#L145

**To Reproduce**

Read a big Parquet file, then write it to another file with `AsyncArrowWriter`. Since reading is usually faster than writing, data is buffered but not flushed in time, causing an OOM.

**Expected behavior**

Trigger flushing against the constant initial buffer capacity rather than the current (growing) capacity.

**Additional context**
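The moving-threshold problem can be demonstrated with a plain `Vec<u8>`. This is a minimal sketch of the failure mode, not the writer's actual code: a threshold derived from `Vec::capacity()` moves upward as soon as a write forces the buffer to reallocate.

```rust
fn main() {
    // Simulated write buffer with a fixed initial capacity.
    let mut buffer: Vec<u8> = Vec::with_capacity(1024);

    // Flush threshold as computed from the current capacity.
    let threshold_before = buffer.capacity() / 2; // 512

    // A write larger than the remaining space forces a reallocation,
    // so the capacity (and therefore the threshold) grows with the data.
    buffer.extend_from_slice(&vec![0u8; 4096]);
    let threshold_after = buffer.capacity() / 2; // >= 2048

    assert!(threshold_after > threshold_before);
    println!(
        "threshold before: {threshold_before}, after: {threshold_after}"
    );
}
```

Because the threshold chases the buffer's growth, a check of the form `buffer.len() >= buffer.capacity() / 2` can keep deferring the flush as the buffer keeps growing, whereas comparing against the initial capacity gives a fixed trigger point.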
