Re: [I] Memory Leak with Pandas read_parquet using pyarrow engine [arrow]

via GitHub Tue, 18 Mar 2025 19:31:28 -0700


erykoff commented on issue #45504:
URL: https://github.com/apache/arrow/issues/45504#issuecomment-2735170074


   I have encountered what I think is the same major memory leak when reading 
large parquet files with `pyarrow.parquet.read_table()`, with pandas not even 
installed.  `pyarrow.total_allocated_bytes()` reports constant numbers, but the 
actual memory usage of the process just goes up and up and up.
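
   A minimal sketch of one way to observe this kind of mismatch, assuming a 
Unix system (it relies on the standard-library `resource` module). The helper 
names `peak_rss_bytes`, `track_reads`, and `main`, and the `path` argument, are 
hypothetical and not from the original report. It records what Arrow's 
allocator reports next to the process's peak resident memory after each 
repeated read:

```python
import gc
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def track_reads(read_once, allocated_bytes, iterations=5):
    """Call read_once() repeatedly; record (allocator bytes, peak RSS) each time."""
    samples = []
    for _ in range(iterations):
        result = read_once()
        del result        # drop the only reference to what was read
        gc.collect()      # make sure Python itself is not holding on to it
        samples.append((allocated_bytes(), peak_rss_bytes()))
    return samples


def main(path):
    # Imported here so the helpers above stay usable without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    samples = track_reads(lambda: pq.read_table(path), pa.total_allocated_bytes)
    for allocated, rss in samples:
        print(f"arrow allocated: {allocated:>14,d}   peak RSS: {rss:>14,d}")


# Example (hypothetical file name): main("large_file.parquet")
```

   Because peak RSS never decreases, a healthy run would be expected to 
plateau after the first read or two; a peak that keeps climbing while the 
allocator figure stays flat would be consistent with the behaviour described 
here.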
   
   Whatever the problem is, it appeared between `pyarrow=17` (which uses the 
expected amount of memory and doesn't leak) and `pyarrow=18` (which leaks like 
a sieve).
   
   This is a major regression.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
