[I] Memory leak still showed on parquet.write_table and Table.from_pandas [arrow]

via GitHub Fri, 22 Mar 2024 02:36:03 -0700


guozhans opened a new issue, #40738:
URL: https://github.com/apache/arrow/issues/40738


   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hi,
   
   I tried to save a Pandas DataFrame to Parquet files and ran into a memory leak. It still occurs with the nightly build pyarrow 16.0.0.dev356 installed from the server, even though a comment in https://github.com/apache/arrow/issues/37989 says the issue is fixed.
   
   Any ideas?
   
   Here is the memory usage reported by memory profiler:
   
   Line #    Mem usage    Increment  Occurrences   Line Contents
   =============================================================
       33    425.8 MiB    425.8 MiB           1           @profile
       34                                                 def to_parquet(self, df: pd.DataFrame, filename: str):
       35    537.6 MiB    111.9 MiB           1               table = Table.from_pandas(df)
       36    559.1 MiB     21.4 MiB           1               parquet.write_table(table, filename, compression="snappy")
       37    559.1 MiB      0.0 MiB           1               del table
       38                                                     #df.to_parquet(filename, compression="snappy")
   
   My method:
   `
   import pandas as pd
   from memory_profiler import profile  # provides the @profile decorator
   from pyarrow import parquet
   from pyarrow import Table
   
   @profile
   def to_parquet(self, df: pd.DataFrame, filename: str):
       table = Table.from_pandas(df)
       parquet.write_table(table, filename, compression="snappy")
       del table
       #df.to_parquet(filename, compression="snappy")
   `
   
   My related installed packages:
   numpy                     1.22.4
   pandas                    2.1.4
   pyarrow                   16.0.0.dev356
   pyarrow-hotfix            0.6  --> from dask
   dask                      2024.2.1
   
   ### Component(s)
   
   Parquet, Python


