guozhans opened a new issue, #40738: URL: https://github.com/apache/arrow/issues/40738
### Describe the bug, including details regarding any error messages, version, and platform

Hi, I tried to save a pandas DataFrame to Parquet files and encountered a memory leak. The leak persists even though I installed the nightly build pyarrow 16.0.0.dev356 from the server, since a comment said this issue was fixed in https://github.com/apache/arrow/issues/37989. Any idea?

Here is the memory usage reported by memory_profiler:

```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    33    425.8 MiB    425.8 MiB           1   @profile
    34                                         def to_parquet(self, df: pd.DataFrame, filename: str):
    35    537.6 MiB    111.9 MiB           1       table = Table.from_pandas(df)
    36    559.1 MiB     21.4 MiB           1       parquet.write_table(table, filename, compression="snappy")
    37    559.1 MiB      0.0 MiB           1       del table
    38                                             #df.to_parquet(filename, compression="snappy")
```

My method:

```python
from pyarrow import parquet
from pyarrow import Table

@profile
def to_parquet(self, df: pd.DataFrame, filename: str):
    table = Table.from_pandas(df)
    parquet.write_table(table, filename, compression="snappy")
    del table
    #df.to_parquet(filename, compression="snappy")
```

My related installed packages:
- numpy 1.22.4
- pandas 2.1.4
- pyarrow 16.0.0.dev356
- pyarrow-hotfix 0.6 (from dask)
- dask 2024.2.1

### Component(s)

Parquet, Python
