[jira] [Commented] (ARROW-7956) [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas
[ https://issues.apache.org/jira/browse/ARROW-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055783#comment-17055783 ]

Joris Van den Bossche commented on ARROW-7956:
----------------------------------------------

[~wesm] I think this was closed by https://github.com/apache/arrow/pull/6551 ?

> [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-7956
>                 URL: https://issues.apache.org/jira/browse/ARROW-7956
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.0
>            Reporter: Denis
>            Assignee: Wes McKinney
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: loans.parquet, pyarrow_mem_leak_test.py
>
>
> Python version used: 3.7.4 (conda distribution)
> OS: Ubuntu 18.04
> pandas version is 0.24.2
> numpy version is 1.16.4
>
> To reproduce the issue, run the attached script pyarrow_mem_leak_test.py
> with the attached file loans.parquet placed in the working directory.
>
> Reading and writing Parquet in memory also leaks memory. To reproduce this,
> run the function test_parquet_leak() from the attached file
> pyarrow_mem_leak_test.py.
> The memory leak is 100% reproducible.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-7956) [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas
[ https://issues.apache.org/jira/browse/ARROW-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050307#comment-17050307 ]

Joris Van den Bossche commented on ARROW-7956:
----------------------------------------------

It seems the object dtype is the trigger. I can reproduce this on 0.15 with the following simplified snippet (without involving a parquet file):

{code:python}
import numpy as np
import pandas as pd
import pyarrow as pa


def test_pyarrow_leak():
    # One integer column plus one object-dtype (string) column;
    # the object column appears to be what triggers the leak.
    df = pd.DataFrame({'a': np.arange(1), 'b': [pd.util.testing.rands(5) for _ in range(1)]})
    for i in range(4000):
        print(f'Iteration {i}')
        # Round-trip the frame through the Arrow IPC format.
        df_bytes = pa.ipc.serialize_pandas(df).to_pybytes()
        buf = pa.py_buffer(df_bytes)
        df = pa.ipc.deserialize_pandas(buf)
    print('End of script')
{code}

> [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-7956
>                 URL: https://issues.apache.org/jira/browse/ARROW-7956
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.0
>            Reporter: Denis
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: loans.parquet, pyarrow_mem_leak_test.py
>
>
> Python version used: 3.7.4 (conda distribution)
> OS: Ubuntu 18.04
> pandas version is 0.24.2
> numpy version is 1.16.4
>
> To reproduce the issue, run the attached script pyarrow_mem_leak_test.py
> with the attached file loans.parquet placed in the working directory.
>
> Reading and writing Parquet in memory also leaks memory. To reproduce this,
> run the function test_parquet_leak() from the attached file
> pyarrow_mem_leak_test.py.
> The memory leak is 100% reproducible.
[jira] [Commented] (ARROW-7956) [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas
[ https://issues.apache.org/jira/browse/ARROW-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046864#comment-17046864 ]

Wes McKinney commented on ARROW-7956:
-------------------------------------

I reopened this as I want to make sure there is an appropriate unit test (or equivalent) for this.

> [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-7956
>                 URL: https://issues.apache.org/jira/browse/ARROW-7956
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.0
>            Reporter: Denis
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: loans.parquet, pyarrow_mem_leak_test.py
>
>
> Python version used: 3.7.4 (conda distribution)
> OS: Ubuntu 18.04
> pandas version is 0.24.2
> numpy version is 1.16.4
>
> To reproduce the issue, run the attached script pyarrow_mem_leak_test.py
> with the attached file loans.parquet placed in the working directory.
>
> Reading and writing Parquet in memory also leaks memory. To reproduce this,
> run the function test_parquet_leak() from the attached file
> pyarrow_mem_leak_test.py.
> The memory leak is 100% reproducible.