[ https://issues.apache.org/jira/browse/ARROW-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacob Wujciak-Jens updated ARROW-16037:
---------------------------------------
    Component/s: Python

> Possible memory leak in compute.take
> ------------------------------------
>
>                 Key: ARROW-16037
>                 URL: https://issues.apache.org/jira/browse/ARROW-16037
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 6.0.1
>         Environment: Ubuntu
>            Reporter: Ziheng Wang
>            Priority: Blocker
>
> If you run the following code, the memory usage (RSS) of the process goes up
> to 1 GB, even though the bytes allocated by pyarrow, as reported by
> pa.total_allocated_bytes(), stay at ~80 MB. The process memory comes down to
> 800 MB after a while, but that is still far more than necessary.
>
> ```python
> import gc
> import os
>
> import numpy as np
> import pandas as pd
> import psutil
> import pyarrow as pa
> import pyarrow.compute as compute
>
> # 10,000 rows x 1,000 float64 columns (~80 MB); column names become "0".."999"
> my_table = pa.Table.from_pandas(pd.DataFrame(np.random.normal(size=(10000, 1000))))
> process = psutil.Process(os.getpid())
> # Process RSS vs. bytes currently allocated by Arrow's default memory pool
> print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())
> for i in range(100):
>     print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())
>     # Sort by column "0" and materialize the reordered table
>     temp = compute.sort_indices(my_table['0'], sort_keys=[('0', 'ascending')])
>     my_table = my_table.take(temp)
>     gc.collect()
> ```