Hi Experts,
PyArrow's *Table.from_pylist* does not appear to release memory until the
program terminates. I created a sample script to reproduce the issue. I also
tried setting `pa.jemalloc_set_decay_ms(0)`, but it didn't help much.
Could you please check this and let me know whether this is a known issue and
whether there is a workaround?
>>> pyarrow.__version__
'12.0.0'
OS Details:
OS: macOS 13.4 (22F66)
Kernel Version: Darwin 22.5.0
Sample code to reproduce (requires memory_profiler):
# file_name: test_exec.py
import pyarrow as pa
import time
import random
import string
from memory_profiler import profile


def get_sample_data():
    record1 = {}
    for col_id in range(15):
        record1[f"column_{col_id}"] = string.ascii_letters[10 : random.randint(17, 49)]
    return [record1]


def construct_data(data):
    count = 1
    while count < 10:
        pa.Table.from_pylist(data * 100000)
        count += 1
    return True


@profile
def main():
    data = get_sample_data()
    construct_data(data)
    print("construct data completed!")


if __name__ == "__main__":
    main()
    time.sleep(600)
memory_profiler output:
Filename: test_exec.py
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    41     65.6 MiB     65.6 MiB           1   @profile
    42                                         def main():
    43     65.6 MiB      0.0 MiB           1       data = get_sample_data()
    44    203.8 MiB    138.2 MiB           1       construct_data(data)
    45    203.8 MiB      0.0 MiB           1       print("construct data completed!")
Regards,
Alex