Hi Experts,

PyArrow's *Table.from_pylist* does not appear to release memory until the
program terminates. I created a sample script to demonstrate the issue. I also
tried calling `pa.jemalloc_set_decay_ms(0)`, but it didn't help much.
Could you please check this and let me know whether this is a known issue and
whether there is a workaround?

>>> pyarrow.__version__
'12.0.0'

OS Details:
OS: macOS 13.4 (22F66)
Kernel Version: Darwin 22.5.0



Sample code to reproduce (requires memory_profiler):

#file_name: test_exec.py
import pyarrow as pa
import time
import random
import string

from memory_profiler import profile

def get_sample_data():
    record1 = {}
    for col_id in range(15):
        end = random.randint(17, 49)
        record1[f"column_{col_id}"] = string.ascii_letters[10:end]

    return [record1]

def construct_data(data):
    count = 1
    while count < 10:
        pa.Table.from_pylist(data * 100000)
        count += 1
    return True

@profile
def main():
    data = get_sample_data()
    construct_data(data)
    print("construct data completed!")

if __name__ == "__main__":
    main()
    time.sleep(600)


memory_profiler output:

Filename: test_exec.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    41     65.6 MiB     65.6 MiB           1   @profile
    42                                         def main():
    43     65.6 MiB      0.0 MiB           1       data = get_sample_data()
    44    203.8 MiB    138.2 MiB           1       construct_data(data)
    45    203.8 MiB      0.0 MiB           1       print("construct data completed!")

Regards,
Alex
