Hi Alex,

I think you're misinterpreting the results. Yes, the RSS memory (as reported by memory_profiler) doesn't seem to decrease. No, it doesn't mean that Arrow doesn't release memory. It's actually common for memory allocators (such as jemalloc, or the system allocator) to keep deallocated pages around, because asking the kernel to recycle them is expensive.
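
One way to see this for yourself is to look at the memory pool's own statistics rather than RSS. A rough sketch (untested, adapt to your script):

import pyarrow as pa

pool = pa.default_memory_pool()

# Build a few tables, then drop them
tables = [pa.table({"x": list(range(1_000_000))}) for _ in range(5)]
print("while tables are alive:", pool.bytes_allocated(), "bytes")

del tables
# The pool should now report (close to) zero bytes allocated, even though
# RSS as seen by memory_profiler may stay flat, because the allocator keeps
# the freed pages around for reuse.
print("after deleting the tables:", pool.bytes_allocated(), "bytes")
print("peak allocation:", pool.max_memory(), "bytes")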

Unless your system is running low on memory, you shouldn't care about this. Trying to return memory to the kernel can actually make performance worse if you ask Arrow to allocate memory soon after.

That said, you can try to call MemoryPool.release_unused() if these numbers are important to you:
https://arrow.apache.org/docs/python/generated/pyarrow.MemoryPool.html#pyarrow.MemoryPool.release_unused
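
For example (minimal sketch, untested):

import pyarrow as pa

pool = pa.default_memory_pool()
# ... create and discard your tables here ...
pool.release_unused()  # try to return unused pages to the OS

Note that this is only a best-effort request; depending on the allocator backend, RSS may still not drop all the way.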

Regards

Antoine.



On 15/06/2023 at 17:39, Jerald Alex wrote:
Hi Experts,

I came across the memory pool configuration via the environment variable
*ARROW_DEFAULT_MEMORY_POOL* and tried it out.
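
For reference, here is a rough sketch (untested) of how to check which allocator backend is actually in use:

# Select the allocator before starting Python, e.g.:
#   ARROW_DEFAULT_MEMORY_POOL=system python test_exec.py
import pyarrow as pa

# Confirm which backend pyarrow picked up
print(pa.default_memory_pool().backend_name)  # 'system', 'jemalloc' or 'mimalloc'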

I observed an improvement on macOS with the *system* memory pool, but no
change on Linux. I have captured more details in GH issue
https://github.com/apache/arrow/issues/36100... If anyone can highlight the
cause or suggest a way to overcome this problem, it would be very helpful.
I appreciate your help!

Regards,
Alex

On Wed, Jun 14, 2023 at 9:35 PM Jerald Alex <vminf...@gmail.com> wrote:

Hi Experts,

Pyarrow *Table.from_pylist* does not release memory until the program
terminates. I created a sample script to highlight the issue. I have also
tried setting `pa.jemalloc_set_decay_ms(0)`, but it didn't help much.
Could you please check this and let me know if there are potential issues /
any workaround to resolve this?

pyarrow.__version__
'12.0.0'

OS Details:
OS: macOS 13.4 (22F66)
Kernel Version: Darwin 22.5.0



Sample code to reproduce (it needs memory_profiler):

#file_name: test_exec.py
import pyarrow as pa
import time
import random
import string

from memory_profiler import profile

def get_sample_data():
    # Build one record with 15 string columns of random length
    record1 = {}
    for col_id in range(15):
        record1[f"column_{col_id}"] = string.ascii_letters[10:random.randint(17, 49)]
    return [record1]

def construct_data(data):
    count = 1
    while count < 10:
        # The table is discarded right away, so its memory should become reusable
        pa.Table.from_pylist(data * 100000)
        count += 1
    return True

@profile
def main():
    data = get_sample_data()
    construct_data(data)
    print("construct data completed!")

if __name__ == "__main__":
    main()
    time.sleep(600)


memory_profiler output:

Filename: test_exec.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     41     65.6 MiB     65.6 MiB           1   @profile
     42                                         def main():
     43     65.6 MiB      0.0 MiB           1       data = get_sample_data()
     44    203.8 MiB    138.2 MiB           1       construct_data(data)
     45    203.8 MiB      0.0 MiB           1       print("construct data completed!")

Regards,
Alex

