Note that you can ask pyarrow how much memory it thinks it is using with
the pyarrow.total_allocated_bytes[1] function.  This can be very useful for
tracking memory leaks.

I see that memory-profiler now has support for different backends. Sadly,
it doesn't look like you can register a custom backend.  Might be a fun
project if someone wanted to add a pyarrow backend for it :)

[1]
https://arrow.apache.org/docs/python/generated/pyarrow.total_allocated_bytes.html

On Thu, Jun 15, 2023 at 9:16 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Alex,
>
> I think you're misinterpreting the results. Yes, the RSS memory (as
> reported by memory_profiler) doesn't seem to decrease. No, it doesn't
> mean that Arrow doesn't release memory. It's actually common for memory
> allocators (such as jemalloc, or the system allocator) to keep
> deallocated pages around, because asking the kernel to recycle them is
> expensive.
>
> Unless your system is running low on memory, you shouldn't care about
> this. Trying to return memory to the kernel can actually make
> performance worse if you ask Arrow to allocate memory soon after.
>
> That said, you can try to call MemoryPool.release_unused() if these
> numbers are important to you:
>
> https://arrow.apache.org/docs/python/generated/pyarrow.MemoryPool.html#pyarrow.MemoryPool.release_unused
>
> Regards
>
> Antoine.
>
>
>
> On 15/06/2023 at 17:39, Jerald Alex wrote:
> > Hi Experts,
> >
> > I came across the memory pool configuration available through the
> > *ARROW_DEFAULT_MEMORY_POOL* environment variable and tried it out.
> >
> > I see an improvement on macOS with the *system* memory pool, but no
> > change on Linux. I have captured more details in the GH issue
> > https://github.com/apache/arrow/issues/36100... If anyone can point out
> > or suggest a way to overcome this problem, that would be helpful. I
> > appreciate your help!
> >
> > Regards,
> > Alex
> >
> > On Wed, Jun 14, 2023 at 9:35 PM Jerald Alex <vminf...@gmail.com> wrote:
> >
> >> Hi Experts,
> >>
> >> Pyarrow *Table.from_pylist* does not release memory until the program
> >> terminates. I created a sample script to highlight the issue. I have also
> >> tried setting `pa.jemalloc_set_decay_ms(0)`, but it didn't help much.
> >> Could you please check this and let me know if there are potential issues
> >> or any workaround to resolve this?
> >>
> >> >>> pyarrow.__version__
> >> '12.0.0'
> >>
> >> OS Details:
> >> OS: macOS 13.4 (22F66)
> >> Kernel Version: Darwin 22.5.0
> >>
> >>
> >>
> >> Sample code to reproduce. (it needs memory_profiler)
> >>
> >> #file_name: test_exec.py
> >> import pyarrow as pa
> >> import time
> >> import random
> >> import string
> >>
> >> from memory_profiler import profile
> >>
> >> def get_sample_data():
> >>      record1 = {}
> >>      for col_id in range(15):
> >>          record1[f"column_{col_id}"] = string.ascii_letters[10 : random.randint(17, 49)]
> >>
> >>      return [record1]
> >>
> >> def construct_data(data):
> >>      count = 1
> >>      while count < 10:
> >>          pa.Table.from_pylist(data * 100000)
> >>          count += 1
> >>      return True
> >>
> >> @profile
> >> def main():
> >>      data = get_sample_data()
> >>      construct_data(data)
> >>      print("construct data completed!")
> >>
> >> if __name__ == "__main__":
> >>      main()
> >>      time.sleep(600)
> >>
> >>
> >> memory_profiler output:
> >>
> >> Filename: test_exec.py
> >>
> >> Line #    Mem usage    Increment  Occurrences   Line Contents
> >> =============================================================
> >>      41     65.6 MiB     65.6 MiB           1   @profile
> >>      42                                         def main():
> >>      43     65.6 MiB      0.0 MiB           1       data = get_sample_data()
> >>      44    203.8 MiB    138.2 MiB           1       construct_data(data)
> >>      45    203.8 MiB      0.0 MiB           1       print("construct data completed!")
> >>
> >> Regards,
> >> Alex
> >>
> >
>
