[GitHub] [arrow] westonpace commented on issue #9295: Memory Usage increases while Reading the IPC format buffers.

GitBox Fri, 22 Jan 2021 11:26:47 -0800


westonpace commented on issue #9295:
URL: https://github.com/apache/arrow/issues/9295#issuecomment-765634730

Thanks for asking. There are a number of things to consider when looking at
memory allocations by Arrow. Also, which language are you working with?

Out of the box, Arrow will usually use a third-party allocator (jemalloc or
mimalloc). These allocators can sometimes behave in unexpected ways. For
example, they may not relinquish RAM to the OS immediately. They might hold on
to RAM for a while in case they can fulfill an upcoming request with it. This
makes it difficult to tell from OS-level RAM usage alone whether there is a
problem, but there are some things to look for.

Your application should eventually approach a steady state. If it runs for a
long time, RAM usage should level off and stop increasing. If it does not,
that may be evidence of a leak.
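
As a rough illustration of the steady-state check, here is a minimal sketch
that looks at a series of memory readings you have collected yourself (for
example, process RAM usage sampled once a minute). The function name, the
window size, and the 5% tolerance are all assumptions made for this example,
not anything Arrow provides.

```python
def reached_steady_state(samples, window=5, tolerance=0.05):
    """Return True if recent memory readings have stopped growing.

    `samples` is a list of memory readings in bytes, oldest first.
    The peak of the last `window` readings is compared with the peak of
    the `window` readings before it; growth within `tolerance` counts
    as steady. All thresholds here are illustrative assumptions.
    """
    if len(samples) < 2 * window:
        return False  # not enough history to judge yet
    previous_peak = max(samples[-2 * window:-window])
    recent_peak = max(samples[-window:])
    return recent_peak <= previous_peak * (1 + tolerance)
```

If this keeps returning False after the program has been running for a long
time, that points toward a leak rather than allocator caching.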

Your application should be able to utilize most of the available RAM.

There is a total allocated bytes counter which you can access from the
memory pool. How you do this will depend on the language. For example, in
Python use
[pyarrow.total_allocated_bytes](https://arrow.apache.org/docs/python/generated/pyarrow.total_allocated_bytes.html).
This counter shows how many bytes are currently in use (which will probably
be less than the number of bytes the allocator has "reserved" from the OS).
It does not include any allocator overhead. So if you make a call, and then
release the RAM used by the call, the total allocated bytes should return to
where it was previously. This counter can be used to check for leaks.
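
A leak check along those lines could look something like the sketch below. It
assumes Python and uses `pyarrow.total_allocated_bytes` as the default counter.
The helper names `leaked_bytes` and `read_ipc_once` and the file name
`data.arrow` are made up for this example.

```python
def leaked_bytes(action, counter=None, repeats=3):
    """Return the net growth of the allocation counter across repeated calls.

    `action` is a zero-argument callable that does some work and releases
    everything it allocated before returning. `counter` defaults to
    pyarrow.total_allocated_bytes; it is a parameter so the helper can
    also be used with another memory pool's counter.
    """
    if counter is None:
        import pyarrow as pa
        counter = pa.total_allocated_bytes
    action()  # warm-up call so one-time allocations are not counted
    baseline = counter()
    for _ in range(repeats):
        action()
    return counter() - baseline


def read_ipc_once(path="data.arrow"):
    """Read a whole IPC file and drop the table (hypothetical file name)."""
    import pyarrow as pa
    with pa.OSFile(path, "rb") as source:
        table = pa.ipc.open_file(source).read_all()
    return table.num_rows  # the table goes out of scope on return
```

Calling `leaked_bytes(read_ipc_once)` should give a value at or near zero if
nothing is leaking. A value that grows with `repeats` would be worth
reporting.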

So at the moment, a "big spike" is a little vague and it is difficult to
tell if it is a problem or not. How much data are you loading? Can you
provide a sample file or a sample script? How quickly does it grow and what
does it grow to? Does it get relinquished or reused if your program runs for a
long time? Is the total_allocated_bytes counter also spiking?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
