westonpace commented on issue #9295:
URL: https://github.com/apache/arrow/issues/9295#issuecomment-765634730


   Thanks for asking.  There are a number of things to consider when looking at 
memory allocations by Arrow.  Also, which language are you working with?
   
   Out of the box Arrow will usually use a 3rd party allocator (jemalloc or 
mimalloc).  These allocators can sometimes have unexpected behavior.  For 
example, they may not relinquish RAM to the OS immediately.  They might hold on 
to RAM for a while in case they can fulfill an upcoming request with it.  These 
things make it difficult to tell if RAM usage is accurate or not but there are 
some things to look for.
   
   Your application should eventually approach a steady state.  If it runs 
for a long time, RAM usage should level off and stop increasing.  If it does 
not, that may be evidence of a leak.
   
   Your application should be able to utilize most of the available RAM.
   
   There is a total allocated bytes counter which you can access from the 
memory pool.  How you do this depends on the language; in Python, for 
example, use 
[pyarrow.total_allocated_bytes](https://arrow.apache.org/docs/python/generated/pyarrow.total_allocated_bytes.html).
  This counter shows how many bytes are currently in use (which will probably 
be less than the number of bytes the allocator has "reserved" from the OS).  
It does not include any allocator overhead.  So if you make a call, and then 
release the RAM used by the call, the total allocated bytes should return to 
where it was previously.  This counter can be used to check for leaks.
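
   As a rough sketch of that kind of check in Python (not an official recipe): 
the helper below reads a counter, runs some work, drops the result, and reads 
the counter again.  The names `allocation_delta`, `demo` and 
`build_and_discard` are made up for illustration, and the workload is a 
placeholder; only `pyarrow.total_allocated_bytes` comes from the docs linked 
above.

```python
import gc


def allocation_delta(work, counter):
    """Return the change in `counter()` across one call to `work()`.

    `counter` is any zero-argument callable returning the number of bytes
    currently in use, for example pyarrow.total_allocated_bytes.
    """
    gc.collect()
    before = counter()
    result = work()
    del result    # drop the only reference to whatever work() produced
    gc.collect()  # give Python a chance to free the underlying buffers
    return counter() - before


def demo():
    # Imported here so the helper above does not itself depend on pyarrow.
    import pyarrow as pa

    def build_and_discard():
        # Hypothetical workload: a throwaway one-million-element array.
        return pa.array(range(1_000_000))

    delta = allocation_delta(build_and_discard, pa.total_allocated_bytes)
    print("bytes still allocated after release:", delta)
```

   Calling `demo()` should print a value at or near zero if the workload 
releases everything it allocated.  A value that keeps growing across repeated 
calls would be worth reporting, ideally along with the script that produces it.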
   
   So at the moment, a "big spike" is a little vague and it is difficult to 
tell if it is a problem or not.  How much data are you loading?  Can you 
provide a sample file or a sample script?  How quickly does it grow and what 
does it grow to?  Does it get relinquished or reused if your program runs for a 
long time?  Is the total_allocated_bytes counter also spiking?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

