paleolimbot commented on issue #36161:
URL: https://github.com/apache/arrow/issues/36161#issuecomment-1598897541

   Thanks for running these! `bytes_allocated` is the *current* number of bytes 
of Arrow memory that are allocated; `max_memory` is the high-water mark: the 
maximum number of bytes that were allocated at any one time.
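   The distinction can be sketched with a toy counter (names mirror the two pool properties; this is not Arrow's allocator):

   ```python
   class ToyPool:
       """Toy sketch of current vs. peak allocation tracking."""
       def __init__(self):
           self.bytes_allocated = 0  # bytes currently outstanding
           self.max_memory = 0       # peak outstanding bytes ever observed

       def allocate(self, n):
           self.bytes_allocated += n
           self.max_memory = max(self.max_memory, self.bytes_allocated)

       def free(self, n):
           self.bytes_allocated -= n

   pool = ToyPool()
   pool.allocate(100)
   pool.allocate(50)   # current = 150, peak = 150
   pool.free(150)      # current = 0, peak stays at 150
   print(pool.bytes_allocated, pool.max_memory)  # → 0 150
   ```

   Freeing everything drops `bytes_allocated` back to zero, but `max_memory` keeps reporting the peak.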
   
   The output you've pasted makes sense to me:
   
   - Writing a dataset will convert to a `Table` first, so it may require some 
allocation on Arrow's side. After the dataset is written, the Table is 
deleted and the original data frame remains (with no Arrow-allocated memory 
left outstanding).
   - `open_dataset()` does not do any allocating on Arrow's side; it just 
specifies a dataset that could be scanned, without actually scanning it.
   - Collecting a dataset into a data.frame does not by itself remove Arrow 
references: many of the vectors that make up the resulting data.frame are 
shells around Arrow memory. It is not until you remove the data.frame and 
garbage collect that the memory is freed.
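   The last point can be observed directly from the Python bindings, where the same pool counters are exposed (a sketch with pyarrow; the R package's ALTREP-backed data.frames behave analogously in spirit):

   ```python
   import gc
   import pyarrow as pa

   pool = pa.default_memory_pool()

   base = pool.bytes_allocated()
   table = pa.table({"x": list(range(1_000_000))})  # Arrow allocates buffers here
   assert pool.bytes_allocated() > base

   # Objects that wrap Arrow buffers keep the allocation alive; only after the
   # last reference is dropped and collected does the pool's current count fall.
   del table
   gc.collect()
   assert pool.bytes_allocated() <= base
   ```

   The key point is that freeing happens when the last wrapper object goes away, not when a particular API call returns.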

