paleolimbot commented on issue #36161: URL: https://github.com/apache/arrow/issues/36161#issuecomment-1598897541
Thanks for running these! `bytes_allocated` is the *current* number of bytes of Arrow memory that are allocated; `max_memory` is the peak number of bytes that were allocated at any one time. The output you've pasted makes sense to me:

- Writing a dataset converts the input to a `Table` first, so it may require some allocation on Arrow's side. After the dataset is written, the `Table` is deleted and the original data frame remains (with no Arrow-allocated memory left behind).
- `open_dataset()` does not allocate anything on Arrow's side; it just specifies a dataset that *could* be scanned, without actually scanning it.
- Collecting a dataset into a data.frame does not by itself release Arrow references: many of the vectors that make up the resulting data.frame are shells around Arrow memory. The memory is not freed until you remove the data.frame and garbage collect.
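A minimal sketch of how you could watch these counters move through that lifecycle, assuming the arrow and dplyr packages are installed (exact byte counts will vary by arrow version, file format, and platform):

```r
library(arrow)

# The default memory pool exposes both counters discussed above.
pool <- default_memory_pool()
pool$bytes_allocated  # current Arrow allocation
pool$max_memory       # peak Arrow allocation so far

df <- data.frame(x = runif(1e5))
path <- tempfile()

# Writing converts to a Table internally, so bytes_allocated rises
# during the write and falls back once the Table is deleted.
write_dataset(df, path)

# Lazy: specifies a dataset that could be scanned; no Arrow allocation yet.
ds <- open_dataset(path)

# Collecting materializes vectors that are shells around Arrow memory,
# so bytes_allocated stays non-zero while `collected` exists.
collected <- dplyr::collect(ds)
pool$bytes_allocated

# Only after removing the data.frame and garbage collecting is the
# Arrow memory backing those vectors released.
rm(collected)
gc()
pool$bytes_allocated
```

Note that `max_memory` is monotonic within a session, so it will still report the peak reached during the write and collect even after `gc()`.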