mthiboust commented on issue #45882: URL: https://github.com/apache/arrow/issues/45882#issuecomment-2788737009
The difference between versions in the memory profile above is explained by the choice of default memory allocator: `jemalloc` in v17 (on Linux) versus `mimalloc` in v18 and v19. Below is a new memory profile with `pyarrow==19.0.1`, with `mimalloc` on the left and `jemalloc` on the right (a loop of 5 iterations of Parquet file reading).

You can set `jemalloc` as the default memory allocator using: `export ARROW_DEFAULT_MEMORY_POOL=jemalloc`

See:

> The default memory pool has changed to mimalloc on all platforms (GH-43254). Previously, jemalloc was used by default on Linux. Using mimalloc by default provides a more consistent experience across different platforms, and makes configuration easier. It is expected that this might either increase or decrease performance on user workloads that use the default memory pool; please benchmark accordingly. Jemalloc can still be selected by setting the [ARROW_DEFAULT_MEMORY_POOL](https://arrow.apache.org/docs/cpp/env_vars.html#envvar-ARROW_DEFAULT_MEMORY_POOL) environment variable to “jemalloc”.
>
> https://arrow.apache.org/blog/2024/10/28/18.0.0-release/
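
If exporting the variable in the shell is inconvenient, the same setting can be applied from Python by launching the workload in a child process with `ARROW_DEFAULT_MEMORY_POOL` already set. This is a minimal sketch, not an official recipe: the helper names `allocator_env` and `report_backend` are hypothetical, and it assumes that the variable has to be in the environment before Arrow initializes its default pool, which is why it does not modify the current process.

```python
import os
import subprocess
import sys

# Backend names accepted by ARROW_DEFAULT_MEMORY_POOL; which ones actually
# work depends on how the installed Arrow build was compiled.
KNOWN_BACKENDS = {"jemalloc", "mimalloc", "system"}


def allocator_env(name, base=None):
    """Return a copy of an environment with ARROW_DEFAULT_MEMORY_POOL set.

    Hypothetical helper for illustration; `base` defaults to os.environ.
    """
    if name not in KNOWN_BACKENDS:
        raise ValueError(f"unknown memory pool backend: {name!r}")
    env = dict(os.environ if base is None else base)
    env["ARROW_DEFAULT_MEMORY_POOL"] = name
    return env


def report_backend(name):
    """Start a fresh interpreter with the chosen allocator and return the
    backend name that pyarrow reports for its default memory pool.

    Requires pyarrow to be importable in the child interpreter.
    """
    code = "import pyarrow as pa; print(pa.default_memory_pool().backend_name)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=allocator_env(name),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Example call (needs pyarrow installed):
# print(report_backend("jemalloc"))
```

Within a single process, `pyarrow.set_memory_pool(pyarrow.jemalloc_memory_pool())` is another way to switch the default pool, provided the installed build includes jemalloc.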
