Will Jones created ARROW-17441:
----------------------------------

             Summary: [Python] Memory kept after del and pool.released_unused()
                 Key: ARROW-17441
                 URL: https://issues.apache.org/jira/browse/ARROW-17441
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 9.0.0
            Reporter: Will Jones

I was trying to reproduce another issue involving memory pools not releasing memory, but encountered this confusing behavior: if I create a table, then call {{del table}}, and then {{pool.release_unused()}}, I still see significant memory usage. On mimalloc in particular, I see no meaningful drop in memory usage on either call.

Am I missing something?

{code:python}
import os
import psutil
import time
import gc
process = psutil.Process(os.getpid())
import numpy as np
from uuid import uuid4

import pyarrow as pa

def gen_batches(n_groups=200, rows_per_group=200_000):
    for _ in range(n_groups):
        id_val = uuid4().bytes
        yield pa.table({
            "x": np.random.random(rows_per_group), # This will compress poorly
            "y": np.random.random(rows_per_group),
            "a": pa.array(list(range(rows_per_group)), type=pa.int32()), # This compresses with delta encoding
            "id": pa.array([id_val] * rows_per_group), # This compresses with RLE
        })

def print_rss():
    print(f"RSS: {process.memory_info().rss:,} bytes")

print(f"memory_pool={pa.default_memory_pool().backend_name}")
print_rss()
print("reading table")
tab = pa.concat_tables(list(gen_batches()))
print_rss()
print("deleting table")
del tab
gc.collect()
print_rss()
print("releasing unused memory")
pa.default_memory_pool().release_unused()
print_rss()
print("waiting 10 seconds")
time.sleep(10)
print_rss()
{code}

{code:none}
> ARROW_DEFAULT_MEMORY_POOL=mimalloc python test_pool.py && \
    ARROW_DEFAULT_MEMORY_POOL=jemalloc python test_pool.py && \
    ARROW_DEFAULT_MEMORY_POOL=system python test_pool.py
memory_pool=mimalloc
RSS: 44,449,792 bytes
reading table
RSS: 1,819,557,888 bytes
deleting table
RSS: 1,819,590,656 bytes
releasing unused memory
RSS: 1,819,852,800 bytes
waiting 10 seconds
RSS: 1,819,852,800 bytes
memory_pool=jemalloc
RSS: 45,629,440 bytes
reading table
RSS: 1,668,677,632 bytes
deleting table
RSS: 698,400,768 bytes
releasing unused memory
RSS: 699,023,360 bytes
waiting 10 seconds
RSS: 699,023,360 bytes
memory_pool=system
RSS: 44,875,776 bytes
reading table
RSS: 1,713,569,792 bytes
deleting table
RSS: 540,311,552 bytes
releasing unused memory
RSS: 540,311,552 bytes
waiting 10 seconds
RSS: 540,311,552 bytes
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)