kopczynski-9livesdata opened a new issue, #45882:
URL: https://github.com/apache/arrow/issues/45882
### Describe the bug, including details regarding any error messages, version, and platform.
When running a script like the following
```
import pyarrow.dataset as ds
import pyarrow.fs
from memory_profiler import profile
import pyarrow

@profile
def load_parquet():
    print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
    fs = pyarrow.fs.S3FileSystem()
    s3_dataset = ds.dataset("[my_s3_bucket]/10g.parquet", filesystem=fs)
    scanner = s3_dataset.scanner()
    scanner.head(10)
    del scanner
    del s3_dataset
    del fs
    pyarrow.default_memory_pool().release_unused()
    print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")

if __name__ == "__main__":
    load_parquet()
```
A significant amount of memory stays allocated, and I cannot force `pyarrow` to release it back to the OS.
The memory profile and output from this script look like this:
```
0.0
1516.7496948242188
Filename: src/mem.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     6    125.2 MiB    125.2 MiB           1   @profile
     7                                         def load_parquet():
     8    125.2 MiB      0.0 MiB           1       print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
     9    130.2 MiB      5.0 MiB           1       fs = pyarrow.fs.S3FileSystem()
    10    140.7 MiB     10.5 MiB           1       s3_dataset = ds.dataset("dotdata-ddent-tkopczynski-dev/10g.parquet", filesystem=fs)
    11    140.9 MiB      0.1 MiB           1       scanner = s3_dataset.scanner()
    12   1700.1 MiB   1559.2 MiB           1       scanner.head(10)
    13   1701.0 MiB      0.9 MiB           1       del scanner
    14   1701.5 MiB      0.5 MiB           1       del s3_dataset
    15   1701.6 MiB      0.1 MiB           1       del fs
    16   1701.3 MiB     -0.3 MiB           1       pyarrow.default_memory_pool().release_unused()
    17   1701.3 MiB      0.0 MiB           1       print(f"{pyarrow.total_allocated_bytes() / (1024*1024)}")
```
Is there anything I'm doing wrong, or is this a memory leak?

OS: Ubuntu 22.04.5 LTS
pyarrow version: 19.0.1
### Component(s)
Python