kubat-square-sense commented on issue #45236:
URL: https://github.com/apache/arrow/issues/45236#issuecomment-2589990638
Maybe you had a different pyarrow build? I get the same result in a Dockerfile:
```Dockerfile
FROM python:3.12
RUN pip install pyarrow==18.1.0 memray
COPY test.parquet test.parquet
RUN echo "import pyarrow.parquet as pq\ndata = pq.read_table('test.parquet')\nprint(data.nbytes / 1024**2)" > /test.py
RUN memray run --native -o report.bin test.py
CMD memray stats report.bin
```
Build and run it with `docker build --tag test_image .` followed by `docker run test_image`.
Running the Dockerfile above with the current pyarrow release (18.1.0) yields:
📏 Total allocations:
75319
📦 Total memory allocated:
1.476GB
📊 Histogram of allocation size:
min: 1.000B
--------------------------------------------
< 6.000B : 4688 ▇▇▇▇▇
< 36.000B : 22577 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 222.000B : 28911 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 1.319KB : 16851 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
< 7.999KB : 1397 ▇▇
< 48.503KB : 831 ▇
< 294.066KB: 34 ▇
< 1.741MB : 7 ▇
< 10.556MB : 0
<=64.000MB : 23 ▇
--------------------------------------------
max: 64.000MB
📂 Allocator type distribution:
MALLOC: 71282
CALLOC: 3149
REALLOC: 838
MMAP: 50
🥇 Top 5 largest allocating locations (by size):
- <stack trace unavailable> -> 1.378GB
- _call_with_frames_removed:<frozen importlib._bootstrap>:488 -> 84.912MB
- dedent:/usr/local/lib/python3.12/textwrap.py:436 -> 4.572MB
- sub:/usr/local/lib/python3.12/re/__init__.py:186 -> 3.808MB
- dedent:/usr/local/lib/python3.12/textwrap.py:435 -> 1.117MB
🥇 Top 5 largest allocating locations (by number of allocations):
- <stack trace unavailable> -> 29539
- _call_with_frames_removed:<frozen importlib._bootstrap>:488 -> 17261
- <module>:test.py:3 -> 5130
- sub:/usr/local/lib/python3.12/re/__init__.py:186 -> 4586
- dedent:/usr/local/lib/python3.12/textwrap.py:436 -> 4382
Changing the pin to pyarrow==17.0.0 yields:
📦 Total memory allocated:
219.388MB
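To put the two "Total memory allocated" figures side by side, here is a small sketch that converts memray's human-readable sizes back to bytes and takes the ratio. It assumes memray prints sizes with 1024-based units (1KB = 1024 B); with 1000-based units the ratio would be about 6.73 instead. `parse_size` is a hypothetical helper, not part of memray.

```python
# Compare the two memray "Total memory allocated" figures quoted above.
# Assumption: memray formats sizes with 1024-based units (1KB = 1024 B).
import re

_UNIT_POWER = {"B": 0, "KB": 1, "MB": 2, "GB": 3, "TB": 4}


def parse_size(text: str, base: int = 1024) -> float:
    """Convert a memray-style size string such as '1.476GB' to bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([KMGT]?B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * base ** _UNIT_POWER[unit]


total_18_1_0 = parse_size("1.476GB")    # pyarrow==18.1.0
total_17_0_0 = parse_size("219.388MB")  # pyarrow==17.0.0
ratio = total_18_1_0 / total_17_0_0
print(f"18.1.0 allocates {ratio:.2f}x as much as 17.0.0")
# prints: 18.1.0 allocates 6.89x as much as 17.0.0
```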