pitrou commented on issue #45287: URL: https://github.com/apache/arrow/issues/45287#issuecomment-2606773232
> Yeah, this is an extreme case just to show the repro. In practice the file has a couple thousand rows per file.

How many row groups per file (or rows per row group)? It turns out that much of the Parquet metadata memory consumption is in ColumnChunk entries. A Thrift-deserialized ColumnChunk is 640 bytes long, and there are O(C*R*F) ColumnChunks in your dataset, with C = number of columns, R = number of row groups per file, and F = number of files.
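As a rough sanity check, here is a sketch (using pyarrow; the file path and file count are placeholders, not values from this issue) that reads one file's metadata and estimates the total ColumnChunk footprint from the 640-bytes-per-entry figure above:

```python
import pyarrow.parquet as pq

# Inspect one file to see how many ColumnChunk entries it contributes.
# "part-0.parquet" is a placeholder path for illustration.
md = pq.ParquetFile("part-0.parquet").metadata
num_columns = md.num_columns          # C
num_row_groups = md.num_row_groups    # R (for this file)

# Assumed dataset size, for illustration only.
num_files = 1000                      # F

# ~640 bytes per Thrift-deserialized ColumnChunk, C * R * F entries total.
bytes_per_column_chunk = 640
estimate = num_columns * num_row_groups * num_files * bytes_per_column_chunk
print(f"~{estimate / 2**20:.1f} MiB of deserialized ColumnChunk metadata")
```

With many row groups per file, this product grows quickly even for modest column counts, which is why the row-group count matters here.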
