mapleFU commented on issue #38245: URL: https://github.com/apache/arrow/issues/38245#issuecomment-1761548949
@rdbisme Not every 170 MB file will consume that much memory, but your case seems to match this pattern well. Suppose the uncompressed data for each column is `k` MiB. After compression it might shrink to `0.5k` MiB, and the footer might take another `0.05k` MiB. Now, when reading, because the file has only one page per column, the reader has to:

1. Read the raw data, which takes roughly `0.55k` MiB for the whole file.
2. Decompress the page, which produces about `1k` MiB of decompressed data.
3. Decode it to Arrow, which can allocate roughly another `1k` MiB.
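
As a rough illustration of the accounting above (this is just a sketch of the arithmetic in the comment, not code from Arrow; the `0.5` compression ratio and `0.05` footer ratio are the same illustrative assumptions used in the explanation):

```python
def estimated_peak_read_mib(uncompressed_mib: float,
                            compression_ratio: float = 0.5,
                            footer_ratio: float = 0.05) -> float:
    """Rough peak memory (MiB) when reading single-page Parquet columns."""
    compressed = uncompressed_mib * compression_ratio   # on-disk page bytes
    footer = uncompressed_mib * footer_ratio             # file metadata
    raw_read = compressed + footer                       # step 1: read bytes (~0.55k)
    decompressed = uncompressed_mib                      # step 2: decompress pages (~1k)
    decoded = uncompressed_mib                            # step 3: decode to Arrow (~1k)
    return raw_read + decompressed + decoded

# e.g. k = 100 MiB of uncompressed data -> roughly 255 MiB at peak
print(estimated_peak_read_mib(100.0))
```

So the peak footprint can be on the order of `2.5k` MiB or more, which lines up with the behaviour you are seeing.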
