[GitHub] [arrow] yli1994 commented on issue #14726: pq.read_table("parquet files path", memory_map=True) still consume large memory space(200G file cost 200G memory and slow)

GitBox Thu, 01 Dec 2022 18:35:47 -0800


yli1994 commented on issue #14726:
URL: https://github.com/apache/arrow/issues/14726#issuecomment-1334688842


   > If you want to reduce memory usage when reading a file, you should not read it as an entire table, but as a sequence of batches. See here: https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing
   
   Thank you for your reply! I am confused about how Hugging Face's datasets 
library (which uses pyarrow as its backend and Parquet as its file format) can 
load data without increasing memory consumption.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
