yli1994 commented on issue #14726: URL: https://github.com/apache/arrow/issues/14726#issuecomment-1334688842
> If you want to reduce memory usage when reading a file, you should not read it as an entire table, but as a sequence of batches. See here: https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing

Thank you for your reply! I am confused about how Hugging Face's datasets library (which uses pyarrow as its backend and Parquet as its file format) can load data without increasing memory consumption.
