tustvold commented on issue #7129: URL: https://github.com/apache/arrow-rs/issues/7129#issuecomment-2656099025
ParquetRecordBatchStreamBuilder reads at most one row group at a time. Perhaps you could use the parquet-layout binary to confirm the parquet file's layout.

_In general it is inadvisable to construct a single massive RecordBatch; instead, process the data one batch at a time in a streaming fashion. This will require less memory and perform better._
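To illustrate the batch-at-a-time advice, here is a minimal stdlib-only sketch. It does not use the parquet or arrow crates: `BatchSource`, `sum_streaming`, and `sum_materialized` are hypothetical names, and a plain iterator of `Vec<i64>` stands in for the stream of RecordBatches a real reader would yield. It only models how many rows are held in memory at once under each approach.

```rust
/// Hypothetical stand-in for a batch reader: yields `remaining` consecutive
/// integers in chunks of at most `batch_size` rows.
struct BatchSource {
    next_value: i64,
    remaining: usize,
    batch_size: usize,
}

impl Iterator for BatchSource {
    type Item = Vec<i64>;

    fn next(&mut self) -> Option<Vec<i64>> {
        if self.remaining == 0 {
            return None;
        }
        // Guard against a zero batch size so the iterator always makes progress.
        let n = self.remaining.min(self.batch_size.max(1));
        let batch: Vec<i64> = (0..n as i64).map(|i| self.next_value + i).collect();
        self.next_value += n as i64;
        self.remaining -= n;
        Some(batch)
    }
}

/// Streaming: fold each batch into a running aggregate, then drop it.
/// Returns (sum, largest number of rows held at once).
fn sum_streaming(source: BatchSource) -> (i64, usize) {
    let mut sum = 0i64;
    let mut peak_rows = 0usize;
    for batch in source {
        peak_rows = peak_rows.max(batch.len());
        sum += batch.iter().sum::<i64>();
        // `batch` is dropped here, before the next one is produced.
    }
    (sum, peak_rows)
}

/// Materializing: concatenate every batch into one buffer, then aggregate.
/// Returns (sum, number of rows held at once), which is the full row count.
fn sum_materialized(source: BatchSource) -> (i64, usize) {
    let all: Vec<i64> = source.flatten().collect();
    (all.iter().sum::<i64>(), all.len())
}

fn main() {
    let make = || BatchSource { next_value: 0, remaining: 10_000, batch_size: 1_024 };
    let (s1, peak1) = sum_streaming(make());
    let (s2, peak2) = sum_materialized(make());
    println!("streaming:    sum={s1}, peak rows held={peak1}");
    println!("materialized: sum={s2}, peak rows held={peak2}");
}
```

Both functions compute the same sum. The streaming version never holds more than one batch, while the materialized version holds every row at once.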
