kylebarron commented on issue #9423: URL: https://github.com/apache/arrow-rs/issues/9423#issuecomment-4269176426
> in `arro3`'s case, I think this is only because it actually eagerly collects an entire file (?) from object store into RAM and converts it into a `Table` before serving rows from it: [kylebarron/arro3@`4cf69f4`/arro3-io/src/parquet.rs#L77](https://github.com/kylebarron/arro3/blob/4cf69f475bba07a6eec098b8351057ea15be0c62/arro3-io/src/parquet.rs#L77)

Yes, I _think_ that's accurate. I spent the most time on the core Arrow classes in `arro3-core` and never spent _that_ much time on trying to make Parquet loading efficient. And that was before I learned (in https://github.com/developmentseed/obstore) how to expose a Rust async stream to Python as an async iterator.

We could refactor the `arro3-io` reader to expose an async stream of Arrow record batches without too much difficulty. I did also prototype an async Parquet API similar to pyarrow's in https://github.com/kylebarron/arro3/pull/313, but I never stabilized it enough to merge it.

If either of these interests you, happy to discuss more on arro3 issues.
