jonded94 commented on issue #9423:
URL: https://github.com/apache/arrow-rs/issues/9423#issuecomment-4266909216

   Hey, just wanted to share that I just came back from PyConDE 2026, and our 
talk was very well received 🥳 I was approached by people telling me they love 
to hear about Rust, especially for Python interop ❤️ Sadly, we were the only 
talk about Rust at this year's PyCon.
   
   Eventually, there will be recordings (potentially even on YouTube). Until 
then, you can have a look 
[here](https://pretalx.com/pyconde-pydata-2026/talk/F79RG9/) for a 
high-level description of the talk; the slides are available 
[here](https://pretalx.com/media/pyconde-pydata-2026/submissions/F79RG9/resources/PyCon__7PA2LWP.pdf).
   
   We also included a slide comparing `pyarrow`, `arro3` and `disco-parquet` 
for object store reads. As our library is heavily optimized for minimal memory 
load [1] and we're entirely single-threaded, `pyarrow` clearly beats us on 
throughput. Our memory load, however, is far lower than that of both `pyarrow` 
and `arro3`. In `arro3`'s case, I think this is only because it actually 
eagerly collects an entire file (?) from object store into RAM and converts it 
into a `Table` before serving rows from it: 
https://github.com/kylebarron/arro3/blob/4cf69f475bba07a6eec098b8351057ea15be0c62/arro3-io/src/parquet.rs#L77
   
   As a comparison, we just use [OpenDAL](https://crates.io/crates/opendal) and 
the existing [Parquet<->OpenDAL 
integration](https://crates.io/crates/parquet_opendal) to get a somewhat 
better memory footprint. This should be doable in `arro3` too.
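
   To illustrate the difference between eagerly collecting a file and serving 
rows incrementally, here is a toy sketch in plain Python. It only models the 
buffering pattern: `FakeObjectStore`, `rows_eager` and `rows_streaming` are 
made-up names for illustration, not the API of `arro3`, `disco-parquet`, 
OpenDAL or `pyarrow`, and "rows buffered" is a rough stand-in for memory load.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-in for a Parquet file in object storage:
# a list of row groups, each row group being a list of rows.
RowGroup = List[int]


class FakeObjectStore:
    """Serves one row group per request and counts the requests."""

    def __init__(self, row_groups: List[RowGroup]) -> None:
        self._row_groups = row_groups
        self.fetches = 0

    def num_row_groups(self) -> int:
        return len(self._row_groups)

    def fetch(self, index: int) -> RowGroup:
        self.fetches += 1
        # Copy the data, as a real download into local memory would.
        return list(self._row_groups[index])


def rows_eager(store: FakeObjectStore, stats: Dict[str, int]) -> Iterator[int]:
    """Fetch every row group up front, then serve rows from the full copy."""
    table = [store.fetch(i) for i in range(store.num_row_groups())]
    stats["peak_rows_buffered"] = sum(len(group) for group in table)
    for group in table:
        yield from group


def rows_streaming(store: FakeObjectStore, stats: Dict[str, int]) -> Iterator[int]:
    """Hold a single row group at a time; fetch the next once it is drained."""
    stats["peak_rows_buffered"] = 0
    for i in range(store.num_row_groups()):
        group = store.fetch(i)
        stats["peak_rows_buffered"] = max(stats["peak_rows_buffered"], len(group))
        yield from group


# Eight row groups of 1000 rows each.
row_groups = [list(range(i * 1000, (i + 1) * 1000)) for i in range(8)]

eager_stats: Dict[str, int] = {}
streaming_stats: Dict[str, int] = {}
eager_rows = list(rows_eager(FakeObjectStore(row_groups), eager_stats))
streaming_rows = list(rows_streaming(FakeObjectStore(row_groups), streaming_stats))

print(eager_stats["peak_rows_buffered"], streaming_stats["peak_rows_buffered"])
```

   Both readers yield the same rows in the same order; the only difference in 
this model is how many rows are resident at once, which grows with file size 
for the eager reader and with row group size for the streaming one.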
   
   [1] This comes from a requirement of our largest-scale distributed model 
training runs, where we need to use as little memory as possible per dataset 
so that GPU workers do not go OOM.

