alamb commented on issue #7983: URL: https://github.com/apache/arrow-rs/issues/7983#issuecomment-3165363332
> I'm not sure I understand why this model isn't possible with the pull-based reader? I could implement an [AsyncFileReader](https://docs.rs/parquet/latest/parquet/arrow/async_reader/trait.AsyncFileReader.html) ...

The thing you can't do with the current parquet pull decoder is know what IO requests will be coming *next*. When the pull decoder asks you for more data, it needs those bytes to make any further progress, so your decoding stalls until you feed them in.

To have effective pre-fetching, you need to know which ranges are going to be needed *before* the reader needs them.

So in the arrow-rs parquet case, for example, this might mean that while you are reading one row group, you calculate the ranges to fetch from object store for the *next* row group. Right now, the decoder won't tell you this information until it actually tries to read the next row group.
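As a rough illustration of the "ranges for the *next* row group" idea, here is a minimal std-only sketch. It does not use the arrow-rs API: `ColumnChunk`, `RowGroup`, `ranges_for_row_group`, and `prefetch_plan` are hypothetical names standing in for whatever the file metadata and decoder would expose, and the sketch assumes the chunk offsets and lengths are already known from the footer.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the location of one column chunk in the file.
/// (Illustrative only; not a type from the `parquet` crate.)
#[derive(Debug, Clone)]
pub struct ColumnChunk {
    pub offset: u64,
    pub len: u64,
}

/// Hypothetical stand-in for a row group: one chunk per column.
#[derive(Debug, Clone)]
pub struct RowGroup {
    pub columns: Vec<ColumnChunk>,
}

/// Byte ranges needed to decode the projected columns of one row group.
/// Column indices that are out of bounds are silently skipped.
pub fn ranges_for_row_group(rg: &RowGroup, projection: &[usize]) -> Vec<Range<u64>> {
    projection
        .iter()
        .filter_map(|&i| rg.columns.get(i))
        .map(|c| c.offset..c.offset + c.len)
        .collect()
}

/// For each row group index `i`, the ranges that could be fetched for row
/// group `i + 1` while `i` is being decoded. The last entry is empty because
/// there is nothing left to prefetch.
pub fn prefetch_plan(
    row_groups: &[RowGroup],
    projection: &[usize],
) -> Vec<(usize, Vec<Range<u64>>)> {
    (0..row_groups.len())
        .map(|i| {
            let next = row_groups
                .get(i + 1)
                .map(|rg| ranges_for_row_group(rg, projection))
                .unwrap_or_default();
            (i, next)
        })
        .collect()
}
```

A caller would start the fetch for entry `i` of the plan before decoding row group `i`, so the bytes for row group `i + 1` are likely to be in memory by the time the decoder asks for them. The point of the comment above is that the current pull decoder does not surface this lookahead itself, so a plan like this would have to be computed outside it.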
