adriangb commented on PR #6999: URL: https://github.com/apache/arrow-rs/pull/6999#issuecomment-2602456064
As a side note, I think one of the biggest bottlenecks in systems that read from object storage tends to be latency, so it's important to minimize it (this is well known, and is noted in the comments/docstrings in this file). Would it be beneficial to have APIs that make it possible to pre-fetch the entire file? E.g. if I'm going to load a <1MB Parquet file, I might want to make a single request to object storage and know I have everything I need, instead of loading the metadata and then making another request to load the data. This would be especially beneficial when you don't know the metadata size but maybe know the file size: then you do 1 request instead of potentially 3+.
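To make the idea concrete, here is a rough, std-only sketch of what "everything in one request" could look like once the whole file is in memory. It relies only on the Parquet footer layout (a 4-byte little-endian metadata length followed by the `PAR1` magic at the end of the file). The names `metadata_range` and `should_prefetch_whole_file` are hypothetical helpers for illustration, not existing arrow-rs APIs, and the size-threshold policy is an assumption.

```rust
use std::ops::Range;

/// A Parquet file ends with a 4-byte little-endian metadata length
/// followed by the 4-byte magic "PAR1".
const FOOTER_LEN: usize = 8;
const MAGIC: &[u8] = b"PAR1";

/// Hypothetical helper: given the bytes of a whole file fetched in a single
/// request, return the byte range of the encoded metadata inside that buffer.
/// No further requests are needed because the footer is already in memory.
fn metadata_range(file: &[u8]) -> Result<Range<usize>, String> {
    if file.len() < FOOTER_LEN {
        return Err("file too small to be Parquet".to_string());
    }
    if !file.ends_with(MAGIC) {
        return Err("missing PAR1 magic at end of file".to_string());
    }
    let footer_start = file.len() - FOOTER_LEN;
    let len_bytes = [
        file[footer_start],
        file[footer_start + 1],
        file[footer_start + 2],
        file[footer_start + 3],
    ];
    let metadata_len = u32::from_le_bytes(len_bytes) as usize;
    // The metadata sits immediately before the 8-byte footer.
    let start = footer_start
        .checked_sub(metadata_len)
        .ok_or_else(|| "metadata length exceeds file size".to_string())?;
    Ok(start..footer_start)
}

/// Hypothetical policy (an assumption, not an existing option): fetch the
/// whole file up front only when its size is known and below a threshold.
fn should_prefetch_whole_file(file_size: Option<u64>, threshold: u64) -> bool {
    matches!(file_size, Some(size) if size <= threshold)
}
```

With something like this, a caller that knows the file is under, say, 1MB would issue one ranged read for the full object and then decode the metadata and the column data from the same buffer.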
