alamb commented on pull request #8283:
URL: https://github.com/apache/arrow/pull/8283#issuecomment-700197576


   > When I run the TPC-H query I am testing against a data set that has 240 
Parquet files. If we just try and run everything at once with async/await and 
have tokio do the scheduling, we will end up with 240 files open at once with 
reads happening against all of them, which is inefficient.
   
   One way to avoid this kind of resource usage explosion is for the Parquet 
reader itself to limit the number of outstanding `Task`s it submits, for 
example with a bounded tokio channel or a similar back-pressure mechanism.
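   
   A minimal sketch of that idea, here using a `tokio::sync::Semaphore` to cap 
how many reads are in flight (a bounded channel would give similar 
back-pressure). This assumes tokio 1.x with the `full` feature; the file list, 
the limit of 8, and `read_parquet_file` are placeholders, not the actual 
reader API:
   
   ```rust
   use std::sync::Arc;
   use tokio::sync::Semaphore;
   
   /// Placeholder for the real Parquet read of one file.
   async fn read_parquet_file(path: String) -> std::io::Result<Vec<u8>> {
       tokio::fs::read(path).await
   }
   
   #[tokio::main]
   async fn main() -> std::io::Result<()> {
       // e.g. the 240 Parquet files from the TPC-H run
       let paths: Vec<String> = (0..240).map(|i| format!("part-{}.parquet", i)).collect();
   
       // At most 8 reads may be in flight at once (placeholder limit).
       let limit = Arc::new(Semaphore::new(8));
   
       let mut handles = Vec::new();
       for path in paths {
           let limit = Arc::clone(&limit);
           handles.push(tokio::spawn(async move {
               // Holding a permit bounds the number of open files; the other
               // spawned tasks wait here instead of opening their files.
               let _permit = limit.acquire_owned().await.expect("semaphore closed");
               read_parquet_file(path).await
           }));
       }
   
       for handle in handles {
           let _bytes = handle.await.expect("task panicked")?;
       }
       Ok(())
   }
   ```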
   
   It seems to me the challenge is not really "scheduling" per se, but rather 
"resource allocation".

