darmie commented on issue #20324:
URL: https://github.com/apache/datafusion/issues/20324#issuecomment-4023555034

> > I see your point and agree. I think metrics based and parallelism fixes (biased towards morsel parallelism as per Hyper), they would be of great advantage and should still be pursued even if the heuristics approach is merged for quick wins.
>
> And to be clear, I think DataFusion already has "morsel driven parallelism" for computation (aka the batches that flow through the plan)
>
> What @Dandandan has done is to start working on making the Parquet IO scheduling more fine grained (and adaptively adjust at runtime to data skew)

Thanks for the clarification!
