adriangb commented on issue #18840: URL: https://github.com/apache/datafusion/issues/18840#issuecomment-3557117652
I'm definitely interested in "dynamic" adjustment of plans; I think it's a very interesting area of optimization. I *think* I remember hearing that BigQuery relies heavily on this approach.

For joins specifically, I thought we could maybe even pull a small sample from each side (1% / 5MB / 3 batches / 30k rows, making up some heuristics) and then decide whether we got the sizes wrong, either adjusting the plan with the data already pulled or otherwise restarting the whole thing.

The reason I find this compelling is that:

1. If we have good estimates, it's a no-op.
2. If we're way off, 1% / 5MB / 3 batches / 30k rows should tell us all we need to know, and it's a roughly constant amount of work.
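
To make the sampling idea above a bit more concrete, here is a minimal, self-contained sketch in plain Rust. It is not DataFusion code and does not use any DataFusion API: `SampleBudget`, `BatchStats`, `Sample`, `Decision`, `sample_side`, and `decide` are all hypothetical names invented for illustration, and the swap rule is just one possible policy. The 1% limit is left out because it would need a total-size estimate that this sketch does not model.

```rust
/// Hypothetical sampling budget: stop at whichever limit is reached first.
#[derive(Debug, Clone, Copy)]
struct SampleBudget {
    max_rows: usize,
    max_bytes: usize,
    max_batches: usize,
}

/// Stand-in for a record batch; only its size matters for this sketch.
#[derive(Debug, Clone, Copy)]
struct BatchStats {
    rows: usize,
    bytes: usize,
}

/// What was observed while sampling one join input.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sample {
    rows: usize,
    bytes: usize,
    batches: usize,
    /// True only if the input ended before any budget limit was reached,
    /// in which case `rows` and `bytes` are the exact totals for that side.
    exhausted: bool,
}

/// Hypothetical outcome of the check.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    /// The estimates look fine, or the sample is inconclusive: no-op.
    KeepPlan,
    /// The probe side turned out to be small; use it as the build side.
    SwapBuildSide,
}

/// Pull batches from one input until the budget is used up or the input ends.
fn sample_side<I>(batches: I, budget: SampleBudget) -> Sample
where
    I: IntoIterator<Item = BatchStats>,
{
    let mut iter = batches.into_iter();
    let mut s = Sample { rows: 0, bytes: 0, batches: 0, exhausted: false };
    while s.rows < budget.max_rows
        && s.bytes < budget.max_bytes
        && s.batches < budget.max_batches
    {
        match iter.next() {
            Some(b) => {
                s.rows += b.rows;
                s.bytes += b.bytes;
                s.batches += 1;
            }
            None => {
                s.exhausted = true;
                break;
            }
        }
    }
    s
}

/// Assumed policy: swap only when the probe side is fully known, the build
/// side is not, and the probe side is smaller than what the build side has
/// already produced. Everything else keeps the planned join order.
fn decide(build: Sample, probe: Sample) -> Decision {
    if probe.exhausted && !build.exhausted && probe.bytes < build.bytes {
        Decision::SwapBuildSide
    } else {
        Decision::KeepPlan
    }
}
```

With a budget of 30k rows / 5MB / 3 batches, a side that keeps producing large batches stops being sampled after 3 batches, while a side with a single tiny batch is read to the end. In this sketch `decide` swaps only when the small side was planned as the probe side; when the estimates were right it returns `KeepPlan`. Either way the work is bounded by the budget rather than by the input size.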
