Re: [I] Dynamic hash join order switching [datafusion]

via GitHub Thu, 20 Nov 2025 02:25:14 -0800


adriangb commented on issue #18840:
URL: https://github.com/apache/datafusion/issues/18840#issuecomment-3557117652


   I'm definitely interested in "dynamic" adjustment of plans. I think it's a 
very interesting area of optimization. I *think* I remember hearing that 
BigQuery relies heavily on this approach.
   
   For joins specifically, I thought we could maybe even pull a small sample 
from each side, say 1% / 5MB / 3 batches / 30k rows (I'm making up these 
heuristics), and then decide whether we got the sizes wrong. If we did, we 
could either adjust using the data we've already pulled or restart the whole 
join. The reason I find this compelling is that:
   1. If we have good estimates it's a no-op.
   2. If we're way off, 1% / 5MB / 3 batches / 30k rows should tell us all we 
need to know, and pulling it is a roughly constant amount of work.
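A minimal Python sketch of the sample-then-decide idea, under assumptions that are not in the comment: each join input is modeled as an iterator of row batches (plain lists), only the made-up "3 batches / 30k rows" limits are applied (not the 1% or 5MB ones), and the names `take_sample`, `revised_rows`, `should_swap` and `choose_sides` are hypothetical, not DataFusion APIs. The sampled batches are buffered and replayed, which corresponds to the "adjust with the existing data" option rather than restarting.

```python
import itertools
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical sampling limits, taken from the made-up heuristics in the comment.
MAX_SAMPLE_BATCHES = 3
MAX_SAMPLE_ROWS = 30_000

Batch = Sequence[object]  # a batch is modeled as a plain sequence of rows


@dataclass
class Sample:
    batches: List[Batch]  # buffered batches, kept so no pulled work is thrown away
    rows: int             # number of rows in the buffered batches
    exhausted: bool       # True if the input ended before a limit was reached


def take_sample(batches: Iterator[Batch]) -> Sample:
    """Pull batches until a sampling limit is hit or the input runs out."""
    pulled: List[Batch] = []
    rows = 0
    while len(pulled) < MAX_SAMPLE_BATCHES and rows < MAX_SAMPLE_ROWS:
        batch = next(batches, None)
        if batch is None:
            return Sample(pulled, rows, exhausted=True)
        pulled.append(batch)
        rows += len(batch)
    return Sample(pulled, rows, exhausted=False)


def revised_rows(estimate: int, sample: Sample) -> int:
    """Correct a planner estimate using what the sample proved."""
    if sample.exhausted:
        return sample.rows  # the whole input fit in the sample: exact count
    return max(estimate, sample.rows)  # otherwise the sample is a lower bound


def should_swap(build_est: int, probe_est: int, build: Sample, probe: Sample) -> bool:
    """Swap only if the corrected sizes say the probe side is smaller."""
    return revised_rows(probe_est, probe) < revised_rows(build_est, build)


def choose_sides(
    build_input: Iterable[Batch],
    probe_input: Iterable[Batch],
    build_est: int,
    probe_est: int,
) -> Tuple[Iterator[Batch], Iterator[Batch], bool]:
    """Return (build_stream, probe_stream, swapped) after sampling both sides."""
    build_it, probe_it = iter(build_input), iter(probe_input)
    build_s, probe_s = take_sample(build_it), take_sample(probe_it)
    # Replay the buffered batches ahead of the unread remainder of each input.
    build_stream = itertools.chain(build_s.batches, build_it)
    probe_stream = itertools.chain(probe_s.batches, probe_it)
    if should_swap(build_est, probe_est, build_s, probe_s):
        return probe_stream, build_stream, True
    return build_stream, probe_stream, False
```

If the estimates are good, the corrected sizes agree with them and the sides stay as planned (point 1); if a side was badly misestimated, at most 3 batches per side are pulled before the decision (point 2). Note that this only looks at a prefix of each input, so a skewed or sorted input can still mislead it.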

