adriangb commented on PR #21666: URL: https://github.com/apache/datafusion/pull/21666#issuecomment-4260535615
Yes. And I think this was another band-aid, but it's closer to the root cause than previous attempts. This has to do with cancellation when multiple joins are involved. TL;DR: I think what is happening is that when you have multiple joins you end up with a tree of operators. One of the joins higher up in the tree hits the new optimization and aborts work, dropping tasks that would have polled downstream joins. But now the downstream join is stuck waiting for all of its partition tasks to finish, even though they never will. I think we were all operating under the assumption that the issue was within a single join operator, but really it's an issue any time an upstream operator cancels on a join. I think the real solution is to track when a join build partition task gets dropped and report that to the dynamic filter building so that it doesn't wait for that partition to report.
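To make the proposed fix concrete, here is a minimal sketch of the idea using only the standard library. It is not DataFusion's actual code: `FilterCoordinator`, `PartitionGuard`, and all of their methods are hypothetical names invented for illustration. The assumption is that each build partition task owns a guard, and that dropping the guard without reporting counts as a cancellation, so the coordinator stops waiting for that partition.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical shared state for one dynamic filter (not a DataFusion type).
struct FilterState {
    /// Partitions that have neither reported nor been dropped yet.
    pending: usize,
    /// Partitions whose task was dropped before it reported.
    cancelled: usize,
    /// (min, max) bounds from partitions that finished their build side.
    bounds: Vec<(i64, i64)>,
}

/// Hypothetical coordinator that waits for every build partition.
struct FilterCoordinator {
    state: Mutex<FilterState>,
}

impl FilterCoordinator {
    fn new(partitions: usize) -> Arc<Self> {
        Arc::new(Self {
            state: Mutex::new(FilterState {
                pending: partitions,
                cancelled: 0,
                bounds: Vec::new(),
            }),
        })
    }

    /// True once every partition has either reported or been dropped.
    fn is_complete(&self) -> bool {
        self.state.lock().unwrap().pending == 0
    }

    fn cancelled(&self) -> usize {
        self.state.lock().unwrap().cancelled
    }

    fn bounds(&self) -> Vec<(i64, i64)> {
        self.state.lock().unwrap().bounds.clone()
    }
}

/// Held by one build partition task. If the task is dropped (for example
/// because an upstream operator cancelled), `Drop` tells the coordinator.
struct PartitionGuard {
    coordinator: Arc<FilterCoordinator>,
    reported: bool,
}

impl PartitionGuard {
    fn new(coordinator: &Arc<FilterCoordinator>) -> Self {
        Self { coordinator: Arc::clone(coordinator), reported: false }
    }

    /// Normal path: the partition finished and contributes its bounds.
    fn report(mut self, min: i64, max: i64) {
        // Mark first so the `Drop` impl below becomes a no-op for this guard.
        self.reported = true;
        let mut state = self.coordinator.state.lock().unwrap();
        state.bounds.push((min, max));
        state.pending -= 1;
    }
}

impl Drop for PartitionGuard {
    fn drop(&mut self) {
        if !self.reported {
            // Cancellation path: release the slot so nobody waits on it.
            // A poisoned lock is ignored to avoid panicking inside `drop`.
            if let Ok(mut state) = self.coordinator.state.lock() {
                state.pending -= 1;
                state.cancelled += 1;
            }
        }
    }
}
```

With three partitions, reporting two and dropping the third leaves `is_complete()` true with one cancellation recorded, instead of waiting forever on the dropped one. Whether a filter built from partial bounds is still safe to apply is a separate question that this sketch does not address.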
