alamb commented on issue #16106: URL: https://github.com/apache/datafusion/issues/16106#issuecomment-2906792842
Thanks @aditanase -- in general I would classify this under the category of the desire for a more sophisticated join reordering algorithm. I am pretty skeptical that we will find an algorithm that works well for all cases and thus belongs in DataFusion's core. The theory is that people can use DataFusion's extension APIs to get whatever join orders they want.

> adding some sort of join hints in the SQL planner [like we have in spark](https://downloads.apache.org/spark/docs/3.0.0/sql-ref-syntax-qry-select-hints.html#join-hints)

I have two potential suggestions:

# Idea 1: Semantic Optimizer

One thing you may be able to do is rely on the fact that DataFusion doesn't typically reorder joins (normally it plans the joins in the order they are listed syntactically in the query). This is the ultimate form of join hinting.

I expect DataFusion to plan this with `a` as the left input and `b` as the right input:

```sql
SELECT .. a JOIN b ...
```

Likewise, I expect DataFusion to plan this with `b` as the left input and `a` as the right input:

```sql
SELECT .. b JOIN a ...
```

If the built-in optimizer passes can't be disabled now, we should add a config setting to do so.

# Idea 2: Custom optimizer

Another thing you could do is add a custom optimizer rule that implements the heuristics you describe (e.g. join hints, FK/PK constraints, etc.).

I wrote about this design choice / limitation here:
- https://www.influxdata.com/blog/optimizing-sql-dataframes-part-two/

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org