Hi, devs, I'd like to start a discuss about adding an option called "table.oprimizer.busy-join-reorder-threshold" for planner rule while we try to introduce a new busy join reorder rule[1] into Flink.
This join reorder rule is based on dynamic programing[2], which can store all possible intermediate results, and the cost model can be used to select the optimal join reorder result. Compare with the existing Lopt join reorder rule, the new rule can give more possible results and the result can be more accurate. However, the search space of this rule will become very large as the number of tables increases. So we should introduce an option to limit the expansion of search space, if the number of table can be reordered less than the threshold, the new busy join reorder rule is used. On the contrary, the Lopt rule is used. The default threshold intended to be set to 12. One reason is that in the tpc-ds benchmark test, when the number of tables exceeds 12, the optimization time will be very long. The other reason is that it refers to relevant engines, like Spark, whose recommended setting is 12.[3] Looking forward to your feedback. [1] https://issues.apache.org/jira/browse/FLINK-30376 [2] https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf [3] https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration Best regards, Yunhong Zheng