Hi Yunhong, Thanks for driving this~
I haven't gone deep into the implementation details yet. Regarding the general description, I would ask a few questions firstly: #1, Is there any benchmark results about the optimization latency change compared to current approach? In OLAP scenario, query optimization latency is more crucial. #2, About the term "busy join reorder", is there any others systems which also use this term? I know Calcite has a rule[1] which uses the term "bushy join". #3, About the implementation, if this does the same work as Calcite MultiJoinOptimizeBushyRule, is it possible to use the Calcite version directly or extend it in some way? [1] https://github.com/apache/calcite/blob/9054682145727fbf8a13e3c79b3512be41574349/core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java#L78 yh z <zhengyunhon...@gmail.com> 于2022年12月29日周四 14:44写道: > Hi, devs, > > I'd like to start a discuss about adding an option called > "table.oprimizer.busy-join-reorder-threshold" for planner rule while we try > to introduce a new busy join reorder rule[1] into Flink. > > This join reorder rule is based on dynamic programing[2], which can store > all possible intermediate results, and the cost model can be used to select > the optimal join reorder result. Compare with the existing Lopt join > reorder rule, the new rule can give more possible results and the result > can be more accurate. However, the search space of this rule will become > very large as the number of tables increases. So we should introduce an > option to limit the expansion of search space, if the number of table can > be reordered less than the threshold, the new busy join reorder rule is > used. On the contrary, the Lopt rule is used. > > The default threshold intended to be set to 12. One reason is that in the > tpc-ds benchmark test, when the number of tables exceeds 12, the > optimization time will be very long. The other reason is that it refers to > relevant engines, like Spark, whose recommended setting is 12.[3] > > Looking forward to your feedback. > > [1] https://issues.apache.org/jira/browse/FLINK-30376 > [2] > > https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf > [3] > > https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration > > Best regards, > Yunhong Zheng > -- Best, Benchao Li