Re: [DISCUSS] Adding a option for planner to decide which join reorder rule to choose

Benchao Li Mon, 02 Jan 2023 20:53:59 -0800

Hi Yunhong,

Thanks for driving this~


I haven't gone deep into the implementation details yet. Regarding the
general description, I would ask a few questions firstly:

#1, Is there any benchmark results about the optimization latency change
compared to current approach? In OLAP scenario, query optimization latency
is more crucial.

#2, About the term "busy join reorder", is there any others systems which
also use this term? I know Calcite has a rule[1] which uses the term "bushy
join".

#3, About the implementation, if this does the same work as Calcite
MultiJoinOptimizeBushyRule, is it possible to use the Calcite version
directly or extend it in some way?

[1]
https://github.com/apache/calcite/blob/9054682145727fbf8a13e3c79b3512be41574349/core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java#L78

yh z <zhengyunhon...@gmail.com> 于2022年12月29日周四 14:44写道：

> Hi, devs,
>
> I'd like to start a discuss about adding an option called
> "table.oprimizer.busy-join-reorder-threshold" for planner rule while we try
> to introduce a new busy join reorder rule[1] into Flink.
>
> This join reorder rule is based on dynamic programing[2], which can store
> all possible intermediate results, and the cost model can be used to select
> the optimal join reorder result. Compare with the existing Lopt join
> reorder rule, the new rule can give more possible results and the result
> can be more accurate. However, the search space of this rule will become
> very large as the number of tables increases. So we should introduce an
> option to limit the expansion of search space, if the number of table can
> be reordered less than the threshold, the new busy join reorder rule is
> used. On the contrary, the Lopt rule is used.
>
> The default threshold intended to be set to 12. One reason is that in the
> tpc-ds benchmark test, when the number of tables exceeds 12, the
> optimization time will be very long. The other reason is that it refers to
> relevant engines, like Spark, whose recommended setting is 12.[3]
>
> Looking forward to your feedback.
>
> [1]  https://issues.apache.org/jira/browse/FLINK-30376
> [2]
>
> https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
> [3]
>
> https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration
>
> Best regards,
> Yunhong Zheng
>


-- 

Best,
Benchao Li

Re: [DISCUSS] Adding a option for planner to decide which join reorder rule to choose

Reply via email to