Hi Jonathan,
1. As Julian mentioned, it's better to use heuristic join ordering for
a large number of joins.
2. AFAIK, LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule always
produce a tree of joins, not a MultiJoin.
3. Yes, your understanding is correct. You can check the default join
order program [1].
[1]
https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/tools/Programs.java#L186
Thanks,
Roman.
On 03.07.2023 22:48, Julian Hyde wrote:
The reason there are two strategies is large joins. If your query joins
10 tables, the number of possible join orders is huge (10 factorial, about
3.6 million, for left-deep trees alone, and more once bushy trees are
allowed) and would therefore overwhelm the Volcano planner, which must
construct each possibility.
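A quick way to check the size of that search space: the sketch that follows counts join orders for n tables under two common definitions, treating A JOIN B and B JOIN A as different trees. Left-deep trees correspond to permutations of the tables (n!), while bushy trees number n! * Catalan(n - 1), which simplifies to (2n - 2)! / (n - 1)!. The class and method names are illustrative only, not part of Calcite.

```java
import java.math.BigInteger;

/** Illustrative only: counts possible join orders for n tables. */
public class JoinOrderCount {
    /** Left-deep trees: one per permutation of the tables, so n!. */
    static BigInteger leftDeep(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    /**
     * Bushy trees where left/right order matters: n! * Catalan(n - 1)
     * = (2n - 2)! / (n - 1)! = n * (n + 1) * ... * (2n - 2).
     */
    static BigInteger bushy(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = n; i <= 2 * n - 2; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 3, 5, 10}) {
            System.out.printf("%2d tables: %,d left-deep, %,d bushy%n",
                    n, leftDeep(n), bushy(n));
        }
    }
}
```

For 10 tables this gives 3,628,800 left-deep orders and 17,643,225,600 bushy ones, so the left-deep count is only a lower bound on what an exhaustive bushy search faces.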
Therefore we have a heuristic algorithm that you should use for large joins. We
gather the entire FROM clause into a data structure called MultiJoin, and a
single rule call applies heuristics and spits out a join order that is probably
close to optimal.
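To make "a single rule call applies heuristics" concrete, here is a toy greedy ordering over a flat set of inputs, loosely in the spirit of ordering a MultiJoin. It is not the algorithm that LoptOptimizeJoinRule or MultiJoinOptimizeBushyRule implements; the class name, the "a-b" predicate keys, and the cost model (row counts multiplied by predicate selectivities) are all hypothetical. It starts from the smallest input, repeatedly adds whichever input keeps the estimated intermediate result smallest, and emits a tree of binary joins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Toy greedy join ordering (hypothetical cost model, not Calcite's):
 * estimated rows of joining table t = currentRows * rows(t)
 * * selectivity of every predicate between t and already-joined tables.
 */
public class GreedyJoinOrder {
    /** Selectivity of the predicate between a and b; 1.0 (cross product) if none. */
    static double selectivity(Map<String, Double> preds, String a, String b) {
        Double s = preds.get(a + "-" + b);
        if (s == null) {
            s = preds.get(b + "-" + a);
        }
        return s == null ? 1.0 : s;
    }

    /** Builds a left-deep tree of binary joins, e.g. "((A JOIN B) JOIN C)". */
    static String order(Map<String, Long> rows, Map<String, Double> preds) {
        // Sorted so that ties are broken deterministically (alphabetically).
        List<String> remaining = new ArrayList<>(new TreeSet<>(rows.keySet()));
        List<String> joined = new ArrayList<>();

        // Start from the smallest input.
        String first = remaining.get(0);
        for (String t : remaining) {
            if (rows.get(t) < rows.get(first)) {
                first = t;
            }
        }
        remaining.remove(first);
        joined.add(first);
        String tree = first;
        double treeRows = rows.get(first);

        // Repeatedly add the input that keeps the intermediate result smallest.
        while (!remaining.isEmpty()) {
            String best = null;
            double bestRows = 0;
            for (String t : remaining) {
                double est = treeRows * rows.get(t);
                for (String j : joined) {
                    est *= selectivity(preds, j, t);
                }
                if (best == null || est < bestRows) {
                    best = t;
                    bestRows = est;
                }
            }
            remaining.remove(best);
            joined.add(best);
            tree = "(" + tree + " JOIN " + best + ")";
            treeRows = bestRows;
        }
        return tree;
    }

    public static void main(String[] args) {
        Map<String, Long> rows =
                Map.of("orders", 1_000_000L, "customers", 10_000L, "nation", 25L);
        Map<String, Double> preds =
                Map.of("customers-orders", 1e-4, "customers-nation", 0.04);
        System.out.println(order(rows, preds));
    }
}
```

With the sample numbers in `main` it returns ((nation JOIN customers) JOIN orders), because joining orders directly to nation has no predicate and would be a 25-million-row cross product. A single greedy pass needs only a polynomial number of estimates instead of exploring every order, at the price of possibly missing the true optimum.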
When you are optimizing a query, you need to know whether you are in danger of
being swallowed by the monster that is the complexity of large joins. If your
query only joins 2 or 3 tables (and in some other situations too) you are not
in danger and can safely exhaustively enumerate plans.
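Julian's point suggests a simple switch on the size of the join. The sketch that follows enumerates every left-deep order when the query is small and otherwise reports that a heuristic should be used; the cut-off of 4 tables and all names are hypothetical. In Calcite itself, `Programs.heuristicJoinOrder` (in the class linked in Roman's reply) appears to take a `minJoinCount` argument that plays a similar role.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy sketch: exhaustive enumeration for small joins, a heuristic for large ones. */
public class JoinStrategy {
    enum Strategy { EXHAUSTIVE, HEURISTIC }

    /** Hypothetical cut-off; a real optimizer would tune this. */
    static final int MAX_EXHAUSTIVE_TABLES = 4;

    static Strategy choose(int tableCount) {
        return tableCount <= MAX_EXHAUSTIVE_TABLES
                ? Strategy.EXHAUSTIVE
                : Strategy.HEURISTIC;
    }

    /** Enumerates every left-deep join order: one per permutation, n! in total. */
    static List<List<String>> allLeftDeepOrders(List<String> tables) {
        List<List<String>> out = new ArrayList<>();
        permute(new ArrayList<>(tables), 0, out);
        return out;
    }

    private static void permute(List<String> t, int k, List<List<String>> out) {
        if (k == t.size()) {
            out.add(new ArrayList<>(t)); // record one complete order
            return;
        }
        for (int i = k; i < t.size(); i++) {
            Collections.swap(t, k, i); // place element i at position k
            permute(t, k + 1, out);
            Collections.swap(t, k, i); // undo the swap before the next choice
        }
    }

    public static void main(String[] args) {
        List<String> small = List.of("A", "B", "C");
        System.out.println(choose(small.size()) + ": " + allLeftDeepOrders(small));
        System.out.println(choose(10)); // too many tables to enumerate safely
    }
}
```

Even this cheap enumeration returns n! orders (6 for three tables, 3,628,800 for ten), and a real planner must also cost each one, which is why such a cut-off has to stay small.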
On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg <jonat...@bodo.ai> wrote:
Hi,
I'm presently working on optimizing the ordering of joins for queries and
had a few questions about the optimal way to do that with Calcite.
I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs)
and spent some time experimenting with JoinAssociateRule, JoinCommuteRule,
and the rules related to MultiJoin. We're currently using the Volcano
planner for optimization, but we also have the freedom to customize the
order of the planner phases.
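JoinCommuteRule and JoinAssociateRule explore join orders through local rewrites, and a Volcano-style planner keeps every tree those rewrites produce. The toy sketch that follows is not Calcite code (it uses Java records, so Java 16+); the `Tree` type and method names are hypothetical. It applies the textbook commute rewrite (A JOIN B to B JOIN A) and a one-directional associate rewrite ((A JOIN B) JOIN C to A JOIN (B JOIN C)) at every node and collects everything reachable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of exploring join orders with commute and associate rewrites. */
public class JoinRewrites {
    /** A table (leaf != null) or a binary join of left and right. */
    record Tree(String leaf, Tree left, Tree right) {
        static Tree table(String name) { return new Tree(name, null, null); }
        static Tree join(Tree l, Tree r) { return new Tree(null, l, r); }
        boolean isJoin() { return leaf == null; }

        @Override
        public String toString() {
            return isJoin() ? "(" + left + " " + right + ")" : leaf;
        }
    }

    /** All trees reachable from t by one rewrite applied at any node. */
    static List<Tree> oneStep(Tree t) {
        List<Tree> out = new ArrayList<>();
        if (!t.isJoin()) {
            return out;
        }
        // Commute: A JOIN B -> B JOIN A
        out.add(Tree.join(t.right(), t.left()));
        // Associate: (A JOIN B) JOIN C -> A JOIN (B JOIN C)
        if (t.left().isJoin()) {
            out.add(Tree.join(t.left().left(), Tree.join(t.left().right(), t.right())));
        }
        // Apply the same rewrites inside each child.
        for (Tree l : oneStep(t.left())) out.add(Tree.join(l, t.right()));
        for (Tree r : oneStep(t.right())) out.add(Tree.join(t.left(), r));
        return out;
    }

    /** Breadth-first closure: every tree the two rewrites can reach from start. */
    static Set<Tree> explore(Tree start) {
        Set<Tree> seen = new HashSet<>();
        Deque<Tree> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            for (Tree next : oneStep(queue.poll())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Tree start = Tree.join(Tree.join(Tree.table("A"), Tree.table("B")), Tree.table("C"));
        System.out.println(explore(start).size() + " trees reachable from " + start);
    }
}
```

Starting from ((A B) C), the closure contains 12 trees, i.e. every ordered binary tree over three tables; with four tables it already contains 120. That growth is what makes exhaustive rewriting expensive for large joins.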
1. Are the MultiJoin rules generally recommended over JoinAssociateRule
and JoinCommuteRule? Or are JoinAssociateRule and JoinCommuteRule still
the standard way to handle join reordering?
2. Our system only supports joins over two inputs, so we can't support
MultiJoin as a physical operator. My understanding is that
LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the
join but still produce a MultiJoin. What's the appropriate way to convert
a MultiJoin back into a set of binary joins?
3. My understanding is that the MultiJoin rules aren't compatible with the
Volcano planner and should be run in a separate phase using the heuristic
planner. Is this understanding correct?
Thank you for any help.
--Jonathan Sternberg