Hi Jonathan,

1. As Julian mentioned, it's better to use heuristic join order for large amount of joins

2. LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule AFAIK always produce tree of joins, not a MultiJoin.

3. Yes, your understanding is correct. You can check the default join order program [1]

[1] https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/tools/Programs.java#L186

Thanks,

Roman.

On 03.07.2023 22:48, Julian Hyde wrote:
The reason that there are two strategies is because of large joins. If your 
query joins 10 tables, the number of possible join orders is large (bounded by 
10 factorial I believe) and therefore would overwhelm the Volcano planner, 
which must construct each possibility.

Therefore we have a heuristic algorithm that you should use for large joins. We 
gather the entire FROM clause into a data structure called MultiJoin, and a 
single rule call applies heuristics and spits out a join order that is probably 
close to optimal.

When you are optimizing a query, you need to know whether you are in danger of 
being swallowed by the monster that is the complexity of large joins. If your 
query only joins 2 or 3 tables (and in some other situations too) you are not 
in danger and can safely exhaustively enumerate plans.

On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg <jonat...@bodo.ai> wrote:

Hi,

I'm presently working on optimizing the ordering of joins for queries and
had a few questions about the optimal way to do that with Calcite.

I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs)
and spent some time experimenting with JoinAssociateRule, JoinCommuteRule,
and the rules related to MultiJoins. We're utilizing the volcano planner
for optimization at the present moment but also have the freedom to
customize the order and phases for the planner phases.

1. Is MultiJoin generally suggested over JoinAssociate and JoinCommute
rules? Or are JoinAssociate and JoinCommute still recommended as the
standard way to handle reordering of joins?
2. Our system only supports performing the join over two inputs and we
can't support MultiJoin as a physical operation. My understanding is that
the LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the
join but will still produce a MultiJoin. What's the appropriate way to
convert a MultiJoin back to a set of joins?
3. My understanding is that MultiJoin rules aren't compatible with the
volcano planner and should be run as part of a stage using the heuristic
planner. Is this understanding correct?

Thank you for any help.

--Jonathan Sternberg

Reply via email to