Thanks.

For MultiJoin, I'm trying to get it to work with a custom cost model
and a custom output convention. Our operation costs are defined in the
implementations of our physical nodes. Since MultiJoin uses costs to
determine the join order, I'm a bit concerned that it uses the costs of
the logical plans rather than our custom ones. Is it possible to use
MultiJoin with physical nodes, or does it only work with logical nodes?

--Jonathan Sternberg

On Tue, Jul 4, 2023 at 1:43 AM Roman Kondakov <kondako...@mail.ru.invalid> wrote:

> Hi Jonathan,
>
> 1. As Julian mentioned, it's better to use a heuristic join order for
> a large number of joins.
>
> 2. AFAIK, LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule always
> produce a tree of joins, not a MultiJoin.
>
> 3. Yes, your understanding is correct. You can check the default join
> order program [1].
>
> [1]
> https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/tools/Programs.java#L186
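On point 2: if the rules emit a tree of two-input joins, a backend that only supports binary joins should not need its own MultiJoin-to-Join conversion step. To illustrate that output shape, here is a hypothetical sketch in plain Java (not Calcite's RelNode or MultiJoin API; the Plan and build names are made up) that folds an ordered list of inputs, as a heuristic might choose them, into a left-deep tree where every join node has exactly two inputs. It only builds left-deep trees; MultiJoinOptimizeBushyRule can also produce bushy shapes, which this sketch does not model.

```java
import java.util.List;

// Hypothetical sketch: plain Java types, not Calcite's RelNode/MultiJoin API.
// Folds an ordered list of inputs into a left-deep tree in which every join
// node has exactly two inputs.
final class LeftDeepBuilder {

  /** A plan node: either a leaf (table scan) or a join of exactly two inputs. */
  static final class Plan {
    final String table; // non-null only for leaves
    final Plan left;    // non-null only for joins
    final Plan right;   // non-null only for joins

    private Plan(String table, Plan left, Plan right) {
      this.table = table;
      this.left = left;
      this.right = right;
    }

    static Plan leaf(String table) {
      return new Plan(table, null, null);
    }

    static Plan join(Plan left, Plan right) {
      return new Plan(null, left, right);
    }

    /** Number of binary join nodes in this subtree. */
    int joinCount() {
      return table != null ? 0 : 1 + left.joinCount() + right.joinCount();
    }

    @Override
    public String toString() {
      return table != null ? table : "(" + left + " JOIN " + right + ")";
    }
  }

  /** Joins the inputs in the given order: ((t0 JOIN t1) JOIN t2) ... */
  static Plan build(List<String> order) {
    if (order.isEmpty()) {
      throw new IllegalArgumentException("a join needs at least one input");
    }
    Plan plan = Plan.leaf(order.get(0));
    for (int i = 1; i < order.size(); i++) {
      plan = Plan.join(plan, Plan.leaf(order.get(i)));
    }
    return plan;
  }

  public static void main(String[] args) {
    Plan plan = build(List.of("nation", "customers", "orders"));
    System.out.println(plan);             // ((nation JOIN customers) JOIN orders)
    System.out.println(plan.joinCount()); // 2
  }
}
```

An input list of n tables always yields n - 1 binary joins, which is the property a two-input-only executor relies on.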
>
> Thanks,
>
> Roman.
>
> On 03.07.2023 22:48, Julian Hyde wrote:
> > There are two strategies because of large joins. If your query joins
> > 10 tables, the number of possible join orders is large (bounded by 10
> > factorial, I believe) and would therefore overwhelm the Volcano
> > planner, which must construct each possibility.
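For a sense of scale: 10! = 3,628,800 is exactly the number of left-deep join orders for 10 tables. If bushy trees are also allowed, the count grows to (2n-2)!/(n-1)! (Catalan(n-1) tree shapes times n! leaf orders), which for n = 10 is 17,643,225,600. The sketch below computes both, assuming any two inputs may be joined and that A JOIN B and B JOIN A count as different plans. A real planner may prune cross products, so treat these as counts of possible trees rather than of plans it will actually build.

```java
import java.math.BigInteger;

// Counts join trees over n base tables, assuming every pair of inputs may be
// joined (no predicate-graph restrictions) and that A JOIN B and B JOIN A are
// distinct plans. BigInteger avoids overflow for larger n.
final class JoinOrderCount {

  static BigInteger factorial(int n) {
    BigInteger result = BigInteger.ONE;
    for (int i = 2; i <= n; i++) {
      result = result.multiply(BigInteger.valueOf(i));
    }
    return result;
  }

  /** Left-deep trees: one per permutation of the tables, i.e. n!. */
  static BigInteger leftDeep(int n) {
    return factorial(n);
  }

  /** Bushy trees: Catalan(n-1) shapes times n! leaf orders = (2n-2)! / (n-1)!. */
  static BigInteger bushy(int n) {
    return factorial(2 * n - 2).divide(factorial(n - 1));
  }

  public static void main(String[] args) {
    for (int n : new int[] {2, 3, 6, 10}) {
      System.out.printf("n=%d left-deep=%s bushy=%s%n", n, leftDeep(n), bushy(n));
    }
  }
}
```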
> >
> > Therefore we have a heuristic algorithm that you should use for large
> > joins. We gather the entire FROM clause into a data structure called
> > MultiJoin, and a single rule call applies heuristics and spits out a
> > join order that is probably close to optimal.
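The idea of one rule call producing an order from the whole FROM clause can be illustrated with a much simpler greedy heuristic. This is a hypothetical sketch, not the algorithm LoptOptimizeJoinRule implements: it assumes each input has a known row count and each join predicate a known selectivity (the table names and numbers are made up), starts from the smallest input, and repeatedly joins whichever remaining input gives the smallest estimated intermediate result. It builds one order in a single pass instead of enumerating alternatives, which is what lets this style of approach scale to many tables. Because the result depends entirely on the estimates fed in, it also shows why the source of those estimates (logical versus custom physical costs) matters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical greedy join-ordering heuristic over a flat set of inputs (the
// MultiJoin view of a FROM clause). NOT Calcite's LoptOptimizeJoinRule;
// row counts and selectivities are assumed inputs.
final class GreedyJoinOrder {

  /** Selectivity of the predicate between a and b; 1.0 if none (cross product). */
  static double selectivity(Map<String, Double> sel, String a, String b) {
    Double s = sel.get(a + "," + b);
    if (s == null) {
      s = sel.get(b + "," + a);
    }
    return s == null ? 1.0 : s;
  }

  /** Returns a left-deep join order chosen greedily by estimated intermediate size. */
  static List<String> order(Map<String, Double> rowCounts, Map<String, Double> sel) {
    List<String> remaining = new ArrayList<>(rowCounts.keySet());
    remaining.sort(Comparator.comparingDouble(rowCounts::get)); // smallest first
    List<String> joined = new ArrayList<>();
    joined.add(remaining.remove(0));
    double currentRows = rowCounts.get(joined.get(0));

    while (!remaining.isEmpty()) {
      String best = null;
      double bestRows = Double.POSITIVE_INFINITY;
      for (String candidate : remaining) {
        // Estimated rows after joining candidate: |current| * |candidate|,
        // scaled by every predicate linking candidate to already-joined inputs.
        double rows = currentRows * rowCounts.get(candidate);
        for (String t : joined) {
          rows *= selectivity(sel, t, candidate);
        }
        if (best == null || rows < bestRows) {
          bestRows = rows;
          best = candidate;
        }
      }
      joined.add(best);
      remaining.remove(best);
      currentRows = bestRows;
    }
    return joined;
  }

  /** Made-up three-table example. */
  static List<String> demo() {
    Map<String, Double> rows = new LinkedHashMap<>();
    rows.put("orders", 1_000_000.0);
    rows.put("customers", 10_000.0);
    rows.put("nation", 25.0);
    Map<String, Double> sel = new LinkedHashMap<>();
    sel.put("orders,customers", 1.0 / 10_000); // each order matches one customer
    sel.put("customers,nation", 1.0 / 25);     // each customer matches one nation
    return order(rows, sel);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [nation, customers, orders]
  }
}
```

Note that orders is joined last even though it is listed first: joining it directly to nation would be a cross product, and the estimates steer the heuristic away from that.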
> >
> > When you are optimizing a query, you need to know whether you are in
> > danger of being swallowed by the monster that is the complexity of
> > large joins. If your query only joins 2 or 3 tables (and in some other
> > situations too), you are not in danger and can safely enumerate plans
> > exhaustively.
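This rule of thumb amounts to a dispatch on the size of the join. As far as I know, Calcite's Programs.heuristicJoinOrder (in Programs.java) takes a minJoinCount parameter for this purpose and, as I understand it, only switches to the MultiJoin path when the query has at least that many joins. The sketch below is a hypothetical stand-alone version of that decision; the Node type, the chain helper, and the threshold of 6 are illustrative assumptions, not Calcite code or defaults.

```java
import java.util.List;

// Hypothetical sketch: exhaustive enumeration for small joins, a MultiJoin-style
// heuristic for large ones. Types and threshold are illustrative only.
final class JoinStrategyChooser {

  enum Strategy { EXHAUSTIVE, HEURISTIC }

  /** Minimal plan node: a table scan (no inputs) or a join of two inputs. */
  static final class Node {
    final boolean isJoin;
    final List<Node> inputs;

    private Node(boolean isJoin, List<Node> inputs) {
      this.isJoin = isJoin;
      this.inputs = inputs;
    }

    static Node scan() {
      return new Node(false, List.of());
    }

    static Node join(Node left, Node right) {
      return new Node(true, List.of(left, right));
    }
  }

  /** Counts join nodes in the tree, recursively. */
  static int countJoins(Node node) {
    int count = node.isJoin ? 1 : 0;
    for (Node input : node.inputs) {
      count += countJoins(input);
    }
    return count;
  }

  /** Exhaustive search below the threshold, heuristic ordering at or above it. */
  static Strategy choose(Node plan, int minJoinCount) {
    return countJoins(plan) < minJoinCount ? Strategy.EXHAUSTIVE : Strategy.HEURISTIC;
  }

  /** Builds a left-deep chain over the given number of tables (tables - 1 joins). */
  static Node chain(int tables) {
    Node plan = Node.scan();
    for (int i = 1; i < tables; i++) {
      plan = Node.join(plan, Node.scan());
    }
    return plan;
  }

  public static void main(String[] args) {
    int minJoinCount = 6; // hypothetical threshold
    System.out.println(choose(chain(3), minJoinCount));  // EXHAUSTIVE (2 joins)
    System.out.println(choose(chain(10), minJoinCount)); // HEURISTIC (9 joins)
  }
}
```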
> >
> >> On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg <jonat...@bodo.ai> wrote:
> >>
> >> Hi,
> >>
> >> I'm presently working on optimizing the ordering of joins for queries
> >> and had a few questions about the best way to do that with Calcite.
> >>
> >> I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs)
> >> and spent some time experimenting with JoinAssociateRule,
> >> JoinCommuteRule, and the rules related to MultiJoin. We're currently
> >> using the Volcano planner for optimization, but we also have the freedom
> >> to customize the planner phases and their order.
> >>
> >> 1. Is MultiJoin generally recommended over the JoinAssociate and
> >> JoinCommute rules? Or are JoinAssociate and JoinCommute still the
> >> standard, recommended way to reorder joins?
> >> 2. Our system only supports joins over two inputs, so we can't support
> >> MultiJoin as a physical operation. My understanding is that
> >> LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the
> >> join but will still produce a MultiJoin. What's the appropriate way to
> >> convert a MultiJoin back into a set of two-input joins?
> >> 3. My understanding is that the MultiJoin rules aren't compatible with
> >> the Volcano planner and should be run in a stage that uses the
> >> heuristic planner. Is this understanding correct?
> >>
> >> Thank you for any help.
> >>
> >> --Jonathan Sternberg
>
