Hi Jonathan,

If you are using a custom RelOptCost with a custom cost model, you can apply your cost model to the logical nodes as well.

1. You need to initialize the planner with your implementation of RelOptCostFactory.

2. You can also override the default cost formula for any node (logical or physical) using this metadata handler [1].

[1] https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/rel/metadata/RelMdPercentageOriginalRows.java#L186
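As a rough sketch of point 1, a cost factory can be passed to the planner constructors. The class name PlannerSetup and its method names are made up for illustration, and the test call uses the built-in RelOptCostImpl.FACTORY only as a stand-in for your own RelOptCostFactory. It assumes Calcite is on the classpath.

```java
import org.apache.calcite.plan.Contexts;
import org.apache.calcite.plan.RelOptCostFactory;
import org.apache.calcite.plan.RelOptCostImpl;
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.volcano.VolcanoPlanner;

// Hypothetical helper: PlannerSetup is not a Calcite class.
final class PlannerSetup {
  // Cost-based (Volcano) planner that builds its costs through the given factory.
  static VolcanoPlanner volcano(RelOptCostFactory costFactory) {
    return new VolcanoPlanner(costFactory, Contexts.empty());
  }

  // Heuristic (Hep) planner with the same factory, so rules that ask for
  // costs during a Hep phase see the same cost type.
  static HepPlanner hep(HepProgram program, RelOptCostFactory costFactory) {
    // Arguments: program, context, noDag, onCopyHook (none), costFactory.
    return new HepPlanner(program, Contexts.empty(), false, null, costFactory);
  }
}
```

For example, PlannerSetup.volcano(RelOptCostImpl.FACTORY) gives a planner whose getCostFactory() returns that factory; substitute your own implementation in its place.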

Thanks.

Roman.

On 07.07.2023 02:46, Jonathan Sternberg wrote:
Thanks.

For MultiJoin, I'm trying to get it to work with a custom cost model
and custom output convention. We have our costs for operations as part of
the implementation of the physical nodes. Since MultiJoin uses the costs to
determine the join ordering, I'm a bit concerned that it is using the costs
from the logical plans rather than our custom ones. Is it possible to
utilize the physical nodes with MultiJoin or does it have to be utilized
with logical nodes only?

--Jonathan Sternberg

On Tue, Jul 4, 2023 at 1:43 AM Roman Kondakov <kondako...@mail.ru.invalid>
wrote:

Hi Jonathan,

1. As Julian mentioned, it's better to use the heuristic join order for
a large number of joins.

2. LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule AFAIK always
produce a tree of joins, not a MultiJoin.

3. Yes, your understanding is correct. You can check the default join
order program [1].

[1] https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/tools/Programs.java#L186

Thanks,

Roman.
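A minimal sketch of such a heuristic phase, assuming Calcite is on the classpath. It is a simplification and not a copy of the program in Programs.java; the class name HeuristicJoinOrder and its method names are made up for illustration. The first rule collects joins into a MultiJoin and the second rewrites it into ordered two-input joins.

```java
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.hep.HepMatchOrder;
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;

// Hypothetical helper: HeuristicJoinOrder is not a Calcite class.
final class HeuristicJoinOrder {
  // Builds a Hep program: gather joins into a MultiJoin, then order them.
  static HepProgram program(boolean bushy) {
    // MULTI_JOIN_OPTIMIZE is LoptOptimizeJoinRule;
    // MULTI_JOIN_OPTIMIZE_BUSHY is MultiJoinOptimizeBushyRule.
    RelOptRule orderRule = bushy
        ? CoreRules.MULTI_JOIN_OPTIMIZE_BUSHY
        : CoreRules.MULTI_JOIN_OPTIMIZE;
    return new HepProgramBuilder()
        .addMatchOrder(HepMatchOrder.BOTTOM_UP)
        .addRuleInstance(CoreRules.JOIN_TO_MULTI_JOIN)
        .addRuleInstance(orderRule)
        .build();
  }

  // Runs the heuristic phase on a logical plan and returns the rewritten plan.
  static RelNode reorder(RelNode logicalPlan, boolean bushy) {
    HepPlanner planner = new HepPlanner(program(bushy));
    planner.setRoot(logicalPlan);
    return planner.findBestExp();
  }
}
```

The result of reorder(...) can then be handed to a later Volcano phase that no longer includes JoinCommuteRule or JoinAssociateRule.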

On 03.07.2023 22:48, Julian Hyde wrote:
The reason that there are two strategies is because of large joins. If
your query joins 10 tables, the number of possible join orders is large
(bounded by 10 factorial I believe) and therefore would overwhelm the
Volcano planner, which must construct each possibility.

Therefore we have a heuristic algorithm that you should use for large
joins. We gather the entire FROM clause into a data structure called
MultiJoin, and a single rule call applies heuristics and spits out a join
order that is probably close to optimal.

When you are optimizing a query, you need to know whether you are in
danger of being swallowed by the monster that is the complexity of large
joins. If your query only joins 2 or 3 tables (and in some other situations
too) you are not in danger and can safely exhaustively enumerate plans.
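To put numbers on the growth described above, here is a small stand-alone calculation (plain Java, no Calcite). It assumes any two inputs may be joined, so the figures are upper bounds. n! counts left-deep orderings; if bushy trees are enumerated as well, the count is (2n-2)!/(n-1)!, which is larger. The class name JoinOrderCount is made up for illustration.

```java
import java.math.BigInteger;

// Illustrative arithmetic only: counts candidate join trees over n tables.
final class JoinOrderCount {
  // n! : ways to order n tables in a left-deep join tree.
  static BigInteger leftDeep(int n) {
    BigInteger result = BigInteger.ONE;
    for (int i = 2; i <= n; i++) {
      result = result.multiply(BigInteger.valueOf(i));
    }
    return result;
  }

  // (2n-2)!/(n-1)! : ordered binary (bushy) trees over n distinct tables,
  // computed as the product n * (n+1) * ... * (2n-2).
  static BigInteger bushy(int n) {
    BigInteger result = BigInteger.ONE;
    for (int i = n; i <= 2 * n - 2; i++) {
      result = result.multiply(BigInteger.valueOf(i));
    }
    return result;
  }

  public static void main(String[] args) {
    for (int n : new int[] {2, 3, 10}) {
      System.out.println(n + " tables: left-deep=" + leftDeep(n)
          + ", bushy=" + bushy(n));
    }
  }
}
```

For 3 tables this gives 6 left-deep and 12 bushy trees; for 10 tables it gives 3628800 and 17643225600.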
On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg <jonat...@bodo.ai>
wrote:
Hi,

I'm presently working on optimizing the ordering of joins for queries and
had a few questions about the optimal way to do that with Calcite.

I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs)
and spent some time experimenting with JoinAssociateRule, JoinCommuteRule,
and the rules related to MultiJoins. We're utilizing the volcano planner
for optimization at the present moment but also have the freedom to
customize the order and phases for the planner phases.

1. Is MultiJoin generally suggested over JoinAssociate and JoinCommute
rules? Or are JoinAssociate and JoinCommute still recommended as the
standard way to handle reordering of joins?
2. Our system only supports performing the join over two inputs and we
can't support MultiJoin as a physical operation. My understanding is that
the LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the
join but will still produce a MultiJoin. What's the appropriate way to
convert a MultiJoin back to a set of joins?
3. My understanding is that MultiJoin rules aren't compatible with the
volcano planner and should be run as part of a stage using the heuristic
planner. Is this understanding correct?

Thank you for any help.

--Jonathan Sternberg
