Hi, Vladimir. Good catch! There could be some improvements here.

Actually, this problem was discovered early when the top-down rule driver
was designed. At that time, no rule was annotated as "TransformationRule".
Moreover, it is impossible to ask every calcite user who designed their own
rules to annotate the existing code. So the top-down rule driver was
designed so that it can:
1. Work in chaos: even if there are no hints for rule types, it can still
work. Some opportunities may be lost, but NO failures, NO exceptions, and
NO worse than the original driver. People can migrate to the new driver
without concern.
2. Be Improvable: Users can refactor their code, if they want, step by
step. As rule types become more and more accurate, the system achieves more
and more benefits
3. Be easy to customize: the default implementation is designed for the
most common cases, so that most users can benefit from it without much
effort. But it is not possible to fulfill all requirements as different
systems could have very different patterns to define logical and
physical. That's why the isTransformationRule method is put in
VolcanoPlanner and marked as protected: overwriting it can be very simple.

Moreover, losing some "derive" opportunities is not as serious as
imagination. As I mentioned in previous discussions, parents are in charge
of raising as many requirements as possible. During "derive", if specific
traits were not built by children, it means that no parents were requiring
that. And if parents finally require that traits in the latter
optimization, passThrough methods get called and new physical nodes are
generated and "derive" get called again.
I tested it on millions of queries, with or without correct rule types, in
my own product. The performance of group pruning varies a lot. But the
output plans are almost the same. Only one obvious exception was
discovered: the spool node. That's because spool nodes cannot "passThough"
parent traits (it could have multiple parents and current framework cannot
handle such a situation) while it can "derive" input traits.

Of course, this conclusion may not apply to your product as we could have
quite different rule sets. I am just sharing some of my experiences. Maybe
the current implementation of "isTransformationRule" is not good enough. If
you have any better solutions, please share them.

Thanks,
Jinpeng Wu

On Fri, May 28, 2021 at 7:10 PM Vladimir Ozerov <ppoze...@gmail.com> wrote:

> Hi,
>
> I have an optimizer that uses top-down VolcanoPlanner and has a
> ConverterRule for every LogicalNode. I have a new requirement when one of
> the physical rules must emit several physical nodes instead of one. I tried
> to convert a ConverterRule to a normal rule that emits physical nodes. Even
> though the new rule has exactly the same pattern and logic as the previous
> ConverterRule, plans changed. Analysis showed that this likely due to a bug
> in the top-down optimizer as explained below.
>
> When optimizing a logical node, the top-down first schedules the
> transformation rules, and then implementation rules. The logic to check
> whether the rule is transformation rule or not is located in
> VolcanoPlanner.isTransformationRule [1]. The rule scheduling logic ensures
> that a given rule is executed either as a transformation rule, or an
> implementation rule, but not both. See TopDowRuleQueue.popMatch. The
> top-down optimizer schedules tasks in a stack. So even though the
> transformation rules are scheduled before implementation rules, the latter
> executed first.
>
> If an implementation rule produces a physical node, this node will be
> notified about input traits in the "derive" phase. In contrast,
> transformation rules produce logical nodes only, and this happens after the
> derivation of the inputs is completed. Therefore, if the top-down optimizer
> mistakenly treats an implementation rule as a transformation rule, "derive"
> will not be called on the produced physical nodes, leading to incomplete
> search space exploration.
>
> It seems, that this is exactly what happens in the current implementation.
> The VolcanoPlanner.isTransformationRule looks like this:
>
>  1: protected boolean isTransformationRule(VolcanoRuleCall match) {
>  2:    if (match.getRule() instanceof SubstitutionRule) {
>  3:      return true;
>  4:    }
>  5:    if (match.getRule() instanceof ConverterRule
>  6:        && match.getRule().getOutTrait() == rootConvention) {
>  7:      return false;
>  8:    }
>  9:    return match.getRule().getOperand().trait == Convention.NONE
> 10:        || match.getRule().getOperand().trait == null;
> 11: }
>
> If the rule is a ConverterRule and it produces the node with the target
> convention, it is treated as an implementation rule (lines 5-6). But if the
> rule is not a ConverterRule, the method tries to deduce the rule's type
> from the incoming convention (lines 9-10). In practice, implementation
> rules either do not care about the incoming trait or expect the NONE trait.
> Therefore, it seems that currently, the top-down optimizer treats many
> implementation rules as physical rules, and as a result, cannot notify
> physical nodes produced from these rules about trait derivation.
>
> This explains why in my case everything was ok when all implementation
> rules were ConverterRules, and why I lost some optimal plans when the rule
> was refactored to a non-converter variant.
>
> Do you agree that this a bug? If yes, shouldn't we refactor that code to
> just check whether the rule is an instance of TransformationRule? Since
> this is a breaking change, we may add a special flag that preserves the old
> behavior by default but allows for the new behavior to overcome the
> aforementioned problem.
>
> Regards,
> Vladimir.
>
> [1]
>
> https://github.com/apache/calcite/blob/calcite-1.26.0/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L1398-L1408
>

Reply via email to