Re: Top-down optimizer cannot explore the search space because physical rule is treated as transformation rule

Vladimir Ozerov Fri, 28 May 2021 10:41:35 -0700

BTW, I tried to change the implementation to:

 1: protected boolean isTransformationRule(VolcanoRuleCall match) {
 2:    return match.getRule() instanceof TransformationRule;
 3: }


It solved my problem - plans returned to normal. In the Apache Calcite
repo, only 4 tests in the TopDowOptTest class failed due to a minor
operator reordering.

пт, 28 мая 2021 г. в 20:37, Vladimir Ozerov <ppoze...@gmail.com>:

> Hi Jinpeng,
>
> Thank you for the clarification. When I saw the code in question for the
> first time, my first thought was that it was perhaps designed for gradual
> migration. The main problem is that the current implementation discards
> parts of the plan *silently*, which might be difficult to spot. I
> only spotted the problem in my specific case because I had ~100 tests with
> complex queries. Otherwise, I would happily proceed with the new rule
> without knowing that I lost important parts of the search space.
>
> That said, I think we can do the following:
>
>    1. Emit a warning if or even throw an exception if the transformation
>    rule produced a physical node. This should be trivial to implement - add an
>    expected node type to VolcanoRuleCall (e.g., "logical", "physical", "any").
>    The warning/exception should contain a proper fix suggestion - to override
>    the VolcanoPlanner.isTransformationRule.
>    2. Alternatively - do a breaking change. Apache Calcite doesn't have a
>    major release cadence. It is normal practice in many products to do
>    breaking changes in minor releases. Even popular products like Mongo or
>    DataStax do it regularly. We may inform the user in the first release and
>    change to "rule instanceof TransformationRule" in the next release.
>
> Does it make sense?
>
> Regards,
> Vladimir.
>
> пт, 28 мая 2021 г. в 19:33, Jinpeng Wu <wjpabc...@gmail.com>:
>
>> Hi, Vladimir. Good catch! There could be some improvements here.
>>
>> Actually, this problem was discovered early when the top-down rule driver
>> was designed. At that time, no rule was annotated as "TransformationRule".
>> Moreover, it is impossible to ask every calcite user who designed their
>> own
>> rules to annotate the existing code. So the top-down rule driver was
>> designed so that it can:
>> 1. Work in chaos: even if there are no hints for rule types, it can still
>> work. Some opportunities may be lost, but NO failures, NO exceptions, and
>> NO worse than the original driver. People can migrate to the new driver
>> without concern.
>> 2. Be Improvable: Users can refactor their code, if they want, step by
>> step. As rule types become more and more accurate, the system achieves
>> more
>> and more benefits
>> 3. Be easy to customize: the default implementation is designed for the
>> most common cases, so that most users can benefit from it without much
>> effort. But it is not possible to fulfill all requirements as different
>> systems could have very different patterns to define logical and
>> physical. That's why the isTransformationRule method is put in
>> VolcanoPlanner and marked as protected: overwriting it can be very simple.
>>
>> Moreover, losing some "derive" opportunities is not as serious as
>> imagination. As I mentioned in previous discussions, parents are in charge
>> of raising as many requirements as possible. During "derive", if specific
>> traits were not built by children, it means that no parents were requiring
>> that. And if parents finally require that traits in the latter
>> optimization, passThrough methods get called and new physical nodes are
>> generated and "derive" get called again.
>> I tested it on millions of queries, with or without correct rule types, in
>> my own product. The performance of group pruning varies a lot. But the
>> output plans are almost the same. Only one obvious exception was
>> discovered: the spool node. That's because spool nodes cannot "passThough"
>> parent traits (it could have multiple parents and current framework cannot
>> handle such a situation) while it can "derive" input traits.
>>
>> Of course, this conclusion may not apply to your product as we could have
>> quite different rule sets. I am just sharing some of my experiences. Maybe
>> the current implementation of "isTransformationRule" is not good enough.
>> If
>> you have any better solutions, please share them.
>>
>> Thanks,
>> Jinpeng Wu
>>
>> On Fri, May 28, 2021 at 7:10 PM Vladimir Ozerov <ppoze...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I have an optimizer that uses top-down VolcanoPlanner and has a
>> > ConverterRule for every LogicalNode. I have a new requirement when one
>> of
>> > the physical rules must emit several physical nodes instead of one. I
>> tried
>> > to convert a ConverterRule to a normal rule that emits physical nodes.
>> Even
>> > though the new rule has exactly the same pattern and logic as the
>> previous
>> > ConverterRule, plans changed. Analysis showed that this likely due to a
>> bug
>> > in the top-down optimizer as explained below.
>> >
>> > When optimizing a logical node, the top-down first schedules the
>> > transformation rules, and then implementation rules. The logic to check
>> > whether the rule is transformation rule or not is located in
>> > VolcanoPlanner.isTransformationRule [1]. The rule scheduling logic
>> ensures
>> > that a given rule is executed either as a transformation rule, or an
>> > implementation rule, but not both. See TopDowRuleQueue.popMatch. The
>> > top-down optimizer schedules tasks in a stack. So even though the
>> > transformation rules are scheduled before implementation rules, the
>> latter
>> > executed first.
>> >
>> > If an implementation rule produces a physical node, this node will be
>> > notified about input traits in the "derive" phase. In contrast,
>> > transformation rules produce logical nodes only, and this happens after
>> the
>> > derivation of the inputs is completed. Therefore, if the top-down
>> optimizer
>> > mistakenly treats an implementation rule as a transformation rule,
>> "derive"
>> > will not be called on the produced physical nodes, leading to incomplete
>> > search space exploration.
>> >
>> > It seems, that this is exactly what happens in the current
>> implementation.
>> > The VolcanoPlanner.isTransformationRule looks like this:
>> >
>> >  1: protected boolean isTransformationRule(VolcanoRuleCall match) {
>> >  2:    if (match.getRule() instanceof SubstitutionRule) {
>> >  3:      return true;
>> >  4:    }
>> >  5:    if (match.getRule() instanceof ConverterRule
>> >  6:        && match.getRule().getOutTrait() == rootConvention) {
>> >  7:      return false;
>> >  8:    }
>> >  9:    return match.getRule().getOperand().trait == Convention.NONE
>> > 10:        || match.getRule().getOperand().trait == null;
>> > 11: }
>> >
>> > If the rule is a ConverterRule and it produces the node with the target
>> > convention, it is treated as an implementation rule (lines 5-6). But if
>> the
>> > rule is not a ConverterRule, the method tries to deduce the rule's type
>> > from the incoming convention (lines 9-10). In practice, implementation
>> > rules either do not care about the incoming trait or expect the NONE
>> trait.
>> > Therefore, it seems that currently, the top-down optimizer treats many
>> > implementation rules as physical rules, and as a result, cannot notify
>> > physical nodes produced from these rules about trait derivation.
>> >
>> > This explains why in my case everything was ok when all implementation
>> > rules were ConverterRules, and why I lost some optimal plans when the
>> rule
>> > was refactored to a non-converter variant.
>> >
>> > Do you agree that this a bug? If yes, shouldn't we refactor that code to
>> > just check whether the rule is an instance of TransformationRule? Since
>> > this is a breaking change, we may add a special flag that preserves the
>> old
>> > behavior by default but allows for the new behavior to overcome the
>> > aforementioned problem.
>> >
>> > Regards,
>> > Vladimir.
>> >
>> > [1]
>> >
>> >
>> https://github.com/apache/calcite/blob/calcite-1.26.0/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L1398-L1408
>> >
>>
>

Re: Top-down optimizer cannot explore the search space because physical rule is treated as transformation rule

Reply via email to