BTW, I tried to change the implementation to: 1: protected boolean isTransformationRule(VolcanoRuleCall match) { 2: return match.getRule() instanceof TransformationRule; 3: }
It solved my problem - plans returned to normal. In the Apache Calcite repo, only 4 tests in the TopDowOptTest class failed due to a minor operator reordering. пт, 28 мая 2021 г. в 20:37, Vladimir Ozerov <ppoze...@gmail.com>: > Hi Jinpeng, > > Thank you for the clarification. When I saw the code in question for the > first time, my first thought was that it was perhaps designed for gradual > migration. The main problem is that the current implementation discards > parts of the plan *silently*, which might be difficult to spot. I > only spotted the problem in my specific case because I had ~100 tests with > complex queries. Otherwise, I would happily proceed with the new rule > without knowing that I lost important parts of the search space. > > That said, I think we can do the following: > > 1. Emit a warning if or even throw an exception if the transformation > rule produced a physical node. This should be trivial to implement - add an > expected node type to VolcanoRuleCall (e.g., "logical", "physical", "any"). > The warning/exception should contain a proper fix suggestion - to override > the VolcanoPlanner.isTransformationRule. > 2. Alternatively - do a breaking change. Apache Calcite doesn't have a > major release cadence. It is normal practice in many products to do > breaking changes in minor releases. Even popular products like Mongo or > DataStax do it regularly. We may inform the user in the first release and > change to "rule instanceof TransformationRule" in the next release. > > Does it make sense? > > Regards, > Vladimir. > > пт, 28 мая 2021 г. в 19:33, Jinpeng Wu <wjpabc...@gmail.com>: > >> Hi, Vladimir. Good catch! There could be some improvements here. >> >> Actually, this problem was discovered early when the top-down rule driver >> was designed. At that time, no rule was annotated as "TransformationRule". >> Moreover, it is impossible to ask every calcite user who designed their >> own >> rules to annotate the existing code. So the top-down rule driver was >> designed so that it can: >> 1. Work in chaos: even if there are no hints for rule types, it can still >> work. Some opportunities may be lost, but NO failures, NO exceptions, and >> NO worse than the original driver. People can migrate to the new driver >> without concern. >> 2. Be Improvable: Users can refactor their code, if they want, step by >> step. As rule types become more and more accurate, the system achieves >> more >> and more benefits >> 3. Be easy to customize: the default implementation is designed for the >> most common cases, so that most users can benefit from it without much >> effort. But it is not possible to fulfill all requirements as different >> systems could have very different patterns to define logical and >> physical. That's why the isTransformationRule method is put in >> VolcanoPlanner and marked as protected: overwriting it can be very simple. >> >> Moreover, losing some "derive" opportunities is not as serious as >> imagination. As I mentioned in previous discussions, parents are in charge >> of raising as many requirements as possible. During "derive", if specific >> traits were not built by children, it means that no parents were requiring >> that. And if parents finally require that traits in the latter >> optimization, passThrough methods get called and new physical nodes are >> generated and "derive" get called again. >> I tested it on millions of queries, with or without correct rule types, in >> my own product. The performance of group pruning varies a lot. But the >> output plans are almost the same. Only one obvious exception was >> discovered: the spool node. That's because spool nodes cannot "passThough" >> parent traits (it could have multiple parents and current framework cannot >> handle such a situation) while it can "derive" input traits. >> >> Of course, this conclusion may not apply to your product as we could have >> quite different rule sets. I am just sharing some of my experiences. Maybe >> the current implementation of "isTransformationRule" is not good enough. >> If >> you have any better solutions, please share them. >> >> Thanks, >> Jinpeng Wu >> >> On Fri, May 28, 2021 at 7:10 PM Vladimir Ozerov <ppoze...@gmail.com> >> wrote: >> >> > Hi, >> > >> > I have an optimizer that uses top-down VolcanoPlanner and has a >> > ConverterRule for every LogicalNode. I have a new requirement when one >> of >> > the physical rules must emit several physical nodes instead of one. I >> tried >> > to convert a ConverterRule to a normal rule that emits physical nodes. >> Even >> > though the new rule has exactly the same pattern and logic as the >> previous >> > ConverterRule, plans changed. Analysis showed that this likely due to a >> bug >> > in the top-down optimizer as explained below. >> > >> > When optimizing a logical node, the top-down first schedules the >> > transformation rules, and then implementation rules. The logic to check >> > whether the rule is transformation rule or not is located in >> > VolcanoPlanner.isTransformationRule [1]. The rule scheduling logic >> ensures >> > that a given rule is executed either as a transformation rule, or an >> > implementation rule, but not both. See TopDowRuleQueue.popMatch. The >> > top-down optimizer schedules tasks in a stack. So even though the >> > transformation rules are scheduled before implementation rules, the >> latter >> > executed first. >> > >> > If an implementation rule produces a physical node, this node will be >> > notified about input traits in the "derive" phase. In contrast, >> > transformation rules produce logical nodes only, and this happens after >> the >> > derivation of the inputs is completed. Therefore, if the top-down >> optimizer >> > mistakenly treats an implementation rule as a transformation rule, >> "derive" >> > will not be called on the produced physical nodes, leading to incomplete >> > search space exploration. >> > >> > It seems, that this is exactly what happens in the current >> implementation. >> > The VolcanoPlanner.isTransformationRule looks like this: >> > >> > 1: protected boolean isTransformationRule(VolcanoRuleCall match) { >> > 2: if (match.getRule() instanceof SubstitutionRule) { >> > 3: return true; >> > 4: } >> > 5: if (match.getRule() instanceof ConverterRule >> > 6: && match.getRule().getOutTrait() == rootConvention) { >> > 7: return false; >> > 8: } >> > 9: return match.getRule().getOperand().trait == Convention.NONE >> > 10: || match.getRule().getOperand().trait == null; >> > 11: } >> > >> > If the rule is a ConverterRule and it produces the node with the target >> > convention, it is treated as an implementation rule (lines 5-6). But if >> the >> > rule is not a ConverterRule, the method tries to deduce the rule's type >> > from the incoming convention (lines 9-10). In practice, implementation >> > rules either do not care about the incoming trait or expect the NONE >> trait. >> > Therefore, it seems that currently, the top-down optimizer treats many >> > implementation rules as physical rules, and as a result, cannot notify >> > physical nodes produced from these rules about trait derivation. >> > >> > This explains why in my case everything was ok when all implementation >> > rules were ConverterRules, and why I lost some optimal plans when the >> rule >> > was refactored to a non-converter variant. >> > >> > Do you agree that this a bug? If yes, shouldn't we refactor that code to >> > just check whether the rule is an instance of TransformationRule? Since >> > this is a breaking change, we may add a special flag that preserves the >> old >> > behavior by default but allows for the new behavior to overcome the >> > aforementioned problem. >> > >> > Regards, >> > Vladimir. >> > >> > [1] >> > >> > >> https://github.com/apache/calcite/blob/calcite-1.26.0/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L1398-L1408 >> > >> >