[ https://issues.apache.org/jira/browse/CALCITE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718551#comment-17718551 ]
Ruben Q L edited comment on CALCITE-5401 at 5/2/23 12:29 PM: ------------------------------------------------------------- Thanks for the feedback [~zabetak]. I agree that using different planners on the same cluster is not a good idea, although it is fairly common. In any case, I think we can easily prevent this issue from happening, even in this situation. Basically the rule's {{onMatch}} would need to call a conversion based on {{call.getPlanner()}}, and not {{relNode.getCluster().getPlanner()}}. Notice that this is already the case on other planner-related methods called inside {{onMatch}}, such as {{prune}} or {{onCopy}}, which are always called via {{call.getPlanner().prune(...)}} & {{call.getPlanner().onCopy(...)}} (and not via {{relNode.getCluster().getPlanner().prune(...)}} / {{relNode.getCluster().getPlanner().onCopy(...)}}). So I think it would make sense to apply the same logic to {{convert}}. I have created a PR with this idea: https://github.com/apache/calcite/pull/3185 was (Author: rubenql): Thanks for the feedback [~zabetak]. I agree that using different planners on the same cluster is not a good idea, although it is fairly common. In any case, I think we can easily prevent this issue from happening, even in this situation. Basically the rule's {{onMatch}} would need to call a conversion based on {{call.getPlanner()}}, and not {{relNode.getCluster().getPlanner()}}. Notice that this is already the case on other planner-related methods called inside {{onMatch}}, such us {{prune}} or {{onCopy}}, which are always called via {{call.getPlanner().prune(...)}} & {{call.getPlanner().onCopy(...)}} (and not via {{relNode.getCluster().getPlanner().prune(...)}} / {{relNode.getCluster().getPlanner().onCopy(...)}}). So I think it would make sense to apply the same logic to {{convert}}. I have created a PR with this idea: https://github.com/apache/calcite/pull/3185 > Rule fired by HepPlanner can return Volcano's RelSubset > ------------------------------------------------------- > > Key: CALCITE-5401 > URL: https://issues.apache.org/jira/browse/CALCITE-5401 > Project: Calcite > Issue Type: Bug > Components: core > Reporter: Ruben Q L > Assignee: Ruben Q L > Priority: Major > Labels: pull-request-available > Fix For: 1.35.0 > > > TLDR; {{CoreRules.AGGREGATE_REMOVE}} is fired by a {{HepPlanner}} but while > removing the Aggregate, instead of returning the Aggregate's input, it > returns a VolcanoPlanner's RelSubset with the input, which leads to > unforeseeable consequences. > > Details: This seems a strange issue that happens because several factors > occur. > I first reproduced it on my application with the following query (on TPCH): > {code:sql} > SELECT c.c_custkey > FROM customer c > WHERE c.c_name IN ('271', '272', '273', '274', '275', '276', '342', '343', > '344', '345','346', '347', '348', '349', '350', '351', '352', '353', '354', > '355', '356', '357', '358', '359', '360') > AND EXISTS (SELECT 1 FROM orders o WHERE o.o_custkey = c.c_custkey) > {code} > But the issue can be reproduced also in Calcite by adding this test into > {{{}HepPlannerTest{}}}: > {code:java} > @Test void testAggregateRemove() { > final RelBuilder builder = RelBuilderTest.createBuilder(c -> > c.withAggregateUnique(true)); > final RelNode root = > builder > .values(new String[]{"i"}, 1, 2, 3) // important to have values > sorted > .distinct() > .build(); > final HepProgram program = new HepProgramBuilder() > .addRuleInstance(CoreRules.AGGREGATE_REMOVE) > .build(); > final HepPlanner planner = new HepPlanner(program); > planner.setRoot(root); > final RelNode result = planner.findBestExp(); > assertThat(result, is(instanceOf(LogicalValues.class))); // fails because > result is a RelSubset > } > {code} > The important elements are: firstly our {{RelOptCluster}} has a > {{VolcanoPlanner}} as planner (so any {{relNode.getCluster().getPlanner()}} > call that we execute will return a {{VolcanoPlanner}} instance). Nevertheless > we also apply some rules via a {{{}HepPlanner{}}}. I think this is a quite > common strategy in Calcite clients to obtain a better performance: first > apply a subset of rules that are always beneficial via a {{{}HepPlanner{}}}, > and then apply the "main" set of rules via the cost-based > {{{}VolcanoPlanner{}}}. > Secondly, we have {{{}AggregateRemoveRule{}}}, which we use in the > {{HepPlanner}} phase. > This rule contains the following code: > {code:java} > @Override public void onMatch(RelOptRuleCall call) { > final Aggregate aggregate = call.rel(0); > final RelNode input = aggregate.getInput(); > ... > final RelNode newInput = convert(input, > aggregate.getTraitSet().simplify()); // <-- *** [1] > relBuilder.push(newInput); > ... > call.getPlanner().prune(aggregate); // <-- *** [2] > call.transformTo(relBuilder.build()); > } > {code} > Notice the line [2] which uses {{call.getPlanner()}} to call the {{prune}} > method. By using {{call.getPlanner()}} we get the correct planner of the rule > that is being fired, in this case a {{{}HepPlanner{}}}, so we end up calling > {{{}HepPlanner#prune{}}}, which is fine. > However, the line [1] calls the {{RelOptRule#convert}} static method: > {code:java} > public static RelNode convert(RelNode rel, RelTraitSet toTraits) { > RelOptPlanner planner = rel.getCluster().getPlanner(); // <-- *** [3] > RelTraitSet outTraits = rel.getTraitSet(); > for (int i = 0; i < toTraits.size(); i++) { > RelTrait toTrait = toTraits.getTrait(i); > if (toTrait != null) > outTraits = outTraits.replace(i, toTrait); > } > if (rel.getTraitSet().matches(outTraits)) > return rel; > return planner.changeTraits(rel, outTraits); // <-- *** [4] > } > {code} > Notice how in this case, the planner is obtained from the relNode's cluster > [3], in our case that would be the {{{}VolcanoPlanner{}}}, which is > potentially problematic. Further down, if the relNode matches the > {{{}outTraits{}}}, no action is done and the same relNode is returned, no > problem here. But, if it does not match them, then > {{RelOptPlanner#changeTraits}} will be called, i.e. > {{VolcanoPlanner#changeTraits}} [4], and this is where the problem will > originate: in our scenario {{VolcanoPlanner#changeTraits}} will return a > Volcano's {{{}RelSubset{}}}, which is completely unhandable by the > {{HepPlanner}} that triggered the rule, and that leads to the incorrect plan > returned by the {{{}HepPlanner{}}}. > In this case, what happens with our original query ({{{}LogicalValues{}}} > with sorted values), we get to {{RelOptRule#convert}} with the RelNode being > a {{LogicalValues}} with {{Convention.NONE}} + {{{}Collation[0]{}}}, and the > {{toTraits}} are the ones from the {{LogicalAggregate}} that we are removing: > {{Convention.NONE}} + {{Collation[]}} . Since the traits from the > {{LogicalValues}} do not match the LogicalAgggregate traits ({{{}Collation[0] > != Collation[]{}}}), the {{RelOptPlanner#changeTraits}} is called and the > problem occurs. I am not sure why here {{RelTraitSet#matches}} is used (which > computes an exact match, hence returning false), rather than > {{{}RelTraitSet#satisfies{}}}, which would have returned true, because a > sorted {{LogicalValues}} ({{{}Collation[0]{}}}) satisfies the unsorted > {{{}Collation[]{}}}, but I assume there is a reason for that. > As a workaround, if the {{LogicalValues}} elements are NOT in order, then the > problem is avoided: we deal with a {{LogicalValues}} with {{Collation[]}} , > so inside {{RelOptRule#convert}} the {{LogicalValues}} traits > ({{{}Convention.NONE + Collation[]{}}}) match the {{LogicalAggregates}} ones > ({{{}Convention.NONE + Collation[]{}}}), so the method returns without > calling {{{}RelOptPlanner#changeTraits{}}}, so the problem does not happen. > This can be confirmed by modifying the proposed test: > {code:java} > @Test void testAggregateRemoveOk() { > final RelBuilder builder = RelBuilderTest.createBuilder(c -> > c.withAggregateUnique(true)); > final RelNode root = > builder > .values(new String[]{"i"}, 1, 42, 3) // not sorted > .distinct() > .build(); > final HepProgram program = new HepProgramBuilder() > .addRuleInstance(CoreRules.AGGREGATE_REMOVE) > .build(); > final HepPlanner planner = new HepPlanner(program); > planner.setRoot(root); > final RelNode result = planner.findBestExp(); > assertThat(result, is(instanceOf(LogicalValues.class))); // ok > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)