Ruben Q L created CALCITE-5401: ---------------------------------- Summary: Rule fired by HepPlanner can return Volcano's RelSubset Key: CALCITE-5401 URL: https://issues.apache.org/jira/browse/CALCITE-5401 Project: Calcite Issue Type: Bug Components: core Reporter: Ruben Q L
TLDR; {{CoreRules.AGGREGATE_REMOVE}} is fired by a {{HepPlanner}} but while removing the Aggregate, instead of returning the Aggregate's input, it returns a VolcanoPlanner's RelSubset with the input, which leads to unforeseeable consequences. Details: This seems a strange issue that happens because several factors occur. I first reproduced it on my application with the following query (on TPCH): {code:sql} SELECT c.c_custkey FROM customer c WHERE c_name IN ('271', '272', '273', '274', '275', '276', '342', '343', '344', '345','346', '347', '348', '349', '350', '351', '352', '353', '354', '355', '356', '357', '358', '359', '360') AND EXISTS (SELECT 1 FROM orders o WHERE o.o_custkey = c.c_custkey) {code} But the issue can be reproduced also in Calcite by adding this test into {{{}HepPlannerTest{}}}: {code:java} @Test void testAggregateRemove() { final RelBuilder builder = RelBuilderTest.createBuilder(c -> c.withAggregateUnique(true)); final RelNode root = builder .values(new String[]{"i"}, 1, 2, 3) // important to have values sorted .distinct() .build(); final HepProgram program = new HepProgramBuilder() .addRuleInstance(CoreRules.AGGREGATE_REMOVE) .build(); final HepPlanner planner = new HepPlanner(program); planner.setRoot(root); final RelNode result = planner.findBestExp(); assertThat(result, is(instanceOf(LogicalValues.class))); // fails because result is a RelSubset } {code} The important elements are: firstly our {{RelOptCluster}} has a {{VolcanoPlanner}} as planner (so any {{relNode.getCluster().getPlanner()}} call that we execute will return a {{VolcanoPlanner}} instance). Nevertheless we also apply some rules via a {{{}HepPlanner{}}}. I think this is a quite common strategy in Calcite clients to obtain a better performance: first apply a subset of rules that are always beneficial via a {{{}HepPlanner{}}}, and then apply the "main" set of rules via the cost-based {{{}VolcanoPlanner{}}}. Secondly, we have {{{}AggregateRemoveRule{}}}, which we use in the {{HepPlanner}} phase. This rule contains the following code: {code:java} @Override public void onMatch(RelOptRuleCall call) { final Aggregate aggregate = call.rel(0); final RelNode input = aggregate.getInput(); ... final RelNode newInput = convert(input, aggregate.getTraitSet().simplify()); // <-- *** [1] relBuilder.push(newInput); ... call.getPlanner().prune(aggregate); // <-- *** [2] call.transformTo(relBuilder.build()); } {code} Notice the line [2] which uses {{call.getPlanner()}} to call the {{prune}} method. By using {{call.getPlanner()}} we get the correct planner of the rule that is being fired, in this case a {{{}HepPlanner{}}}, so we end up calling {{{}HepPlanner#prune{}}}, which is fine. However, the line [1] calls the {{RelOptRule#convert}} static method: {code:java} public static RelNode convert(RelNode rel, RelTraitSet toTraits) { RelOptPlanner planner = rel.getCluster().getPlanner(); // <-- *** [3] RelTraitSet outTraits = rel.getTraitSet(); for (int i = 0; i < toTraits.size(); i++) { RelTrait toTrait = toTraits.getTrait(i); if (toTrait != null) outTraits = outTraits.replace(i, toTrait); } if (rel.getTraitSet().matches(outTraits)) return rel; return planner.changeTraits(rel, outTraits); // <-- *** [4] } {code} Notice how in this case, the planner is obtained from the relNode's cluster [3], in our case that would be the {{{}VolcanoPlanner{}}}, which is potentially problematic. Further down, if the relNode matches the {{{}outTraits{}}}, no action is done and the same relNode is returned, no problem here. But, if it does not match them, then {{RelOptPlanner#changeTraits}} will be called, i.e. {{VolcanoPlanner#changeTraits}} [4], and this is where the problem will originate: in our scenario {{VolcanoPlanner#changeTraits}} will return a Volcano's {{{}RelSubset{}}}, which is completely unhandable by the {{HepPlanner}} that triggered the rule, and that leads to the incorrect plan returned by the {{{}HepPlanner{}}}. In this case, what happens with our original query ({{{}LogicalValues{}}} with sorted values), we get to {{RelOptRule#convert}} with the RelNode being a {{LogicalValues}} with {{Convention.NONE}} + {{{}Collation[0]{}}}, and the {{toTraits}} are the ones from the {{LogicalAggregate}} that we are removing: {{Convention.NONE}} + {{Collation[]}} . Since the traits from the {{LogicalValues}} do not match the LogicalAgggregate traits ({{{}Collation[0] != Collation[]{}}}), the {{RelOptPlanner#changeTraits}} is called and the problem occurs. I am not sure why here {{RelTraitSet#matches}} is used (which computes an exact match, hence returning false), rather than {{{}RelTraitSet#satisfies{}}}, which would have returned true, because a sorted {{LogicalValues}} ({{{}Collation[0]{}}}) satisfies the unsorted {{{}Collation[]{}}}, but I assume there is a reason for that. As a workaround, if the {{LogicalValues}} elements are NOT in order, then the problem is avoided: we deal with a {{LogicalValues}} with {{Collation[]}} , so inside {{RelOptRule#convert}} the {{LogicalValues}} traits ({{{}Convention.NONE + Collation[]{}}}) match the {{LogicalAggregates}} ones ({{{}Convention.NONE + Collation[]{}}}), so the method returns without calling {{{}RelOptPlanner#changeTraits{}}}, so the problem does not happen. This can be confirmed by modifying the proposed test: {code:java} @Test void testAggregateRemoveOk() { final RelBuilder builder = RelBuilderTest.createBuilder(c -> c.withAggregateUnique(true)); final RelNode root = builder .values(new String[]{"i"}, 1, 42, 3) // not sorted .distinct() .build(); final HepProgram program = new HepProgramBuilder() .addRuleInstance(CoreRules.AGGREGATE_REMOVE) .build(); final HepPlanner planner = new HepPlanner(program); planner.setRoot(root); final RelNode result = planner.findBestExp(); assertThat(result, is(instanceOf(LogicalValues.class))); // ok } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)