[ 
https://issues.apache.org/jira/browse/CALCITE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638725#comment-17638725
 ] 

Ruben Q L commented on CALCITE-5401:
------------------------------------

Not sure how to best solve this issue. {{RelOptRule}}'s {{convert}} methods:
- {{public static RelNode convert(RelNode rel, RelTraitSet toTraits)}}
- {{public static RelNode convert(RelNode rel, RelTrait toTrait}}
- {{protected static List<RelNode> convertList(List<RelNode> rels, final 
RelTrait trait)}}

Are used by many rules, so theoretically any of these rules being triggered by 
a planner different from the cluster's planner could face this problem.
A possible solution could be passing a RelOptPlanner parameter to these convert 
methods (and maybe deprecate the old ones).
Or perhaps we could move this convert logic from RelOptRule (where they are 
static helper methods, so not really attached to the rule's logic) into a 
default implementation of RelOptPlanner 

> Rule fired by HepPlanner can return Volcano's RelSubset
> -------------------------------------------------------
>
>                 Key: CALCITE-5401
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5401
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Ruben Q L
>            Priority: Major
>
> TLDR; {{CoreRules.AGGREGATE_REMOVE}} is fired by a {{HepPlanner}} but while 
> removing the Aggregate, instead of returning the Aggregate's input, it 
> returns a VolcanoPlanner's RelSubset with the input, which leads to 
> unforeseeable consequences.
>  
> Details: This seems a strange issue that happens because several factors 
> occur.
> I first reproduced it on my application with the following query (on TPCH):
> {code:sql}
> SELECT c.c_custkey
> FROM customer c
> WHERE c.c_name IN ('271', '272', '273', '274', '275', '276', '342', '343', 
> '344', '345','346', '347', '348', '349', '350', '351',  '352', '353', '354',  
> '355', '356', '357', '358', '359', '360')
> AND EXISTS (SELECT 1 FROM orders o WHERE o.o_custkey = c.c_custkey)
> {code}
> But the issue can be reproduced also in Calcite by adding this test into 
> {{{}HepPlannerTest{}}}:
> {code:java}
>   @Test void testAggregateRemove() {
>     final RelBuilder builder = RelBuilderTest.createBuilder(c -> 
> c.withAggregateUnique(true));
>     final RelNode root =
>         builder
>             .values(new String[]{"i"}, 1, 2, 3) // important to have values 
> sorted
>             .distinct()
>             .build();
>     final HepProgram program = new HepProgramBuilder()
>         .addRuleInstance(CoreRules.AGGREGATE_REMOVE)
>         .build();
>     final HepPlanner planner = new HepPlanner(program);
>     planner.setRoot(root);
>     final RelNode result = planner.findBestExp();
>     assertThat(result, is(instanceOf(LogicalValues.class))); // fails because 
> result is a RelSubset
>   }
> {code}
> The important elements are: firstly our {{RelOptCluster}} has a 
> {{VolcanoPlanner}} as planner (so any {{relNode.getCluster().getPlanner()}} 
> call that we execute will return a {{VolcanoPlanner}} instance). Nevertheless 
> we also apply some rules via a {{{}HepPlanner{}}}. I think this is a quite 
> common strategy in Calcite clients to obtain a better performance: first 
> apply a subset of rules that are always beneficial via a {{{}HepPlanner{}}}, 
> and then apply the "main" set of rules via the cost-based 
> {{{}VolcanoPlanner{}}}.
> Secondly, we have {{{}AggregateRemoveRule{}}}, which we use in the 
> {{HepPlanner}} phase.
> This rule contains the following code:
> {code:java}
>   @Override public void onMatch(RelOptRuleCall call) {
>     final Aggregate aggregate = call.rel(0);
>     final RelNode input = aggregate.getInput();
>     ...
>     final RelNode newInput = convert(input, 
> aggregate.getTraitSet().simplify()); // <-- *** [1]
>     relBuilder.push(newInput);
>     ...
>     call.getPlanner().prune(aggregate);  // <-- *** [2]
>     call.transformTo(relBuilder.build());
>   }
> {code}
> Notice the line [2] which uses {{call.getPlanner()}} to call the {{prune}} 
> method. By using {{call.getPlanner()}} we get the correct planner of the rule 
> that is being fired, in this case a {{{}HepPlanner{}}}, so we end up calling 
> {{{}HepPlanner#prune{}}}, which is fine.
> However, the line [1] calls the {{RelOptRule#convert}} static method:
> {code:java}
>   public static RelNode convert(RelNode rel, RelTraitSet toTraits) {
>     RelOptPlanner planner = rel.getCluster().getPlanner(); // <-- *** [3]
>     RelTraitSet outTraits = rel.getTraitSet();
>     for (int i = 0; i < toTraits.size(); i++) {
>       RelTrait toTrait = toTraits.getTrait(i);
>       if (toTrait != null)
>         outTraits = outTraits.replace(i, toTrait);
>     }
>     if (rel.getTraitSet().matches(outTraits))
>       return rel;
>     return planner.changeTraits(rel, outTraits); // <-- *** [4]
>   }
> {code}
> Notice how in this case, the planner is obtained from the relNode's cluster 
> [3], in our case that would be the {{{}VolcanoPlanner{}}}, which is 
> potentially problematic. Further down, if the relNode matches the 
> {{{}outTraits{}}}, no action is done and the same relNode is returned, no 
> problem here. But, if it does not match them, then 
> {{RelOptPlanner#changeTraits}} will be called, i.e. 
> {{VolcanoPlanner#changeTraits}} [4], and this is where the problem will 
> originate: in our scenario {{VolcanoPlanner#changeTraits}} will return a 
> Volcano's {{{}RelSubset{}}}, which is completely unhandable by the 
> {{HepPlanner}} that triggered the rule, and that leads to the incorrect plan 
> returned by the {{{}HepPlanner{}}}.
> In this case, what happens with our original query ({{{}LogicalValues{}}} 
> with sorted values), we get to {{RelOptRule#convert}} with the RelNode being 
> a {{LogicalValues}} with {{Convention.NONE}} + {{{}Collation[0]{}}}, and the 
> {{toTraits}} are the ones from the {{LogicalAggregate}} that we are removing: 
> {{Convention.NONE}} + {{Collation[]}} . Since the traits from the 
> {{LogicalValues}} do not match the LogicalAgggregate traits ({{{}Collation[0] 
> != Collation[]{}}}), the {{RelOptPlanner#changeTraits}} is called and the 
> problem occurs. I am not sure why here {{RelTraitSet#matches}} is used (which 
> computes an exact match, hence returning false), rather than 
> {{{}RelTraitSet#satisfies{}}}, which would have returned true, because a 
> sorted {{LogicalValues}} ({{{}Collation[0]{}}}) satisfies the unsorted 
> {{{}Collation[]{}}}, but I assume there is a reason for that.
> As a workaround, if the {{LogicalValues}} elements are NOT in order, then the 
> problem is avoided: we deal with a {{LogicalValues}} with {{Collation[]}} , 
> so inside {{RelOptRule#convert}} the {{LogicalValues}} traits 
> ({{{}Convention.NONE + Collation[]{}}}) match the {{LogicalAggregates}} ones 
> ({{{}Convention.NONE + Collation[]{}}}), so the method returns without 
> calling {{{}RelOptPlanner#changeTraits{}}}, so the problem does not happen. 
> This can be confirmed by modifying the proposed test:
> {code:java}
>   @Test void testAggregateRemoveOk() {
>     final RelBuilder builder = RelBuilderTest.createBuilder(c -> 
> c.withAggregateUnique(true));
>     final RelNode root =
>         builder
>             .values(new String[]{"i"}, 1, 42, 3) // not sorted
>             .distinct()
>             .build();
>     final HepProgram program = new HepProgramBuilder()
>         .addRuleInstance(CoreRules.AGGREGATE_REMOVE)
>         .build();
>     final HepPlanner planner = new HepPlanner(program);
>     planner.setRoot(root);
>     final RelNode result = planner.findBestExp();
>     assertThat(result, is(instanceOf(LogicalValues.class))); // ok
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to