Alessandro Solimando created CALCITE-7203:
---------------------------------------------

             Summary: IntersectToSemiJoinRule should compute once and reuse 
join keys to avoid duplicates
                 Key: CALCITE-7203
                 URL: https://issues.apache.org/jira/browse/CALCITE-7203
             Project: Calcite
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.40.0
            Reporter: Alessandro Solimando
            Assignee: Alessandro Solimando


[IntersectToSemiJoinRule|https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/main/java/org/apache/calcite/rel/rules/IntersectToSemiJoinRule.java#L119-L128]
 repeatedly creates cast expressions between pairs of intersect operands, whereas 
we could pre-compute these join keys once, targeting the row type of the n-way 
intersect expression, which is the final type that all intersect operands must 
conform to.

Computing the join keys pair-wise, as the current implementation does, 
introduces duplicates and noise, because each pair is only partially 
type-unified; unifying against the final (global) row type gives stable types.
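
To illustrate the difference, here is a minimal, self-contained sketch. It is a toy model, not Calcite code: the class JoinKeySketch, the Dec type, and the helpers leastRestrictive, pairwiseKeys and globalKeys are all hypothetical names invented for this example, and the decimal widening rule is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinKeySketch {
  /** Toy DECIMAL(precision, scale) type; a stand-in, not Calcite's RelDataType. */
  static final class Dec {
    final int precision;
    final int scale;

    Dec(int precision, int scale) {
      this.precision = precision;
      this.scale = scale;
    }

    @Override public String toString() {
      return "DECIMAL(" + precision + ", " + scale + ")";
    }
  }

  /** Simplified widening: keep the larger integer part and the larger scale. */
  static Dec leastRestrictive(Dec a, Dec b) {
    int scale = Math.max(a.scale, b.scale);
    int intDigits = Math.max(a.precision - a.scale, b.precision - b.scale);
    return new Dec(intDigits + scale, scale);
  }

  static String cast(String expr, Dec target) {
    return "CAST(" + expr + "):" + target;
  }

  /** Pair-wise: every step unifies the accumulated type with the next operand
   * and emits a fresh cast for both sides of that step. */
  static List<String> pairwiseKeys(List<Dec> operands) {
    List<String> keys = new ArrayList<>();
    Dec acc = operands.get(0);
    for (int i = 1; i < operands.size(); i++) {
      Dec target = leastRestrictive(acc, operands.get(i));
      keys.add(cast("left" + i, target));
      keys.add(cast("right" + i, target));
      acc = target;
    }
    return keys;
  }

  /** Global: compute the final type once, then cast each operand exactly once. */
  static List<String> globalKeys(List<Dec> operands) {
    Dec target = operands.get(0);
    for (Dec operand : operands) {
      target = leastRestrictive(target, operand);
    }
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < operands.size(); i++) {
      keys.add(cast("input" + i, target));
    }
    return keys;
  }

  public static void main(String[] args) {
    // Hypothetical operand types for a 3-way intersect.
    List<Dec> operands = Arrays.asList(new Dec(2, 1), new Dec(10, 0), new Dec(2, 1));
    System.out.println("pair-wise: " + pairwiseKeys(operands));
    System.out.println("global:    " + globalKeys(operands));
  }
}
```

In this toy model the pair-wise strategy emits 2(n-1) casts for n operands, re-casting the intermediate result at each step, while the global strategy emits n casts, all to the same type.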

[planner.iq#L150-L179|https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/test/resources/sql/planner.iq#L150-L179]
 could be simplified:

before:
{noformat}
EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)], A=[$t2])
  EnumerableHashJoin(condition=[=($1, $3)], joinType=[semi])
    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], proj#0..1=[{exprs}])
      EnumerableAggregate(group=[{0}])
        EnumerableHashJoin(condition=[=($1, $3)], joinType=[semi])
          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1], A0=[$t1])
            EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1], A0=[$t1])
            EnumerableValues(tuples=[[{ 1 }, { 2 }]])
    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1], A0=[$t1]) <= extra A0
      EnumerableValues(tuples=[[{ 1.0 }, { 4.0 }, { null }]]){noformat}
after:
{noformat}
EnumerableAggregate(group=[{0}])
  EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
      EnumerableAggregate(group=[{0}])
        EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1])
            EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1]) <= no more A0
            EnumerableValues(tuples=[[{ 1 }, { 2 }]])
    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
      EnumerableValues(tuples=[[{ 1.0 }, { 4.0 }, { null }]]){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
