Alessandro Solimando created CALCITE-7203:
---------------------------------------------
Summary: IntersectToSemiJoinRule should compute once and reuse join keys to avoid duplicates
Key: CALCITE-7203
URL: https://issues.apache.org/jira/browse/CALCITE-7203
Project: Calcite
Issue Type: Improvement
Components: core
Affects Versions: 1.40.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando
[IntersectToSemiJoinRule|https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/main/java/org/apache/calcite/rel/rules/IntersectToSemiJoinRule.java#L119-L128]
repeatedly creates cast expressions between pairs of intersect operands, whereas
we could pre-compute these join keys against the row type of the n-way
intersect expression, which is the final type that all intersect operands must
conform to.
Computing the join keys pairwise, as the current implementation does, introduces
duplicates and noise, because each pair is only partially type-unified, whereas
unifying against the final (global) row type yields a single, stable type.
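To make the difference concrete, here is a minimal, self-contained sketch that does not use Calcite at all. The `Dec` class, the `unify` widening rule, and the modelling of INTEGER as DECIMAL(10, 0) are simplifying assumptions for illustration only, not the rule's actual code or Calcite's type system. It contrasts the cast targets produced by pairwise unification with the single target obtained from the global row type, using the three operand types from the planner.iq example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class JoinKeyTypes {
  /** Toy DECIMAL type (hypothetical; stands in for a real column type). */
  static final class Dec {
    final int precision;
    final int scale;
    final boolean nullable;

    Dec(int precision, int scale, boolean nullable) {
      this.precision = precision;
      this.scale = scale;
      this.nullable = nullable;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof Dec)) {
        return false;
      }
      Dec d = (Dec) o;
      return precision == d.precision && scale == d.scale && nullable == d.nullable;
    }

    @Override public int hashCode() {
      return Objects.hash(precision, scale, nullable);
    }

    @Override public String toString() {
      return "DECIMAL(" + precision + ", " + scale + ")" + (nullable ? "" : " NOT NULL");
    }
  }

  /** Assumed widening rule: keep the largest scale and the most integer digits. */
  static Dec unify(Dec a, Dec b) {
    int scale = Math.max(a.scale, b.scale);
    int intDigits = Math.max(a.precision - a.scale, b.precision - b.scale);
    return new Dec(intDigits + scale, scale, a.nullable || b.nullable);
  }

  /** Pairwise: every join step unifies only its own two inputs. */
  static Set<Dec> pairwiseTargets(List<Dec> operands) {
    Set<Dec> targets = new LinkedHashSet<>();
    Dec left = operands.get(0);
    for (int i = 1; i < operands.size(); i++) {
      left = unify(left, operands.get(i));
      targets.add(left); // cast target used by this join step
    }
    return targets;
  }

  /** Global: one target computed once from all operands, reused by every step. */
  static Dec globalTarget(List<Dec> operands) {
    Dec target = operands.get(0);
    for (int i = 1; i < operands.size(); i++) {
      target = unify(target, operands.get(i));
    }
    return target;
  }

  public static void main(String[] args) {
    List<Dec> operands = Arrays.asList(
        new Dec(2, 1, false),   // { 1.0 } ... { 5.0 }
        new Dec(10, 0, false),  // { 1 }, { 2 }: INTEGER modelled as DECIMAL(10, 0)
        new Dec(2, 1, true));   // { 1.0 }, { 4.0 }, { null }
    System.out.println("pairwise: " + pairwiseTargets(operands));
    System.out.println("global:   " + globalTarget(operands));
  }
}
```

Under these assumptions the pairwise approach yields two distinct cast targets (one NOT NULL, one nullable), while the global approach yields one target that every operand can be cast to directly.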
[planner.iq#L150-L179|https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/test/resources/sql/planner.iq#L150-L179]
could be simplified:
before:
{noformat}
EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)], A=[$t2])
EnumerableHashJoin(condition=[=($1, $3)], joinType=[semi])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], proj#0..1=[{exprs}])
EnumerableAggregate(group=[{0}])
EnumerableHashJoin(condition=[=($1, $3)], joinType=[semi])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1], A0=[$t1])
EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1], A0=[$t1])
EnumerableValues(tuples=[[{ 1 }, { 2 }]])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1], A0=[$t1]) <= extra A0
EnumerableValues(tuples=[[{ 1.0 }, { 4.0 }, { null }]]){noformat}
after:
{noformat}
EnumerableAggregate(group=[{0}])
EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
EnumerableAggregate(group=[{0}])
EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1])
EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1]) <= no more A0
EnumerableValues(tuples=[[{ 1 }, { 2 }]])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
EnumerableValues(tuples=[[{ 1.0 }, { 4.0 }, { null }]]){noformat}