Zhen Chen created CALCITE-6930:
----------------------------------
Summary: Implementing JoinConditionOrExpansionRule
Key: CALCITE-6930
URL: https://issues.apache.org/jira/browse/CALCITE-6930
Project: Calcite
Issue Type: New Feature
Reporter: Zhen Chen
Assignee: Zhen Chen
JoinConditionOrExpansionRule transforms a join with OR conditions into a UNION
ALL of multiple joins.
For example, the SQL:
SELECT * FROM emp JOIN dept
ON emp.deptno = dept.deptno OR emp.deptno = dept.mgr
Will be transformed into:
SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno
UNION ALL
SELECT * FROM emp JOIN dept ON emp.deptno = dept.mgr
The original plan:
LogicalJoin(condition=[OR(=(deptno, deptno), =(deptno, mgr))])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept)
Is transformed to:
LogicalUnion(all=[true])
LogicalJoin(condition=[=(deptno, deptno)])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept)
LogicalJoin(condition=[=(deptno, mgr)])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept)
Benefits:
# Each individual join can use hash-join algorithm, which is more efficient
than nested-loop joins required for OR conditions.
# Each join branch can independently choose its optimal join implementation.
# The joins can be executed in parallel.
# Individual joins may be able to use indexes that would not be usable with OR
conditions.
The rule only fires when:
# The OR condition contains only simple equi-join conditions (comparing
columns between left and right inputs).
Limitations and considerations:
# May not improve performance if individual joins produce large results that
need to be union-ed.
# Could be inefficient if the OR condition is very selective but individual
conditions are not.
# Multiple hash tables may cause memory pressure.
# The optimizer should consider statistics and cost estimates when applying
this rule.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)