Krisztian Kasa created CALCITE-6737:
---------------------------------------
Summary: LoptOptimizeJoinRule can not identify selfjoin on unique
join keys
Key: CALCITE-6737
URL: https://issues.apache.org/jira/browse/CALCITE-6737
Project: Calcite
Issue Type: Bug
Components: core
Reporter: Krisztian Kasa
Let's see the following query
{code}
select e.empno from emp e
inner join dept d on d.deptno = e.deptno
inner join emp e2 on e.empno = e2.empno;
{code}
The query contains two inner joins: one joining the {{emp}} table twice
(self-joins) and one joining the {{dept}} table.
Based on the implementation of {{LoptOptimizeJoinRule}} the final join order is
also affected by self-joins:
https://github.com/apache/calcite/blob/dbf4ff87ae34d7cb9e08037c15bd6fddfcaaef24/core/src/main/java/org/apache/calcite/rel/rules/LoptOptimizeJoinRule.java#L702-L705
So the join order of above query should look like:
{code}
LogicalJoin(condition=[=($18, $7)], joinType=[inner])
LogicalJoin(condition=[=($9, $0)], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}
However, identifying self-join is broken after CALCITE-4251 so we end up with
the following join order
{code}
LogicalJoin(condition=[=($9, $0)], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalJoin(condition=[=($9, $7)], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}
I bumped into this while I was investigating query plan changes caused by
upgrading Calcite to 1.33 in Hive. (Hive currently uses Calcite 1.25)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)