Krisztian Kasa created CALCITE-6737:
---------------------------------------

             Summary: LoptOptimizeJoinRule can not identify selfjoin on unique 
join keys
                 Key: CALCITE-6737
                 URL: https://issues.apache.org/jira/browse/CALCITE-6737
             Project: Calcite
          Issue Type: Bug
          Components: core
            Reporter: Krisztian Kasa


Let's see the following query
{code}
select e.empno from emp e
         inner join dept d on d.deptno = e.deptno
         inner join emp e2 on e.empno = e2.empno;
{code}
The query contains two inner joins: one joining the {{emp}} table twice 
(self-joins) and one joining the {{dept}} table.
Based on the implementation of {{LoptOptimizeJoinRule}} the final join order is 
also affected by self-joins:
https://github.com/apache/calcite/blob/dbf4ff87ae34d7cb9e08037c15bd6fddfcaaef24/core/src/main/java/org/apache/calcite/rel/rules/LoptOptimizeJoinRule.java#L702-L705

So the join order of above query should look like:
{code}
    LogicalJoin(condition=[=($18, $7)], joinType=[inner])
      LogicalJoin(condition=[=($9, $0)], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

However, identifying self-join is broken after CALCITE-4251 so we end up with 
the following join order
{code}
    LogicalJoin(condition=[=($9, $0)], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalJoin(condition=[=($9, $7)], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

I bumped into this while I was investigating query plan changes caused by 
upgrading Calcite to 1.33 in Hive. (Hive currently uses Calcite 1.25)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to