[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837476#comment-16837476
 ] 

Stamatis Zampetakis commented on CALCITE-2973:
----------------------------------------------

Today I was looking again into another case (CALCITE-2898) where there is a 
need to perform a hash join with a join condition that is not strictly an 
equijoin.

*Example from CALCITE-2898*
{code:sql}
SELECT e.name
FROM emp e
INNER JOIN department d 
  ON e.address.zipcode = d.zipcode
{code}
As you can observe the condition incorporates RexFieldAccess so it does not 
satisfy the requirement RexInputRef = RexInputRef. I have the impression that 
the PR in this issue does not handle this case. We could transform this theta 
join to an equijoin by introducing an additional projection below the join.

*Example from CALCITE-2973*
{code:sql}
SELECT e.ename, d.name
FROM emp e
LEFT JOIN dept d
  ON e.deptno = d.deptno  AND e.sal < 10000
{code}
The query above has the following logical plan.
{noformat}
LogicalProject(ENAME=[$1], NAME=[$11])
  LogicalJoin(condition=[AND(=($7, $10), $9)], joinType=[left])
    LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f9=[<($5, 10000)])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}
I think it is equivalent to the plans below.
{noformat}
LogicalProject(ENAME=[$1], NAME=[$11])
  LogicalProject($0..$9, EX$10=[CASE($9,$10,null)], EX$11=[CASE($9,$11,null), 
...]])
    LogicalJoin(condition=[=($7, $10)], joinType=[left])
      LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f9=[<($5, 10000)])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}
(after merging the projections)
{noformat}
LogicalProject(ENAME=[$1], NAME=[CASE($9,$11,null)])
  LogicalJoin(condition=[=($7, $10)], joinType=[left])
    LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f9=[<($5, 10000)])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}
Observe that both cases mentioned above can be solved by adding projections 
above/below the join operator without touching at all the join it self. 
Due to this and given that most join algorithms in the literature cannot handle 
theta-joins, I ended up again to the question below:

Should we modify the implementation of our joins algorithms or rather try to 
intro introduce new rule(s) (e.g., ThetaJoinToEquiJoinRule) which can perform 
transformations like those demonstrated above?

I was thinking that the rule based approach can be useful for a greater 
audience so I would like again your input on this (in particular from [~hyuan] 
who seemed to be rather against this approach).

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2973
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2973
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Lai Zhou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.20.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query  for a large dataset (such as 10000*10000), 
> the nested-loop join process will take dozens of time than the sort-merge 
> join process .
> So if we can apply merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to