xiong duan created CALCITE-7371:
-----------------------------------

             Summary: Expand Correlate to Support Condition and VariablesSet
                 Key: CALCITE-7371
                 URL: https://issues.apache.org/jira/browse/CALCITE-7371
             Project: Calcite
          Issue Type: Bug
            Reporter: xiong duan


Currently, in a Join, if there is a reference to LHS columns in the RHS, it is 
directly converted to a Correlate.
For example:
{code:java}
SELECT
    d.deptno,
    e.ename
FROM dept d
LEFT JOIN LATERAL (
    SELECT
        ename
    FROM emp
    WHERE emp.deptno = d.deptno
      AND emp.job = 'MANAGER'
 ) e {code}
If there is a Correlated sub-query in the Join's Condition, it is converted to 
a Join with variablesSet. For example:
{code:java}
SELECT empno
      FROM emp AS e
      LEFT JOIN dept AS d
        ON d.dname = e.ename
          AND (EXISTS (
            SELECT e2.deptno FROM emp AS e2
            WHERE e2.deptno = e.deptno
            GROUP BY e2.deptno
            HAVING SUM(e2.sal) > 1000000)); {code}
I understand it as such, and the current code is implemented accordingly. 
However, in the comments of the Join creation method, the definition of 
variablesSet is consistent with the meaning of CorrelationId in Correlate (Set 
variables that are set by the LHS and used by the RHS and are not available to 
nodes above this Join in the tree). So I think it is a bug in code comment.

But when a query has both references to LHS columns in the RHS and additional 
Correlated sub-queries in the condition, the current Calcite does not support 
it.
For example:
{code:java}
SELECT
    d.deptno
FROM dept d
LEFT JOIN LATERAL (
    SELECT
        ename
    FROM emp
    WHERE emp.deptno = d.deptno
      AND emp.job = 'MANAGER'
 ) e ON EXISTS(SELECT * FROM bonus WHERE d.dname = e.ename) {code}
After this PR, the corresponding logical plan can be represented as:
{code:java}
LogicalProject(DEPTNO=[$0], ENAME=[$3], SAL=[$4], DEPTNO0=[$5], JOB=[$6])
  LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{0}], 
variablesSet=[$cor1], condition=[EXISTS({
LogicalFilter(condition=[=($cor1.DNAME, CAST($cor1.ENAME):VARCHAR(14))])
  JdbcTableScan(table=[[JDBC_SCOTT, BONUS]])
})])
    JdbcTableScan(table=[[JDBC_SCOTT, DEPT]])
    LogicalProject(ENAME=[$1], SAL=[$5], DEPTNO=[$7], JOB=[$2])
      LogicalFilter(condition=[AND(=($7, $cor0.DEPTNO), =($2, 'MANAGER'))])
        JdbcTableScan(table=[[JDBC_SCOTT, EMP]]) {code}
Additionally, the extended Correlate can handle references in the condition 
well during the RelToSql process. First, construct the LeftResult, and populate 
correlation variables based on the Context in LeftResult (these variables may 
be referenced in RightResult; otherwise, an exception indicating the variable 
name does not exist will be thrown). Then, construct the RightResult, populate 
variablesSet variables based on the Context in RightResult and LeftResult, and 
then construct the complete Correlate condition, thereby constructing the 
Correlate.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to