xiong duan created CALCITE-7371:
-----------------------------------
Summary: Expand Correlate to Support Condition and VariablesSet
Key: CALCITE-7371
URL: https://issues.apache.org/jira/browse/CALCITE-7371
Project: Calcite
Issue Type: Bug
Reporter: xiong duan
Currently, in a Join, if there is a reference to LHS columns in the RHS, it is
directly converted to a Correlate.
For example:
{code:java}
SELECT
d.deptno,
e.ename
FROM dept d
LEFT JOIN LATERAL (
SELECT
ename
FROM emp
WHERE emp.deptno = d.deptno
AND emp.job = 'MANAGER'
) e {code}
If there is a Correlated sub-query in the Join's Condition, it is converted to
a Join with variablesSet. For example:
{code:java}
SELECT empno
FROM emp AS e
LEFT JOIN dept AS d
ON d.dname = e.ename
AND (EXISTS (
SELECT e2.deptno FROM emp AS e2
WHERE e2.deptno = e.deptno
GROUP BY e2.deptno
HAVING SUM(e2.sal) > 1000000)); {code}
I understand it as such, and the current code is implemented accordingly.
However, in the comments of the Join creation method, the definition of
variablesSet is consistent with the meaning of CorrelationId in Correlate (Set
variables that are set by the LHS and used by the RHS and are not available to
nodes above this Join in the tree). So I think it is a bug in code comment.
But when a query has both references to LHS columns in the RHS and additional
Correlated sub-queries in the condition, the current Calcite does not support
it.
For example:
{code:java}
SELECT
d.deptno
FROM dept d
LEFT JOIN LATERAL (
SELECT
ename
FROM emp
WHERE emp.deptno = d.deptno
AND emp.job = 'MANAGER'
) e ON EXISTS(SELECT * FROM bonus WHERE d.dname = e.ename) {code}
After this PR, the corresponding logical plan can be represented as:
{code:java}
LogicalProject(DEPTNO=[$0], ENAME=[$3], SAL=[$4], DEPTNO0=[$5], JOB=[$6])
LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{0}],
variablesSet=[$cor1], condition=[EXISTS({
LogicalFilter(condition=[=($cor1.DNAME, CAST($cor1.ENAME):VARCHAR(14))])
JdbcTableScan(table=[[JDBC_SCOTT, BONUS]])
})])
JdbcTableScan(table=[[JDBC_SCOTT, DEPT]])
LogicalProject(ENAME=[$1], SAL=[$5], DEPTNO=[$7], JOB=[$2])
LogicalFilter(condition=[AND(=($7, $cor0.DEPTNO), =($2, 'MANAGER'))])
JdbcTableScan(table=[[JDBC_SCOTT, EMP]]) {code}
Additionally, the extended Correlate can handle references in the condition
well during the RelToSql process. First, construct the LeftResult, and populate
correlation variables based on the Context in LeftResult (these variables may
be referenced in RightResult; otherwise, an exception indicating the variable
name does not exist will be thrown). Then, construct the RightResult, populate
variablesSet variables based on the Context in RightResult and LeftResult, and
then construct the complete Correlate condition, thereby constructing the
Correlate.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)