Dmitry Sysolyatin created CALCITE-7127:
------------------------------------------
Summary: Anti-join with WHERE NOT EXISTS syntax has corrupted
condition
Key: CALCITE-7127
URL: https://issues.apache.org/jira/browse/CALCITE-7127
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.40.0
Reporter: Dmitry Sysolyatin
Assignee: Dmitry Sysolyatin
Fix For: 1.41.0
This issue is similar to [CALCITE-6804]; however, that ticket only partially
resolved the underlying problem.
Consider the following query:
{code:java}
SELECT e3."product_id", e3."product_name"
FROM (
SELECT 1 AS "additional_column",
e1."product_id",
e1."product_name"
FROM "foodmart"."product" e1
LEFT JOIN "foodmart"."product" e2
ON e1."product_id" = e2."product_id"
) AS e3
WHERE e3."product_name" IS NOT NULL AND NOT EXISTS (
SELECT 1
FROM "foodmart"."employee" e4
WHERE e4."employee_id" = e3."additional_column"
) {code}
Calcite incorrectly converts it into the following SQL:
{code:java}
SELECT product_id, product_name
FROM (
SELECT
1 AS additional_column,
product.product_id,
product.product_name
FROM foodmart.product
LEFT JOIN foodmart.product AS product0
ON product.product_id = product0.product_id
) AS t
WHERE t.product_name IS NOT NULL AND NOT EXISTS (
SELECT *
FROM foodmart.employee
WHERE employee_id = product.product_class_id
); {code}
The problem is in the NOT EXISTS subquery's WHERE clause.
In the original query, the correlation condition is:
{code:java}
WHERE e4."employee_id" = e3."additional_column"{code}
In the generated query, this becomes:
{code:java}
WHERE employee_id = product.product_class_id {code}
Calcite incorrectly generates a reference to product.product_class_id from the
base table instead of t.additional_column from the derived table.
The generated logical plan is correct:
{code:java}
LogicalProject(product_id=[$1], product_name=[$2])
LogicalFilter(condition=[AND(IS NOT NULL($2), NOT(EXISTS({
LogicalFilter(condition=[=($0, $cor0.additional_column)])
JdbcTableScan(table=[[foodmart, employee]])
})))], variablesSet=[[$cor0]])
LogicalProject(additional_column=[1], product_id=[$1], product_name=[$3])
LogicalJoin(condition=[=($1, $16)], joinType=[left])
JdbcTableScan(table=[[foodmart, product]])
JdbcTableScan(table=[[foodmart, product]]) {code}
The root cause appears to be that the `SqlImplementor.Result` for a `Project`
exposes the aliases of the relations used within that `Project`. Standard SQL
syntax does not allow an outer query to reference these internal relations.
The Project operation should encapsulate the relations it is built upon. In
this case, the `Filter` should not have access to the relations inside the
`Project` it consumes. However, the RelToSqlConverter seems to permit this, by
transfering aliases from `Join` to `Project` causing it to resolve the
correlated variable incorrectly. This appears to be where the bug originates.
So solution would be to removing workaround which was introduced in
CALCITE-6804
And replace `return builder.result();` to `return result(builder.select,
builder.clauses, e, null);` in
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java#L562
--
This message was sent by Atlassian Jira
(v8.20.10#820010)