Alessandro Solimando created CALCITE-7317:
---------------------------------------------
Summary: SubQueryRemoveRule should skip NULL-safety checks when
NOT IN needle is NOT NULL
Key: CALCITE-7317
URL: https://issues.apache.org/jira/browse/CALCITE-7317
Project: Calcite
Issue Type: Improvement
Components: core
Affects Versions: 1.41.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando
Fix For: 1.42.0
SubQueryRemoveRule.rewriteIn() currently generates COUNT aggregates for
NULL-safety checks in all NOT IN subqueries, regardless of whether the needle
(left operand) can be NULL.
When the needle is guaranteed NOT NULL (e.g., COALESCE(col, default)), these
checks are unnecessary. NULL values in the haystack (subquery results) cannot
match a NOT NULL needle anyway, so the expensive COUNT( * ) and COUNT(col)
aggregates can be skipped.
Current behavior:
{noformat}
SELECT * FROM emp
WHERE COALESCE(deptno, 0) NOT IN (
SELECT deptno FROM dept
){noformat}
Generates:
{noformat}
LEFT JOIN (
(SELECT COUNT as c, COUNT(deptno) as ck FROM dept) – unnecessary
CROSS JOIN (SELECT DISTINCT deptno, true as i FROM dept)
) ON condition{noformat}
Expected behavior:
{noformat}
LEFT JOIN (
SELECT DISTINCT deptno, true as i FROM dept
) ON condition
{noformat}
With simple filter: IS NULL( i ) instead of the complex OR(=(c, 0), IS NULL( i
)) AND OR(=(c, 0), >=(ck, c))
Performance impact:
- Eliminates double UNION ALL execution (once for COUNT, once for join)
- Removes two aggregate computations
- Simplifies join predicate
The code already documents this optimization at
[SubQueryRemoveRule.java:598-607|https://github.com/apache/calcite/blob/f2df771d86f52e15fba8f626eaeb8b0b9b944823/core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java#L598-L607]:
{noformat}
// If keys are not null we can remove "ct" and simplify to
//
// select e.deptno,
// case
// when dt.i is not null then true
// else false
// end
// from emp as e
// left join (select distinct deptno, true as i from emp) as dt
// on e.deptno = dt.deptno{noformat}
Proposed solution:
Conditionally generate the COUNT aggregates only when we detect we need null
safety.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)