[
https://issues.apache.org/jira/browse/CALCITE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alessandro Solimando updated CALCITE-7317:
------------------------------------------
Summary: SubQueryRemoveRule should skip NULL-safety checks when both the
keys and the IN subquery are NOT NULL (was: SubQueryRemoveRule should skip
NULL-safety checks when NOT IN needle is NOT NULL)
> SubQueryRemoveRule should skip NULL-safety checks when both the keys and the
> IN subquery are NOT NULL
> -----------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7317
> URL: https://issues.apache.org/jira/browse/CALCITE-7317
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.41.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> SubQueryRemoveRule.rewriteIn() currently generates COUNT aggregates for
> NULL-safety checks in all NOT IN subqueries, regardless of whether the needle
> (left operand) can be NULL.
> When the needle is guaranteed NOT NULL (e.g., COALESCE(col, default)), these
> checks are unnecessary. NULL values in the haystack (subquery results) cannot
> match a NOT NULL needle anyway, so the expensive COUNT( * ) and COUNT(col)
> aggregates can be skipped.
> Current behavior:
> {noformat}
> SELECT * FROM emp
> WHERE COALESCE(deptno, 0) NOT IN (
> SELECT deptno FROM dept
> ){noformat}
> Generates:
> {noformat}
> LEFT JOIN (
> (SELECT COUNT as c, COUNT(deptno) as ck FROM dept) – unnecessary
> CROSS JOIN (SELECT DISTINCT deptno, true as i FROM dept)
> ) ON condition{noformat}
> Expected behavior:
> {noformat}
> LEFT JOIN (
> SELECT DISTINCT deptno, true as i FROM dept
> ) ON condition
> {noformat}
> With simple filter: IS NULL( i ) instead of the complex OR(=(c, 0), IS NULL(
> i )) AND OR(=(c, 0), >=(ck, c))
> Performance impact:
> - Eliminates double UNION ALL execution (once for COUNT, once for join)
> - Removes two aggregate computations
> - Simplifies join predicate
> The code already documents this optimization at
> [SubQueryRemoveRule.java:598-607|https://github.com/apache/calcite/blob/f2df771d86f52e15fba8f626eaeb8b0b9b944823/core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java#L598-L607]:
> {noformat}
> // If keys are not null we can remove "ct" and simplify to
> //
> // select e.deptno,
> // case
> // when dt.i is not null then true
> // else false
> // end
> // from emp as e
> // left join (select distinct deptno, true as i from emp) as dt
> // on e.deptno = dt.deptno{noformat}
> Proposed solution:
> Conditionally generate the COUNT aggregates only when we detect we need null
> safety.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)