[ 
https://issues.apache.org/jira/browse/CALCITE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-7317:
------------------------------------
    Labels: pull-request-available  (was: )

> SubQueryRemoveRule should skip NULL-safety checks when NOT IN needle is NOT 
> NULL
> --------------------------------------------------------------------------------
>
>                 Key: CALCITE-7317
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7317
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> SubQueryRemoveRule.rewriteIn() currently generates COUNT aggregates for 
> NULL-safety checks in all NOT IN subqueries, regardless of whether the needle 
> (left operand) can be NULL.
> When the needle is guaranteed NOT NULL (e.g., COALESCE(col, default)), these 
> checks are unnecessary. NULL values in the haystack (subquery results) cannot 
> match a NOT NULL needle anyway, so the expensive COUNT( * ) and COUNT(col) 
> aggregates can be skipped.
> Current behavior:
> {noformat}
> SELECT * FROM emp
> WHERE COALESCE(deptno, 0) NOT IN (
> SELECT deptno FROM dept
> ){noformat}
> Generates:
> {noformat}
> LEFT JOIN (
> (SELECT COUNT as c, COUNT(deptno) as ck FROM dept) – unnecessary
> CROSS JOIN (SELECT DISTINCT deptno, true as i FROM dept)
> ) ON condition{noformat}
> Expected behavior:
> {noformat}
> LEFT JOIN (
> SELECT DISTINCT deptno, true as i FROM dept
> ) ON condition 
> {noformat}
> With simple filter: IS NULL( i ) instead of the complex OR(=(c, 0), IS NULL( 
> i )) AND OR(=(c, 0), >=(ck, c))
> Performance impact:
>  - Eliminates double UNION ALL execution (once for COUNT, once for join)
>  - Removes two aggregate computations
>  - Simplifies join predicate
> The code already documents this optimization at 
> [SubQueryRemoveRule.java:598-607|https://github.com/apache/calcite/blob/f2df771d86f52e15fba8f626eaeb8b0b9b944823/core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java#L598-L607]:
> {noformat}
> // If keys are not null we can remove "ct" and simplify to
> //
> // select e.deptno,
> //   case
> //   when dt.i is not null then true
> //   else false
> //   end
> // from emp as e
> // left join (select distinct deptno, true as i from emp) as dt
> //   on e.deptno = dt.deptno{noformat}
> Proposed solution:
> Conditionally generate the COUNT aggregates only when we detect we need null 
> safety.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to