[jira] [Updated] (CALCITE-7330) AggregateCaseToFilterRule should not be applied on aggregate functions that don't ignore NULLs

Alessandro Solimando (Jira) Thu, 11 Dec 2025 10:16:08 -0800


     [ 
https://issues.apache.org/jira/browse/CALCITE-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alessandro Solimando updated CALCITE-7330:
------------------------------------------
    Description: 
h2. Problem

_AggregateCaseToFilterRule_ incorrectly transforms queries for aggregate 
functions where _NULL_ values have semantic significance.

The rule transforms:
{noformat}
AGG(CASE WHEN condition THEN value ELSE NULL END)
{noformat}
to:
{noformat}
AGG(value) FILTER (WHERE condition)
{noformat}
This transformation is valid for standard SQL aggregates (SUM, AVG, MIN, MAX, 
COUNT) which ignore {_}NULL{_}s.

However, custom user-defined aggregate functions (UDAFs) may treat NULL values 
as semantically significant inputs, making this transformation incorrect (an 
example are Postgres array aggregate functions which are mostly not ignoring 
NULL values, see the 
[doc|https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE]).
 The former passes NULL to the aggregate, the latter filters out those rows 
entirely.

AggregateCall.ignoreNulls() Is Not Enough

At first I thought that the existing _AggregateCall.ignoreNulls()_ could be 
used, but this field represents whether the user explicitly specified _IGNORE 
NULLS_ or _RESPECT NULLS_ syntax in their SQL query, which is only valid for 
window functions, while _AggregateCaseToFilterRule_ operates on _Aggregate_ 
nodes, so it's not modeling what we are interested into here.

My proposal is to add a _SqlAggFunction.skipsNullInputs()_ method to express 
the semantic null-handling behavior of aggregate function types:
 - Returns true by default (standard SQL behavior, all built-in aggregates 
ignore NULLs), so *+fully backward compatible+*
 - Custom UDAFs can override to return false to indicate NULLs are semantically 
significant
 - _AggregateCaseToFilterRule_ checks this method before applying the 
transformation

  was:
h2. Problem

_AggregateCaseToFilterRule_ incorrectly transforms queries for aggregate 
functions where _NULL_ values have semantic significance.

The rule transforms:
{noformat}
AGG(CASE WHEN condition THEN value ELSE NULL END)
{noformat}
to:
{noformat}
AGG(value) FILTER (WHERE condition)
{noformat}
This transformation is valid for standard SQL aggregates (SUM, AVG, MIN, MAX, 
COUNT) which ignore {_}NULL{_}s.

However, custom user-defined aggregate functions (UDAFs) may treat NULL values 
as semantically significant inputs, making this transformation incorrect (an 
example are Postgres array aggregate functions which are mostly not ignoring 
NULL values, see the 
[doc|https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE]).
 The former passes NULL to the aggregate, the latter filters out those rows 
entirely.

AggregateCall.ignoreNulls() Is Not Enough

At first I thought that the existing _AggregateCall.ignoreNulls()_ could be 
used, but this field represents whether the user explicitly specified _IGNORE 
NULLS_ or _RESPECT NULLS_ syntax in their SQL query, which is only valid for 
window functions, while _AggregateCaseToFilterRule_ operates on _Aggregate_ 
nodes, so it's not modeling what we are interested into here.

My proposal is to add a _SqlAggFunction.ignoresNulls()_ method to express the 
semantic null-handling behavior of aggregate function types:
 - Returns true by default (standard SQL behavior, all built-in aggregates 
ignore NULLs), so *+fully backward compatible+*
 - Custom UDAFs can override to return false to indicate NULLs are semantically 
significant
 - _AggregateCaseToFilterRule_ checks this method before applying the 
transformation


> AggregateCaseToFilterRule should not be applied on aggregate functions that 
> don't ignore NULLs
> ----------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7330
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7330
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>             Fix For: 1.42.0
>
>
> h2. Problem
> _AggregateCaseToFilterRule_ incorrectly transforms queries for aggregate 
> functions where _NULL_ values have semantic significance.
> The rule transforms:
> {noformat}
> AGG(CASE WHEN condition THEN value ELSE NULL END)
> {noformat}
> to:
> {noformat}
> AGG(value) FILTER (WHERE condition)
> {noformat}
> This transformation is valid for standard SQL aggregates (SUM, AVG, MIN, MAX, 
> COUNT) which ignore {_}NULL{_}s.
> However, custom user-defined aggregate functions (UDAFs) may treat NULL 
> values as semantically significant inputs, making this transformation 
> incorrect (an example are Postgres array aggregate functions which are mostly 
> not ignoring NULL values, see the 
> [doc|https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE]).
>  The former passes NULL to the aggregate, the latter filters out those rows 
> entirely.
> AggregateCall.ignoreNulls() Is Not Enough
> At first I thought that the existing _AggregateCall.ignoreNulls()_ could be 
> used, but this field represents whether the user explicitly specified _IGNORE 
> NULLS_ or _RESPECT NULLS_ syntax in their SQL query, which is only valid for 
> window functions, while _AggregateCaseToFilterRule_ operates on _Aggregate_ 
> nodes, so it's not modeling what we are interested into here.
> My proposal is to add a _SqlAggFunction.skipsNullInputs()_ method to express 
> the semantic null-handling behavior of aggregate function types:
>  - Returns true by default (standard SQL behavior, all built-in aggregates 
> ignore NULLs), so *+fully backward compatible+*
>  - Custom UDAFs can override to return false to indicate NULLs are 
> semantically significant
>  - _AggregateCaseToFilterRule_ checks this method before applying the 
> transformation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-7330) AggregateCaseToFilterRule should not be applied on aggregate functions that don't ignore NULLs

Reply via email to