[
https://issues.apache.org/jira/browse/CALCITE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
krooswu reassigned CALCITE-6166:
--------------------------------
Assignee: krooswu
> Optimization rule AggregateReduceFunctionsRule rewrites queries to
> semantically non-equivalent ones
> ---------------------------------------------------------------------------------------------------
>
> Key: CALCITE-6166
> URL: https://issues.apache.org/jira/browse/CALCITE-6166
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.36.0
> Reporter: Pinhan Zhao
> Assignee: krooswu
> Priority: Blocker
>
> The AggregateReduceFunctionsRule rewrites AVG to SUM divided by COUNT and
> applies a cast. However, the type being casted to is erroneously determined
> - it always converts the SUM / COUNT expression to INTEGER, which is
> inconsistent with the default return type of AVG originally. Therefore, if
> the average value originally evaluates to a non-integer number, the
> expression after rewriting will lost the floating point precision as being
> converted to an integer.
>
> This issue is manifested in various test cases
> ([https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java,]
> and expected outputs
> [https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml]).
>
>
> For example, the rule testReduceAverageAndSumWithNoReduceStddevAndVar
> rewrites the original query
> {code:java}
> select name, stddev_pop(deptno), avg(deptno), var_pop(deptno) from dept group
> by name{code}
> to
> {code:java}
> select name, stddev_pop(deptno), cast(coalesce(sum(deptno), 0) / count(*) as
> integer), var_pop(deptno) from dept group by name{code}
>
> However, with a counterexample
> {code:java}
> CREATE TABLE dept (
> deptno INTEGER PRIMARY KEY,
> name TEXT NOT NULL
> );
> INSERT INTO dept VALUES (1, 'Sales');
> INSERT INTO dept VALUES (2, 'Sales');
> {code}
>
> The original query and the optimized query produce different outputs:
> {code:java}
> name | stddev_pop | avg | var_pop
> -------+------------------------+--------------------+------------------------
> Sales | 0.50000000000000000000 | 1.5000000000000000 | 0.25000000000000000000
> (1 row)
> name | stddev_pop | int4 | var_pop
> -------+------------------------+------+------------------------
> Sales | 0.50000000000000000000 | 1 | 0.25000000000000000000
> (1 row) {code}
> Specifically, the values of avg (1.5 vs 1) are different.
>
> In addition to the above case, the following test cases are also affected:
> * testReduceAverageAndVarWithNoReduceStddev
> * testCastInAggregateReduceFunctions
> * testReduceAverage
> * testReduceWithNonTypePredicate
> * testReduceAverageWithNoReduceSum
> * testReduceAllAggregateFunctions
--
This message was sent by Atlassian Jira
(v8.20.10#820010)