mertak-synnada opened a new issue, #15633:
URL: https://github.com/apache/datafusion/issues/15633
### Current Behavior
When a query such as the following is executed:
`EXPLAIN SELECT SUM(id), SUM(id) + 1, SUM(id) + 2 FROM employees;`
The resulting execution plans are:
**Logical Plan:**
```
Projection: sum(employees.id), sum(employees.id) + Int64(1), sum(employee
s.id) + Int64(2)
Aggregate: groupBy=[[]], aggr=[[sum(CAST(employees.id AS Int64))]]
TableScan: employees projection=[id]
```
**Physical Plan:**
```
ProjectionExec: expr=[sum(employees.id)@0 as sum(employees.id), sum(emp
loyees.id)@0 + 1 as sum(employees.id) + Int64(1), sum(employees.id)@0 + 2
as sum(employees.id) + Int64(2)]
AggregateExec: mode=Single, gby=[], aggr=[sum(employees.id)]
MemoryExec: partitions=1, partition_sizes=[1]
```
As demonstrated, the repeated calculation of SUM(employees.id) is performed
only
once. The subsequent additions ( +1 and +2 ) are handled by the
`ProjectionExec`
through basic arithmetic operations.
### Problematic Case
Consider a more complex query inspired by Clickbench Query 29:
SELECT SUM(id), SUM(id+1), SUM(id+2), ..., SUM(id+89) FROM employees;
The execution plans are:
**Logical Plan:**
```
Aggregate: groupBy=[[]], aggr=[[sum(__common_expr_1 AS employees.id), su
m(__common_expr_1 AS employees.id + Int64(1)), ... , sum(__common_expr_1 A
S employees.id + Int64(89))]]
Projection: CAST(employees.id AS Int64) AS __common_expr_1
TableScan: employees projection=[id]
```
**Physical Plan:**
```
AggregateExec: mode=Single, gby=[], aggr=[sum(employees.id), sum(employ
ees.id + Int64(1)), ... , sum(employees.id + Int64(89))]
ProjectionExec: expr=[CAST(id@0 AS Int64) as __common_expr_1]
MemoryExec: partitions=1, partition_sizes=[1]
```
In this case, each `SUM` operation is calculated independently, which is
inefficient.
The desired approach is to compute `SUM(employees.id)` once and then apply
the
linearity property to compute` SUM(id + n)` as `SUM(id) + SUM(n)` for each
additional
term ( 1...89 ). If the `N` is a constant (here it is) the operation should
be converted as
`N*COUNT(1)` .
Other references: https://github.com/apache/datafusion/pull/15532
https://github.com/apache/datafusion/issues/15524
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]