UBarney commented on PR #20180:
URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3894045160

   > > I think we can implement `simplify` for `sum`
   > > 
https://github.com/apache/datafusion/blob/35ff4ab0a03fcc6615876eac76bac19887059ab3/datafusion/expr/src/udaf.rs#L682-L684
   > > 
   > > , just like in #18837; it should be simpler that way.
   > 
   > Hi, is this a blocking PR review comment? I am not able to understand how 
it would be simpler?
   > 
   > I am thinking that we still have the same logical flow
   > 
   > * Detect the pattern of SUM(col + literal)
   > * Modify the plan appropriately.
   
   The solution I proposed avoids the need to locate the `LogicalAgg` in the 
query plan and manually check if the `aggr_func` is `sum`. Even more 
importantly, it eliminates the need for the `datafusion-optimizer` crate to 
depend on `datafusion-functions-aggregate`.
   
   
   
   However, there is an issue I overlooked: the expression after simplification 
becomes `sum(col) + lit * count(col)`, which is no longer a single 
`AggregateFunction`. This causes an error in 
`create_aggregate_expr_with_name_and_maybe_filter` during the conversion to 
`PhysicalExpr`, as that function expects a pure aggregate expression
   
   I am still exploring the best way to handle this. One possible direction is 
to introduce a new rule or modify `map_logical_node_to_physical` to 
automatically inject a `Projection` when `Aggregate.aggr_expr` contains 
non-pure aggregate expression. If we can find a clean way to support this, it 
would enable us to handle many other aggregate rewrites via `simplify`, such as 
the one mentioned in https://github.com/apache/datafusion/issues/19637.
   
   Regarding the architecture, my current view is that the 
`datafusion-optimizer` crate should not depend on 
`datafusion-functions-aggregate`. It seems more appropriate for 
function-specific optimization rules to be implemented within 
`functions-aggregate` rather than in the `datafusion-optimizer`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to