UBarney commented on PR #20180: URL: https://github.com/apache/datafusion/pull/20180#issuecomment-3894045160
> > I think we can implement `simplify` for `sum` > > https://github.com/apache/datafusion/blob/35ff4ab0a03fcc6615876eac76bac19887059ab3/datafusion/expr/src/udaf.rs#L682-L684 > > > > , just like in #18837; it should be simpler that way. > > Hi, is this a blocking PR review comment? I am not able to understand how it would be simpler? > > I am thinking that we still have the same logical flow > > * Detect the pattern of SUM(col + literal) > * Modify the plan appropriately. The solution I proposed avoids the need to locate the `LogicalAgg` in the query plan and manually check if the `aggr_func` is `sum`. Even more importantly, it eliminates the need for the `datafusion-optimizer` crate to depend on `datafusion-functions-aggregate`. However, there is an issue I overlooked: the expression after simplification becomes `sum(col) + lit * count(col)`, which is no longer a single `AggregateFunction`. This causes an error in `create_aggregate_expr_with_name_and_maybe_filter` during the conversion to `PhysicalExpr`, as that function expects a pure aggregate expression I am still exploring the best way to handle this. One possible direction is to introduce a new rule or modify `map_logical_node_to_physical` to automatically inject a `Projection` when `Aggregate.aggr_expr` contains non-pure aggregate expression. If we can find a clean way to support this, it would enable us to handle many other aggregate rewrites via `simplify`, such as the one mentioned in https://github.com/apache/datafusion/issues/19637. Regarding the architecture, my current view is that the `datafusion-optimizer` crate should not depend on `datafusion-functions-aggregate`. It seems more appropriate for function-specific optimization rules to be implemented within `functions-aggregate` rather than in the `datafusion-optimizer`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
