alamb commented on PR #11261: URL: https://github.com/apache/datafusion/pull/11261#issuecomment-2211137678
Thank you @Rachelint ❤️ > It is mainly due to AggregateUDFImpl is defined in expr crate, and the needed PhysicalExpr is defined in datafusion-physical-expr-common crate which will refer exps a dependency. And if we import PhysicalExpr in expr, that will lead to cyclic dependecies... This makes sense -- I agree the ` AggregateUDFImpl` API should not be in terms of `PhysicalExpr` (for the dependency reasons you describe) One way I could think to get around this would be to figure out the column statistics in the physical planner and pass it directly (rather than making the UDF figure them out). Something like ```rust pub trait AggregateUDFImpl { ... /// If the output of this aggregate function can be determined /// only from input statistics (e.g. COUNT(..) with 0 rows, is always 0) /// return that value /// /// # Arguments /// stats: Overall statistics for the input (including row count) /// arg_stats: Statistics, if known, for each input argument fn output_from_stats( &self, stats: &Statistics, arg_stats: &[Option<&ColumnStatistics>] ) -> Option<ScalarValue> { None } ... } ``` ANother option might be to use the narrower API described on https://github.com/apache/datafusion/issues/11153 ```rust impl AggregateExpr { /// Return the value of the aggregate function, if known, given the number of input rows. /// /// Return None if the value can not be determined solely from the input. /// /// # Examples /// * The `COUNT` aggregate would return `Some(11)` given `num_rows = 11` /// * The `MIN` aggregate would return `Some(Null)` given `num_rows = 0 /// * The `MIN` aggregate would return `None` given num_rows = 11 fn output_from_rows(&self, num_rows: usize) -> Option<ScalarValue> { None } ... } ``` Though I think your idea is better (pass in statistics in generla) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org