alamb commented on PR #11261:
URL: https://github.com/apache/datafusion/pull/11261#issuecomment-2211137678

   Thank you @Rachelint ❤️ 
   
    > It is mainly due to AggregateUDFImpl is defined in expr crate, and the 
needed PhysicalExpr is defined in datafusion-physical-expr-common crate which 
will refer exps a dependency. And if we import PhysicalExpr in expr, that will 
lead to cyclic dependecies...
   
   This makes sense -- I agree the ` AggregateUDFImpl` API should not be in 
terms of `PhysicalExpr` (for the dependency reasons you describe)
   
   One way I could think to get around this would be to figure out the column 
statistics in the physical planner and pass it directly (rather than making the 
UDF figure them out). Something like
   
   ```rust
   pub trait AggregateUDFImpl {
   ...
      /// If the output of this aggregate function can be determined 
      /// only from input statistics (e.g. COUNT(..) with 0 rows, is always 0)
      /// return that value
      ///
      /// # Arguments
      /// stats: Overall statistics for the input (including row count)
      /// arg_stats: Statistics, if known, for each input argument
       fn output_from_stats(
           &self,
           stats: &Statistics,
           arg_stats: &[Option<&ColumnStatistics>]
       ) -> Option<ScalarValue> {
           None
       }
   ...
   }
   ```
   
   ANother option might be to use the narrower API described on 
https://github.com/apache/datafusion/issues/11153
   
   ```rust
   impl AggregateExpr {
   
    /// Return the value of the aggregate function, if known, given the number 
of input rows.
    /// 
    /// Return None if the value can not be determined solely from the input.
    /// 
    /// # Examples
    /// * The `COUNT` aggregate would return `Some(11)` given `num_rows = 11`
    /// * The `MIN` aggregate would return `Some(Null)` given `num_rows = 0
    /// * The `MIN` aggregate would return `None` given num_rows = 11
    fn output_from_rows(&self, num_rows: usize) -> Option<ScalarValue> { None }
   ...
   }
   ```
   
   Though I think your idea is better (pass in statistics in generla)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to