andygrove commented on pull request #7971:
URL: https://github.com/apache/arrow/pull/7971#issuecomment-675245608


   When faced with choices like this, it is often helpful to look at how other
   projects handle them. Perhaps we could look at Calcite or Spark to see
   what choices they made? I am more familiar with Spark at this point, so I
   could research the approach used there.
   
   On Mon, Aug 17, 2020, 9:59 PM Jorge Leitao <notificati...@github.com> wrote:
   
   > *@jorgecarleitao* commented on this pull request.
   > ------------------------------
   >
   > In rust/datafusion/src/execution/physical_plan/udf.rs
   > <https://github.com/apache/arrow/pull/7971#discussion_r471899766>:
   >
   > > +
   > +It is the developer of the function's responsibility to ensure that the aggregator correctly handles the different
   > +types that are presented to them, and that the return type correctly matches the type returned by the
   > +aggregator.
   > +
   > +It is the user of the function's responsibility to pass arguments to the function that have valid types.
   > +*/
   > +#[derive(Clone)]
   > +pub struct AggregateFunction {
   > +    /// Function name
   > +    pub name: String,
   > +    /// A list of arguments and their respective types. A function can accept more than one type as argument
   > +    /// (e.g. sum(i8), sum(u8)).
   > +    pub arg_types: Vec<Vec<DataType>>,
   > +    /// Return type. This function takes
   > +    pub return_type: ReturnType,
   >
   > This change is under discussion on the mailing list.
   >
   > Essentially, the question is whether we should allow UDFs to have an
   > input-dependent return type or not (i.e., whether return_type should be a
   > function or a DataType).
   >
   > If we decide not to accept input-dependent types, then UDFs are simpler
   > (multiple input types, single output type), but we can't rewrite our
   > aggregates as UDFs.
   >
   > If we decide to accept input-dependent types, then UDFs are more complex
   > (multiple input types, multiple output types), but we can unify them
   > all under a single interface.
   >
   > We can also do something in the middle, in which we declare an interface
   > for functions on our end that supports (multiple input types, multiple
   > output types), but only expose public interfaces to register (multiple
   > input types, single output type) UDFs.
   >
   >
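The three options in the quoted comment could be sketched roughly as follows. This is only an illustration of the trade-off, not the code in the PR: `DataType` here is a tiny stand-in for Arrow's enum, and `FixedTypeUdf`, `InputDependentUdf`, `ReturnTypeFn`, `from_fixed` and `sum_return_type` are hypothetical names, as is the choice of `String` for the error type.

```rust
use std::sync::Arc;

/// Simplified stand-in for Arrow's `DataType` (assumption: a few variants only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int8,
    UInt8,
    Int64,
    UInt64,
    Float64,
}

/// Option 1: multiple input types, a single fixed output type.
#[allow(dead_code)]
pub struct FixedTypeUdf {
    pub name: String,
    pub arg_types: Vec<Vec<DataType>>,
    pub return_type: DataType,
}

/// Option 2: the output type is computed from the input types.
pub type ReturnTypeFn =
    Arc<dyn Fn(&[DataType]) -> Result<DataType, String> + Send + Sync>;

#[allow(dead_code)]
pub struct InputDependentUdf {
    pub name: String,
    pub arg_types: Vec<Vec<DataType>>,
    pub return_type: ReturnTypeFn,
}

impl InputDependentUdf {
    /// Option 3 (middle ground): users register a fixed-type UDF, and it is
    /// wrapped into the more general internal representation.
    pub fn from_fixed(udf: FixedTypeUdf) -> Self {
        let fixed = udf.return_type;
        // A constant function: the inputs are ignored.
        let return_type: ReturnTypeFn =
            Arc::new(move |_: &[DataType]| -> Result<DataType, String> {
                Ok(fixed.clone())
            });
        InputDependentUdf {
            name: udf.name,
            arg_types: udf.arg_types,
            return_type,
        }
    }
}

/// Example of an input-dependent return type for a `sum`-like aggregate
/// (the widening rules here are illustrative, not DataFusion's).
pub fn sum_return_type(args: &[DataType]) -> Result<DataType, String> {
    match args {
        [DataType::Int8] | [DataType::Int64] => Ok(DataType::Int64),
        [DataType::UInt8] | [DataType::UInt64] => Ok(DataType::UInt64),
        [DataType::Float64] => Ok(DataType::Float64),
        other => Err(format!("sum does not support {:?}", other)),
    }
}
```

With this shape, a built-in aggregate such as `sum` would pass `Arc::new(sum_return_type)` as its `return_type`, while a user-registered UDF would go through `from_fixed`, so both end up behind the same internal interface.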
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

