Hi Ted,

Last I checked (when we wrote the book chapter on the subject), aggregate state 
is limited to scalars and Drill-defined types. There is no support for spilling 
aggregate state, so that state will be lost if spilling is required to handle 
large aggregate batches. The current solution works for simple cases such as 
totals and averages.

Aggregate UDFs share no state, so it is not possible for one function to use 
state accumulated by another. If, for example, you want sum, average and 
standard deviation, you'll have to accumulate the total three times, average 
twice, and so on. Note that a std dev function written the textbook two-pass 
way will require buffering all data in its own array (without any spilling or 
other support), to allow computing the (X-bar - X)^2 part of the calculation.
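
For what it's worth, the buffering can probably be avoided. Welford's one-pass 
method keeps only a count, a running mean and a running sum of squared 
deviations, all of which are scalars. Below is a minimal sketch in plain Java. 
It is not the Drill UDF API: the class and method names are made up for 
illustration, and the add/reset split only loosely mirrors how an aggregate 
function gets driven.

```java
/**
 * Illustrative only: plain Java, NOT Drill's UDF API.
 * All names here (RunningStats, add, reset, ...) are hypothetical.
 * One set of scalar state yields sum, average and standard deviation.
 */
final class RunningStats {
    private long count;   // number of values seen
    private double sum;   // running total
    private double mean;  // running mean (Welford)
    private double m2;    // running sum of squared deviations from the mean

    /** Fold one value into the state; no per-row buffering is needed. */
    void add(double x) {
        count++;
        sum += x;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / count;     // update the mean
        m2 += delta * (x - mean);  // old deviation times new deviation
    }

    double sum() { return sum; }

    double average() { return count == 0 ? Double.NaN : mean; }

    /** Sample standard deviation (divides by n - 1). */
    double sampleStdDev() {
        return count < 2 ? Double.NaN : Math.sqrt(m2 / (count - 1));
    }

    /** Clear the state before starting the next group. */
    void reset() {
        count = 0;
        sum = 0.0;
        mean = 0.0;
        m2 = 0.0;
    }

    public static void main(String[] args) {
        RunningStats s = new RunningStats();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            s.add(x);
        }
        System.out.println(s.sum() + " " + s.average() + " " + s.sampleStdDev());
    }
}
```

This does not get around the lack of shared state between aggregate UDFs. Each 
function would still carry its own copy of these scalars.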

A UDF can emit a byte array (I'd have to check if this is true of aggregate 
UDFs). A VarChar is simply a special kind of byte array, and UDFs can emit a 
VarChar.
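
If a byte array output does turn out to work for aggregates, a few scalars 
could be packed into it and unpacked downstream. The following is only a 
sketch in plain Java using java.nio.ByteBuffer. The class name, the method 
names and the layout (a long followed by two doubles) are assumptions for 
illustration, and nothing here touches Drill's own holder or buffer classes.

```java
import java.nio.ByteBuffer;

/**
 * Illustrative only: plain Java, NOT Drill's UDF API.
 * Hypothetical layout: 8-byte long, then two 8-byte doubles (big-endian,
 * which is ByteBuffer's default byte order).
 */
final class StatePacking {
    static final int SIZE = Long.BYTES + 2 * Double.BYTES;

    /** Pack three scalars into one fixed-size byte array. */
    static byte[] pack(long count, double mean, double m2) {
        return ByteBuffer.allocate(SIZE)
                .putLong(count)
                .putDouble(mean)
                .putDouble(m2)
                .array();
    }

    /** Read the fields back using absolute offsets into the array. */
    static long count(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong(0);
    }

    static double mean(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble(Long.BYTES);
    }

    static double m2(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble(Long.BYTES + Double.BYTES);
    }
}
```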

All this is from memory and so is only approximately accurate. YMMV.

Thanks,
- Paul

 

    On Monday, August 12, 2019, 07:35:47 AM PDT, Ted Dunning 
<ted.dunn...@gmail.com> wrote:  
 
 What is the current state of building aggregators that have complex state
via UDFs?

Is it possible to define multi-level aggregators in a UDF?

Can the output of a UDF be a byte array?


(these are three different questions)
  
