Is it possible for a UDF to produce multiple scalar results? Can it produce
a binary result?

Also, as a nit, standard deviation doesn't require buffering all the data.
It just requires that you have three accumulators, one for count, one for
mean and one for mean squared deviation.  There is a slightly tricky
algorithm called Welford's algorithm
<https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm>
which
allows good numerical stability while computing this on-line.

On Mon, Aug 12, 2019 at 9:01 AM Paul Rogers <par0...@yahoo.com.invalid>
wrote:

> Hi Ted,
>
> Last I checked (when we wrote the book chapter on the subject), aggregate
> state are limited to scalars and Drill-defined types. There is no support
> to spill aggregate state, so that state will be lost if spilling is
> required to handle large aggregate batches. The current solution works for
> simple cases such as totals and averages.
>
> Aggregate UDFs share no state, so it is not possible for one function to
> use state accumulated by another. If, for example, you want sum, average
> and standard deviation, you'll have to accumulate the total three times,
> average twice, and so on. Note that the std dev function will require
> buffering all data in one's own array (without any spilling or other
> support), to allow computing the (X-bar - X)^2 part of the calculation.
>
> A UDF can emit a byte array (have to check it this is true of aggregate
> UDFs). A VarChar is simply a special kind of array, and UDFs can emit a
> VarChar.
>
> All this is from memory and so is only approximately accurate. YMMV.
>
> Thanks,
> - Paul
>
>
>
>     On Monday, August 12, 2019, 07:35:47 AM PDT, Ted Dunning <
> ted.dunn...@gmail.com> wrote:
>
>  What is the current state of building aggregators that have complex state
> via UDFs?
>
> Is it possible to define multi-level aggregators in a UDF?
>
> Can the output of a UDF be a byte array?
>
>
> (these are three different questions)
>

Reply via email to