Charles,
That might work. The t-digest will give us a median estimate.
On Mon, Aug 12, 2019 at 4:33 PM Charles Givre wrote:
> Hi Ted,
> You might want to take a look at this repo:
>
Hi Ted,
You might want to take a look at this repo:
https://github.com/cgivre/drill-stats-function/blob/master/src/main/java/org/apache/drill/contrib/function/DrillStatsFunctions.java
Hi Ted,
You are now at the point that you'll have to experiment. Drill provides an
annotation for aggregate state: @Workspace. The value must be declared as a
"holder". You'll have to check if VarBinaryHolder is allowed, and, if so, how
you allocate memory and remember the offset into the
I am trying to figure out how to build an approximate percentile estimator.
I have a fancy data structure that will do this. It can live in bounded
memory with no allocation. I can add numbers to the digest easily enough.
And the required results can be extracted from the structure.
What I would
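
To make the shape of such a structure concrete, here is a minimal sketch of a bounded-memory quantile estimator whose whole state fits in a fixed-length byte array. It is not the t-digest: it is a much cruder fixed-bin histogram, used only as a stand-in to show the three operations described above (add a number, extract a result, keep state in bounded memory). The class name FixedBinDigest, the bin count, and the assumption that the value range is known up front are all hypothetical, and nothing here uses Drill APIs.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a bounded-memory quantile sketch (NOT the t-digest). */
public class FixedBinDigest {
    private static final int BINS = 64;  // fixed bin count, so the state never grows
    /** Serialized size: min, max, then one long per bin. */
    public static final int BYTE_SIZE = 2 * Double.BYTES + BINS * Long.BYTES;

    private final double min;            // assumed: value range is known in advance
    private final double max;
    private final long[] counts = new long[BINS];
    private long total = 0;

    public FixedBinDigest(double min, double max) {
        this.min = min;
        this.max = max;
    }

    /** Adds one value; out-of-range values are clamped into the first or last bin. */
    public void add(double x) {
        int bin = (int) ((x - min) / (max - min) * BINS);
        bin = Math.max(0, Math.min(BINS - 1, bin));
        counts[bin]++;
        total++;
    }

    /** Approximate q-quantile: midpoint of the bin where the cumulative count reaches q * total. */
    public double quantile(double q) {
        if (total == 0) {
            return Double.NaN;
        }
        long target = Math.max(1, (long) Math.ceil(q * total));
        long seen = 0;
        for (int i = 0; i < BINS; i++) {
            seen += counts[i];
            if (seen >= target) {
                return min + (i + 0.5) * (max - min) / BINS;
            }
        }
        return max;
    }

    /** Packs the state into a byte array of exactly BYTE_SIZE bytes. */
    public byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(BYTE_SIZE);
        buf.putDouble(min).putDouble(max);
        for (long c : counts) {
            buf.putLong(c);
        }
        return buf.array();
    }

    /** Rebuilds the state from bytes produced by toBytes(). */
    public static FixedBinDigest fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        FixedBinDigest d = new FixedBinDigest(buf.getDouble(), buf.getDouble());
        for (int i = 0; i < BINS; i++) {
            d.counts[i] = buf.getLong();
            d.total += d.counts[i];
        }
        return d;
    }
}
```

The point of the sketch is only that the state has a size known at compile time, which is the property a fixed-length binary accumulator would need. Whether a Drill UDF can hold such a buffer is the open question in this thread.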
Can UDFs accumulate a fixed length binary value?
On Mon, Aug 12, 2019 at 11:23 AM Paul Rogers wrote:
> Hi Ted,
>
> Thanks for the link; I suspected there was some trick for stddev. The
> point still stands that, if the algorithm requires multiple passes over the
> data (ML, say), can't be
Ted,
Can we ask what it is you are trying to build a UDF for?
--C
> On Aug 12, 2019, at 2:23 PM, Paul Rogers wrote:
>
> Hi Ted,
>
> Thanks for the link; I suspected there was some trick for stddev. The point
> still stands that, if the algorithm requires multiple passes over the data
> (ML,
Hi Ted,
Thanks for the link; I suspected there was some trick for stddev. The point
still stands that, if the algorithm requires multiple passes over the data (ML,
say), it can't be done in Drill.
Each UDF must return exactly one value. It can return a map if you want
multiple values (though
Is it possible for a UDF to produce multiple scalar results? Can it produce
a binary result?
Also, as a nit, standard deviation doesn't require buffering all the data.
It just requires that you have three accumulators, one for count, one for
mean and one for mean squared deviation. There is a
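
As an illustration of the three-accumulator idea, here is a minimal sketch of one well-known single-pass method, Welford's online algorithm. The class and method names are hypothetical, and this is plain Java rather than a Drill UDF. Note one small difference from the wording above: the third accumulator here holds the running sum of squared deviations, and the mean squared deviation is obtained by dividing it by the count at the end.

```java
/** Single-pass standard deviation with three accumulators (Welford's online algorithm). */
public class RunningStdDev {
    private long count = 0;     // number of values seen
    private double mean = 0.0;  // running mean
    private double m2 = 0.0;    // running sum of squared deviations from the mean

    /** Folds one value into the accumulators; no values are buffered. */
    public void add(double x) {
        count++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / count;
        m2 += delta * (x - mean);  // uses both the old and the updated mean
    }

    public double mean() {
        return count == 0 ? Double.NaN : mean;
    }

    /** Population standard deviation: sqrt of the mean squared deviation. */
    public double populationStdDev() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }

    /** Sample standard deviation (divides by count - 1). */
    public double sampleStdDev() {
        return count < 2 ? Double.NaN : Math.sqrt(m2 / (count - 1));
    }
}
```

Feeding it 2, 4, 4, 4, 5, 5, 7, 9 one value at a time gives a mean of 5 and a population standard deviation of 2, without ever holding more than the three accumulators.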
Hi Ted,
Last I checked (when we wrote the book chapter on the subject), aggregate state
is limited to scalars and Drill-defined types. There is no support to spill
aggregate state, so that state will be lost if spilling is required to handle
large aggregate batches. The current solution works
What is the current state of building aggregators that have complex state
via UDFs?
Is it possible to define multi-level aggregators in a UDF?
Can the output of a UDF be a byte array?
(these are three different questions)
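
To pin down what a multi-level aggregator with byte-array output would have to do, here is a rough sketch in plain Java. It says nothing about what Drill actually supports. It assumes a two-level plan: each fragment reduces its values to a partial state (count, mean, sum of squared deviations) encoded as a 24-byte array, and a second level merges those arrays pairwise using the standard parallel-variance combination formula. The class PartialStats and all of its methods are hypothetical names for illustration only.

```java
import java.nio.ByteBuffer;

/** Hypothetical two-level aggregation over fixed-length byte-array states (no Drill APIs). */
public class PartialStats {
    /** Layout: long count, double mean, double m2 (sum of squared deviations). */
    public static final int BYTE_SIZE = Long.BYTES + 2 * Double.BYTES;

    static byte[] encode(long count, double mean, double m2) {
        return ByteBuffer.allocate(BYTE_SIZE)
                .putLong(count).putDouble(mean).putDouble(m2).array();
    }

    /** Level 1: reduce one fragment's values to a partial state. */
    public static byte[] partial(double[] values) {
        long count = 0;
        double mean = 0.0;
        double m2 = 0.0;
        for (double x : values) {
            count++;
            double delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean);
        }
        return encode(count, mean, m2);
    }

    /** Level 2: combine two partial states without revisiting the raw values. */
    public static byte[] merge(byte[] a, byte[] b) {
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        long na = ba.getLong();
        double meanA = ba.getDouble();
        double m2A = ba.getDouble();
        long nb = bb.getLong();
        double meanB = bb.getDouble();
        double m2B = bb.getDouble();

        long n = na + nb;
        if (n == 0) {
            return encode(0, 0.0, 0.0);
        }
        double delta = meanB - meanA;
        double mean = meanA + delta * nb / n;
        double m2 = m2A + m2B + delta * delta * na * nb / n;
        return encode(n, mean, m2);
    }

    /** Final step: extract a scalar result from the merged state. */
    public static double populationStdDev(byte[] state) {
        ByteBuffer buf = ByteBuffer.wrap(state);
        long n = buf.getLong();
        buf.getDouble();  // skip the mean
        double m2 = buf.getDouble();
        return n == 0 ? Double.NaN : Math.sqrt(m2 / n);
    }
}
```

The same pattern (partial state out as bytes, merge step, final extraction) is what a mergeable sketch such as a digest would need from the engine, which is why the three questions above are related even though they are separate.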