Approx N-tile and complex object return values

Charles Allen Tue, 23 Apr 2019 12:42:48 -0700

Hi all!

If you do not use approximate quantiles (or approximate histograms, or
the quantiles doubles sketch), you can stop reading.


https://github.com/apache/incubator-druid/issues/7486 brings up an
issue related to how objects are returned from Druid aggregations,
specifically when the input aggregation configuration has a complex
configuration (like an array of input values). I'm bringing the
discussion to the dev list so that any decisions are part of a more
official Apache review process and not accidentally tucked away in a
GitHub thread. Please be sure to check out the thread and
AlexanderSaydakov's insights.

> If a single quantile is requested, then the best answer must be NaN, not
> zero since zero is a perfectly good number and would be deeply misleading.
> What to do if an array of quantiles is requested?

I'm inclined to say the expected data shape returned should be preserved.

Let's say there's an alternate world where some other quantile
estimation algorithm may or may not converge depending on the
percentile you request. For example, the 50th percentile might
converge and give you a value, but the 99.99th might not. In such a
world it would be possible for SOME of the requested values to resolve
but not others. In this same world, if you were to use two aggregators
at `50%` and `99.99%` versus one aggregator at `[50%, 99.99%]`, I
would hope the results would be directly comparable, and that the
array form would be purely an optimization or convenience.

As such, and since
`org.apache.druid.query.aggregation.histogram.ApproximateHistogram`
already sets a precedent for returning an array of `NaN`, I propose
that the value returned for an array of quantiles be directly
translatable to the array-equivalent form of the results you would get
by requesting each quantile individually in separate aggregations. In
this case, I believe that would be an array of `NaN`.
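
To make the proposed rule concrete, here is a minimal sketch in Python.
It is not Druid code: `quantile` and `quantiles` are hypothetical
helpers, and the nearest-rank estimate is only a stand-in for whatever
the real aggregator computes. The point is just that the array form is
defined as the single form applied to each requested fraction, so the
result shape is preserved and an empty input yields an array of `NaN`.

```python
import math
from typing import List, Sequence


def quantile(values: Sequence[float], fraction: float) -> float:
    """Hypothetical single-quantile estimate; NaN when there is no data."""
    if not values:
        # Zero is a valid quantile value, so "no answer" must be NaN.
        return math.nan
    ordered = sorted(values)
    # Simple nearest-rank index, clamped to the last element.
    index = min(len(ordered) - 1, int(fraction * len(ordered)))
    return ordered[index]


def quantiles(values: Sequence[float], fractions: Sequence[float]) -> List[float]:
    """Array form: one result per requested fraction, in request order."""
    return [quantile(values, f) for f in fractions]


empty_result = quantiles([], [0.5, 0.9999])
filled_result = quantiles([1.0, 2.0, 3.0, 4.0], [0.5, 0.9999])
```

With this definition, `empty_result` has two entries, both `NaN`, and
`filled_result` matches what two separate single-quantile requests at
`0.5` and `0.9999` would return.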

Thoughts?
Charles Allen
