Re: Approx N-tile and complex object return values

Gian Merlino Tue, 23 Apr 2019 15:19:58 -0700

> I'm inclined to say the expected data shape returned should be preserved.


This seems like a good general principle. Callers generally are happier
when the result shape is consistent. I have to admit, I don't understand
why an empty array would be better than null or vice versa. Both are a
different 'shape' than a normal result, so both would need special
checking. The array of NaNs seems nicest to me. Don't let me stand in the
way of painting the bike shed whatever color other folks want to paint it,
though.
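
To make the shape argument concrete, here is a minimal sketch in Java. It is not Druid code: the class name `QuantileShapeSketch` and the methods `quantile` and `quantiles` are hypothetical, and the nearest-rank lookup only stands in for a real approximate-quantile algorithm. It illustrates the proposal in the quoted message below: the array form is built from the single-quantile form, so an empty input yields one NaN per requested quantile rather than null or an empty array.

```java
import java.util.Arrays;

public class QuantileShapeSketch {
    // Hypothetical single-quantile lookup over already-sorted values.
    // Returns NaN when there is no data, since zero would be misleading.
    public static double quantile(double[] sortedValues, double fraction) {
        if (sortedValues.length == 0) {
            return Double.NaN;
        }
        // Simple nearest-rank style index, clamped to the last element.
        int index = (int) Math.min(sortedValues.length - 1,
                                   Math.floor(fraction * sortedValues.length));
        return sortedValues[index];
    }

    // Array form: one result per requested fraction, in the same order,
    // so the result is directly relatable to separate single requests.
    public static double[] quantiles(double[] sortedValues, double[] fractions) {
        double[] result = new double[fractions.length];
        for (int i = 0; i < fractions.length; i++) {
            result[i] = quantile(sortedValues, fractions[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] empty = {};
        double[] fractions = {0.5, 0.9999};
        // Prints [NaN, NaN]: same length as the request, no special shape.
        System.out.println(Arrays.toString(quantiles(empty, fractions)));
    }
}
```

With this shape a caller can always index the result by the position of the requested quantile and check `Double.isNaN` per element, instead of first testing for null or an empty array.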

On Tue, Apr 23, 2019 at 1:13 PM Jonathan Wei <[email protected]> wrote:

> > I'm inclined to say the expected data shape returned should be preserved.
>
> Agreed, I think the array of NaNs is the most correct return value in the
> case described there (and
>
> org.apache.druid.query.aggregation.histogram.FixedBucketsHistogram#percentilesFloat
> should be adjusted to match that convention).
>
> I don't see any difference between returning empty array vs null, to
> address the original concern in that thread.
>
> On Tue, Apr 23, 2019 at 12:42 PM Charles Allen
> <[email protected]> wrote:
> >
> > Hi all!
> >
> > If you do not use approximate quantiles (or histograms or quantiles
> > double sketch) then you can stop reading.
> >
> > https://github.com/apache/incubator-druid/issues/7486 brings up an
> > issue related to how objects are returned from Druid aggregations,
> > specifically when the input aggregation configuration has a complex
> > configuration (like an array of input values). I'm bringing the
> > discussion to the dev list so that any decisions are part of a more
> > official Apache review process and not accidentally tucked away in
> > a GitHub thread. Please be sure to check out the thread and
> > AlexanderSaydakov's insights.
> >
> > > If a single quantile is requested, then the best answer must be NaN,
> > > not zero since zero is a perfectly good number and would be deeply
> > > misleading. What to do if an array of quantiles is requested?
> >
> > I'm inclined to say the expected data shape returned should be preserved.
> >
> > Let's say there's an alternate world where some other quantiles
> > estimation algorithm can either converge or not converge but it
> > depends on the % you requested. Like choosing the 50th percentile
> > might converge and give you a value but choosing the 99.99% might not.
> > In such a world it would be possible for SOME of the requested values
> > to resolve but not others. In this same world, if you were to do two
> > aggregators at `50%` and `99.99%`, vs one aggregator at `[50%,
> > 99.99%]`, I would hope the result would be directly relatable, and
> > that the array form would be one of optimization or convenience.
> >
> > As such, and since
> > `org.apache.druid.query.aggregation.histogram.ApproximateHistogram`
> > already sets a precedent for returning an array of `NaN`, I propose
> > the returned value for an array of quantiles be directly translatable
> > to the array-equivalent form of the result when requesting the
> > quantiles singularly in different aggregations. Which in this case I
> > believe would be an array of `NaN`.
> >
> > Thoughts?
> > Charles Allen
> >
>
