Hi all! If you do not use approximate quantiles (via histograms or the quantiles doubles sketch), you can stop reading.
https://github.com/apache/incubator-druid/issues/7486 brings up an issue with how objects are returned from Druid aggregations, specifically when the aggregation is configured with a complex input (like an array of input values). I'm bringing the discussion to the dev list so that any decisions are part of a more official Apache review process and not accidentally tucked away in a GitHub thread. Please be sure to check out the thread and AlexanderSaydakov's insights.

> If a single quantile is requested, then the best answer must be NaN, not
> zero since zero is a perfectly good number and would be deeply misleading.
> What to do if an array of quantiles is requested?

I'm inclined to say the expected shape of the returned data should be preserved.

Let's say there's an alternate world where some other quantile estimation algorithm may or may not converge, depending on the percentile you requested. Choosing the 50th percentile might converge and give you a value, but choosing the 99.99th might not. In such a world it would be possible for SOME of the requested values to resolve but not others. In this same world, if you were to use two aggregators at `50%` and `99.99%` versus one aggregator at `[50%, 99.99%]`, I would hope the results would be directly relatable, and that the array form would be purely an optimization or convenience.

As such, and since `org.apache.druid.query.aggregation.histogram.ApproximateHistogram` already sets a precedent for returning an array of `NaN`, I propose that the value returned for an array of quantiles be directly translatable to the array-equivalent form of the results you would get by requesting each quantile singly in separate aggregations. In this case I believe that would be an array of `NaN`.

Thoughts?

Charles Allen
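
P.S. To make the proposed equivalence concrete, here is a rough Java sketch. None of this is Druid code: `QuantileShapeSketch`, `singleQuantile`, and `quantileArray` are hypothetical names, and the toy estimator (nearest rank over a sorted copy, `NaN` when there is no data) only stands in for a real sketch or histogram. The point is just that the array form is defined as the element-wise result of the single form, so an empty input yields an array of `NaN` with the same length as the request.

```java
import java.util.Arrays;

public class QuantileShapeSketch {
    // Hypothetical stand-in for a single-quantile aggregator.
    // Returns NaN when there is no data, since 0 would be a misleading answer.
    static double singleQuantile(double[] values, double fraction) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Toy nearest-rank estimate; a real sketch would approximate this.
        int rank = (int) Math.ceil(fraction * sorted.length);
        int index = Math.max(0, Math.min(sorted.length - 1, rank - 1));
        return sorted[index];
    }

    // Hypothetical array form: exactly the single form applied per fraction,
    // so the result always has the same length as the request.
    static double[] quantileArray(double[] values, double[] fractions) {
        double[] result = new double[fractions.length];
        for (int i = 0; i < fractions.length; i++) {
            result[i] = singleQuantile(values, fractions[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] fractions = {0.5, 0.9999};
        // No data: prints [NaN, NaN]
        System.out.println(Arrays.toString(quantileArray(new double[0], fractions)));
        // Some data: prints [2.0, 4.0]
        System.out.println(Arrays.toString(quantileArray(new double[] {1, 2, 3, 4}, fractions)));
    }
}
```

With this shape, a caller can index into the result by the position of the requested quantile whether or not any particular value resolved.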
