Re: [E] quantilesDoubleSketches min/max postaggregators

Alexander Saydakov Wed, 11 Nov 2020 10:22:09 -0800

Keep in mind that the two arrays have different sizes. If you start from N
split points, the result is N+1 bins. If you ask for N bins, the internal
logic produces N-1 split points. And you want these split points to be a
part of the returned result.
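
To make the size relationship concrete, here is a minimal Python sketch. It
assumes the split points for a given number of bins are equally spaced between
min and max, and that the outer bin edges are min and max themselves. The
function names are hypothetical and are not part of Druid or DataSketches.

```python
from typing import List, Sequence, Tuple


def equally_spaced_split_points(min_value: float, max_value: float,
                                num_bins: int) -> List[float]:
    """Return the num_bins - 1 interior split points between min and max.

    Assumption: bins are of equal width; this is an illustration, not the
    actual Druid implementation.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    width = (max_value - min_value) / num_bins
    return [min_value + width * i for i in range(1, num_bins)]


def label_bins(min_value: float, max_value: float,
               split_points: Sequence[float],
               values: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Attach (lower edge, upper edge) to each bin value.

    N split points always describe N + 1 bins, so the two arrays must
    differ in length by exactly one.
    """
    if len(values) != len(split_points) + 1:
        raise ValueError("expected len(split_points) + 1 bin values")
    edges = [min_value, *split_points, max_value]
    return [(edges[i], edges[i + 1], values[i]) for i in range(len(values))]


# Asking for 5 bins over [0, 10] gives 4 split points.
print(equally_spaced_split_points(0.0, 10.0, 5))
```

Returning the split points alongside the bin values would let a client call
something like `label_bins` without recomputing anything itself.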


On Tue, Nov 10, 2020 at 10:23 AM Alexander Saydakov <
[email protected]> wrote:

> I am not sure how important the compatibility with the current version is.
> I am afraid I don't have time to work on this at the moment.
> I would like to see some discussion about the best way forward.
> Would you be willing to contribute this change once the community agrees
> on the output format?
>
> On Mon, Nov 9, 2020 at 3:36 AM Jérémie Girault <[email protected]>
> wrote:
>
>> Hello,
>>
>> This info about q0 and q1 is good to know. I will use it, thank you!
>>
>> As a user, in order to plot that, I would be glad to get the split points
>> alongside the histogram values.
>> For consistency, it would be just as useful to get them back whether the
>> request used `numBins` or `splitPoints`: when I need to display a histogram,
>> I don’t want to use two different code paths to handle the request result.
>>
>> I can imagine different formats, each with pros and cons:
>> - list of tuples: `[ [ <bin value>, <bin count> ], ... ]`
>>         pro: simple
>>         con: the format may be confusing without docs; breaks the current
>>         output format (can be solved by adding a flag controlling output)
>> - list of objects: `[ { "value": <value>, "count": <count> }, ...]`
>>         pro: simple, timeseries-like, probably the easiest to display
>>         con: breaks the current output format (can be solved by adding a
>>         flag controlling output)
>> - bins postAggregator + histogram values postAggregator:
>>   `{ bins: [ ... ], values: [ ... ] }`
>>         pro: compatible with current format; feature is available
>>         on demand
>>         con: must zip arrays on client side
>>
>> What do you think?
>>
>> --
>>
>> Jérémie Girault
>> On Nov 6, 2020 at 19:19 +0100, Alexander Saydakov <
>> [email protected]> wrote:
>> > quantile(0) = min value
>> > quantile(1) = max value
>> > you can use sketch-to-quantiles post agg to get min, max or any number
>> > of other quantiles
>> >
>> > Regarding your observation that sketch-to-histogram(num bins) does not
>> > give information about the computed split points: that is valuable
>> > feedback. Perhaps we could consider returning the split points somehow,
>> > but I am not quite sure what the return type should be. We need to
>> > return two arrays: the probability mass in each bin, as we do currently
>> > (that is one array of doubles), and the split points computed from min,
>> > max, and the given number of bins. And this post agg can accept split
>> > points - should we return them in that case as well for consistency?
>> >
>> >
>> > On Fri, Nov 6, 2020 at 3:30 AM Jérémie Girault <[email protected]>
>> > wrote:
>> >
>> > > Hello everyone,
>> > >
>> > > I previously asked a question on the ASF Slack and someone replied
>> > > asking me to send the question to the dev list. I just subscribed to
>> > > the list to forward the message I sent:
>> > >
>> > > I was playing with the DataSketches Quantiles Sketch module in Druid,
>> > > trying to retrieve some histograms using
>> > > quantilesDoublesSketchToHistogram.
>> > > However, when using numBins, I couldn't label the values I retrieved
>> > > for each bin when trying to plot them.
>> > > I can’t seem to find any postAggregator that allows me to get min/max
>> > > values in order to recompute bins on the client side.
>> > > Should I use min/max aggregators when ingesting, and query them
>> > > alongside my histogram as a workaround? That seems like a lot of
>> > > space/time for something that would seem to be "free" to retrieve
>> > > using Quantile Sketches.
>> > > Wouldn’t it be useful to have min/max postAggregators for the
>> > > quantilesDoubleSketches aggregator and/or histogram bin labels?
>> > > I located this chunk of code:
>> > >
>> > > https://github.com/apache/druid/blob/master/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/quantiles/DoublesSketchToHistogramPostAggregator.java
>> > > It does not seem so complicated that I could not contribute, but
>> > > I’m not used to Java dev these days and it would take me a while to
>> > > get it right.
>> > > Would such features be considered if requested/submitted?
>> > >
>> > > Thank you,
>> > >
>> > > --
>> > >
>> > > Jérémie Girault
>> > >
>>
>
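
As a rough illustration of the workaround discussed in this thread (asking for
quantile 0 and quantile 1 next to the histogram, then rebuilding the bins on
the client), here is a minimal Python sketch. The post-aggregator type names
and the `field`, `fractions`, and `numBins` keys follow the Druid DataSketches
quantiles documentation as I understand it; the output names `min_max` and
`histogram`, the sketch column `latency_sketch`, and both helper functions are
hypothetical. It assumes the `numBins` bins are equally spaced between min and
max, and it labels each bin by its lower edge, which is only one possible
reading of `<value>` in the list-of-objects format.

```python
from typing import Any, Dict, List, Sequence


def sketch_post_aggregators(sketch_name: str, num_bins: int) -> List[Dict[str, Any]]:
    """Build two post-aggregator specs that read the same sketch column."""
    field = {"type": "fieldAccess", "fieldName": sketch_name}
    return [
        {
            # quantile(0) = min value, quantile(1) = max value
            "type": "quantilesDoublesSketchToQuantiles",
            "name": "min_max",  # hypothetical output name
            "field": field,
            "fractions": [0, 1],
        },
        {
            "type": "quantilesDoublesSketchToHistogram",
            "name": "histogram",  # hypothetical output name
            "field": field,
            "numBins": num_bins,
        },
    ]


def to_objects(min_max: Sequence[float],
               histogram: Sequence[float]) -> List[Dict[str, float]]:
    """Zip the two results into [{"value": ..., "count": ...}, ...].

    Assumption: bins are equally spaced between min and max, and "value"
    is taken to be the lower edge of each bin.
    """
    low, high = min_max
    width = (high - low) / len(histogram)
    return [{"value": low + width * i, "count": count}
            for i, count in enumerate(histogram)]


post_aggs = sketch_post_aggregators("latency_sketch", 4)
# Example with made-up result values, only to show the reshaping.
rows = to_objects([0.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

This is the client-side zipping mentioned as the "con" of the third format; if
the post-aggregator returned split points itself, `to_objects` would not need
the equal-spacing assumption.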
