Keep in mind that the two arrays have different sizes. If you start from N split points, the result is N+1 bins. If you ask for N bins, the internal logic produces N-1 split points, and you want those split points to be part of the returned result.
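To make the size difference concrete, here is a rough Python sketch of the client-side labelling. It is not the Druid code: it assumes the split points are equally spaced between min and max (obtainable as quantile(0) and quantile(1)), the function names `equally_spaced_split_points` and `label_bins` are hypothetical, and the bin values are made up for illustration.

```python
from typing import List, Tuple


def equally_spaced_split_points(min_value: float, max_value: float, num_bins: int) -> List[float]:
    """Return the num_bins - 1 interior split points between min_value and max_value.

    Assumption: equal-width bins. The real post-aggregator may differ.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    width = (max_value - min_value) / num_bins
    return [min_value + width * i for i in range(1, num_bins)]


def label_bins(
    min_value: float,
    max_value: float,
    split_points: List[float],
    values: List[float],
) -> List[Tuple[float, float, float]]:
    """Pair each histogram value with the (lower, upper) edges of its bin."""
    # N split points always delimit N + 1 bins.
    if len(values) != len(split_points) + 1:
        raise ValueError("expected exactly one more bin than split points")
    edges = [min_value, *split_points, max_value]
    return [(edges[i], edges[i + 1], values[i]) for i in range(len(values))]


# Asking for 4 bins over [0, 10] yields 3 split points.
splits = equally_spaced_split_points(0.0, 10.0, 4)

# Illustrative histogram values only, one per bin.
bins = label_bins(0.0, 10.0, splits, [3.0, 5.0, 1.0, 1.0])
```

Whatever output format is chosen, a client has to do something like `label_bins` to plot the histogram, which is why the two arrays need to be returned together.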

On Tue, Nov 10, 2020 at 10:23 AM Alexander Saydakov <sayda...@verizonmedia.com> wrote:

> I am not sure how important the compatibility with the current version is.
> I am afraid I don't have time to work on this at the moment.
> I would like to see some discussion about the best way forward.
> Would you be willing to contribute this change once the community agrees
> on the output format?
>
> On Mon, Nov 9, 2020 at 3:36 AM Jérémie Girault <jere...@hubvisor.io> wrote:
>
>> Hello,
>>
>> This info about q0 and q1 is good to know, I will use it, thank you !
>>
>> As a user in order to plot that I would be glad to get the split points
>> alongside the histogram values.
>> It would be as useful to retrieve them from the `numBins` or
>> `splitPoints` for consistency indeed: when I need to display histogram I
>> don’t want to use two different code path to handle the request result.
>>
>> I can imagine different formats I could use with each pro and cons :
>> - list of tuple: `[ [ <bin value>, <bin count> ], ... ]`
>>   pro: simple
>>   con: the format may be confusing without docs, breaks the current
>>   output format (can be solved by adding a flag controlling output)
>> - list of objects: `[ { "value": <value>, "count": <count> }, ...]`
>>   pro: simple, timeseries-like, probably the most easy to display
>>   con: breaks the current output format (can be solved by adding a
>>   flag controlling output)
>> - bins postAggregator + histogram values postAggregator :
>>   `{ bins: [ ... ], values: [ ... ] }`
>>   pro: compatible with current format, feature is available on-demand
>>   con: must zip arrays on client side
>>
>> What do you think ?
>>
>> --
>>
>> Jérémie Girault
>> On Nov 6, 2020 at 19:19 +0100, Alexander Saydakov
>> <sayda...@verizonmedia.com.invalid> wrote:
>> > quantile(0) = min value
>> > quantile(1) = max value
>> > you can use sketch-to-quantiles post agg to get min, max or any number of
>> > other quantiles
>> >
>> > Regarding your observation that sketch-to-histogram(num bins) does not give
>> > information about the computed split points. That is valuable feedback.
>> > Perhaps, we could consider returning the split points somehow, but I am not
>> > quite sure what the return type should be. We need to return two arrays:
>> > probability mass in each bin as we do currently - that is one array of
>> > doubles, and split points computed from min, max, and given number of bins.
>> > And this post agg can accept split points - should we return them in that
>> > case as well for consistency?
>> >
>> >
>> > On Fri, Nov 6, 2020 at 3:30 AM Jérémie Girault <jere...@hubvisor.io> wrote:
>> >
>> > > Hello everyone,
>> > >
>> > > I previously asked a question on the ASF slack and someone replied to me
>> > > by asking me to send the question on the dev list. I just subscribed to the
>> > > list to forward the message I sent :
>> > >
>> > > I was playing with the DataSketches Quantiles Sketch module in druid
>> > > trying to retrieve some histograms using quantilesDoublesSketchToHistogram.
>> > > However I couldn't label the values I retrieved for each bin when using
>> > > numBins when trying to plot them.
>> > > I can’t seem to find any postAggregator that allows me to get min/max
>> > > values in order to recompute bins on the client side.
>> > > Should I use min/max aggregators when ingesting, and query them alongside
>> > > my histogram as a workaround ? It seem a lot of space/time that would seem
>> > > to be « free » to retrieve using Quantile Sketches.
>> > > Wouldn’t it be useful to have min/max postAggregators for
>> > > quantilesDoubleSketches aggregator and/or histogram bins labels ?
>> > > I located this chunk of code:
>> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_druid_blob_master_extensions-2Dcore_datasketches_src_main_java_org_apache_druid_query_aggregation_datasketches_quantiles_DoublesSketchToHistogramPostAggregator.java&d=DwIFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=0TpvE_u2hS1ubQhK3gLhy94YgZm2k_r8JHJnqgjOXx4&m=PUk1rdn3YFgKzf5pRy7hKdCZt_J-_DZgbh_wjexBneI&s=fb3Uh150BuY9jtM8DqGofrqtwQrM9jDfupPq6MwF5hk&e=
>> > > That does not seem overly complicated in a way I could not contribute, but
>> > > I’m not used to java dev these days and it would take me a while to get it
>> > > right.
>> > > Would such features be considered if requested/submitted ?
>> > >
>> > > Thank you,
>> > >
>> > > --
>> > >
>> > > Jérémie Girault