Hi Gleb, and Luke I was reading through the paper, blog and github you linked to. One thing I can't figure out is if it's possible to use the Moment Sketch to restore an original histogram. Given bucket boundaries: b0, b1, b2, b3, ... Can we obtain the counts for the number of values inserted each of the ranges: [-INF, B0), … [Bi, Bi+1), … (This is a requirement I need)
Not be confused with the percentile/threshold based queries discussed in the blog. Luke, were you suggesting collecting both and sending both over the FN API wire? I.e. collecting both - the variables to represent the Histogram as suggested in https://s.apache.org/beam-histogram-metrics: - In addition to the moment sketch variables <https://blog.acolyer.org/2018/10/31/moment-based-quantile-sketches-for-efficient-high-cardinality-aggregation-queries/> . I believe that would be feasible, as we would still retain the Histogram data. I don't think we can restore the Histograms with just the Sketch, if that was the suggestion. Please let me know if I misunderstood. If that's correct, I can write up the benefits and drawbacks I see for both approaches. On Mon, Aug 17, 2020 at 9:23 AM Luke Cwik <[email protected]> wrote: > That is an interesting suggestion to change to use a sketch. > > I believe having one metric URN that represents all this information > grouped together would make sense instead of attempting to aggregate > several metrics together. The underlying implementation of using > sum/count/max/min would stay the same but we would want a single object > that abstracts this complexity away for users as well. > > On Mon, Aug 17, 2020 at 3:42 AM Gleb Kanterov <[email protected]> wrote: > >> Didn't see proposal by Alex before today. I want to add a few more cents >> from my side. >> >> There is a paper Moment-based quantile sketches for efficient high >> cardinality aggregation queries [1], a TL;DR that for some N (around 10-20 >> depending on accuracy) we need to collect SUM(log^N(X)) ... log(X), >> COUNT(X), SUM(X), SUM(X^2)... SUM(X^N), MAX(X), MIN(X). Given aggregated >> numbers, it uses solver for Chebyshev polynomials to get quantile number, >> and there is already Java implementation for it on GitHub [2]. >> >> This way we can express quantiles using existing metric types in Beam, >> that can be already done without SDK or runner changes. It can fit nicely >> into existing runners and can be abstracted over if needed. I think this is >> also one of the best implementations, it has < 1% error rate for 200 bytes >> of storage, and quite efficient to compute. Did we consider using that? >> >> [1]: >> https://blog.acolyer.org/2018/10/31/moment-based-quantile-sketches-for-efficient-high-cardinality-aggregation-queries/ >> [2]: https://github.com/stanford-futuredata/msketch >> >> On Sat, Aug 15, 2020 at 6:15 AM Alex Amato <[email protected]> wrote: >> >>> The distinction here is that even though these metrics come from user >>> space, we still gave them specific URNs, which imply they have a specific >>> format, with specific labels, etc. >>> >>> That is, we won't be packaging them into a USER_HISTOGRAM urn. That URN >>> would have less expectation for its format. Today the USER_COUNTER just >>> expects like labels (TRANSFORM, NAME, NAMESPACE). >>> >>> We didn't decide on making a private API. But rather an API available to >>> user code for populating metrics with specific labels, and specific URNs. >>> The same API could pretty much be used for user USER_HISTOGRAM. with a >>> default URN chosen. >>> Thats how I see it in my head at the moment. >>> >>> >>> On Fri, Aug 14, 2020 at 8:52 PM Robert Bradshaw <[email protected]> >>> wrote: >>> >>>> On Fri, Aug 14, 2020 at 7:35 PM Alex Amato <[email protected]> wrote: >>>> > >>>> > I am only tackling the specific metrics covered in (for the python >>>> SDK first, then Java). To collect latency of IO API RPCS, and store it in a >>>> histogram. >>>> > https://s.apache.org/beam-gcp-debuggability >>>> > >>>> > User histogram metrics are unfunded, as far as I know. But you should >>>> be able to extend what I do for that project to the user metric use case. I >>>> agree, it won't be much more work to support that. I designed the histogram >>>> with the user histogram case in mind. >>>> >>>> From the portability point of view, all metrics generated in users >>>> code (and SDK-side IOs are "user code") are user metrics. But >>>> regardless of how things are named, once we have histogram metrics >>>> crossing the FnAPI boundary all the infrastructure will be in place. >>>> (At least the plan as I understand it shouldn't use private APIs >>>> accessible only by the various IOs but not other SDK-level code.) >>>> >>>> > On Fri, Aug 14, 2020 at 5:47 PM Robert Bradshaw <[email protected]> >>>> wrote: >>>> >> >>>> >> Once histograms are implemented in the SDK(s) (Alex, you're tackling >>>> >> this, right?) it shoudn't be much work to update the Samza worker >>>> code >>>> >> to publish these via the Samza runner APIs (in parallel with Alex's >>>> >> work to do the same on Dataflow). >>>> >> >>>> >> On Fri, Aug 14, 2020 at 5:35 PM Alex Amato <[email protected]> >>>> wrote: >>>> >> > >>>> >> > Noone has any plans currently to work on adding a generic >>>> histogram metric, at the moment. >>>> >> > >>>> >> > But I will be actively working on adding it for a specific set of >>>> metrics in the next quarter or so >>>> >> > https://s.apache.org/beam-gcp-debuggability >>>> >> > >>>> >> > After that work, one could take a look at my PRs for reference to >>>> create new metrics using the same histogram. One may wish to implement the >>>> UserHistogram use case and use that in the Samza Runner >>>> >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> > On Fri, Aug 14, 2020 at 5:25 PM Ke Wu <[email protected]> wrote: >>>> >> >> >>>> >> >> Thank you Robert and Alex. I am not running a Beam job in Google >>>> Cloud but with Samza Runner, so I am wondering if there is any ETA to add >>>> the Histogram metrics in Metrics class so it can be mapped to the >>>> SamzaHistogram metric to the actual emitting. >>>> >> >> >>>> >> >> Best, >>>> >> >> Ke >>>> >> >> >>>> >> >> On Aug 14, 2020, at 4:44 PM, Alex Amato <[email protected]> >>>> wrote: >>>> >> >> >>>> >> >> One of the plans to use the histogram data is to send it to >>>> Google Monitoring to compute estimates of percentiles. This is done using >>>> the bucket counts and bucket boundaries. >>>> >> >> >>>> >> >> Here is a describing of roughly how its calculated. >>>> >> >> >>>> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated >>>> >> >> This is a non exact estimate. But plotting the estimated >>>> percentiles over time is often easier to understand and sufficient. >>>> >> >> (An alternative is a heatmap chart representing histograms over >>>> time. I.e. a histogram for each window of time). >>>> >> >> >>>> >> >> >>>> >> >> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw < >>>> [email protected]> wrote: >>>> >> >>> >>>> >> >>> You may be interested in the propose histogram metrics: >>>> >> >>> >>>> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit >>>> >> >>> >>>> >> >>> I think it'd be reasonable to add percentiles as its own metric >>>> type >>>> >> >>> as well. The tricky bit (though there are lots of resources on >>>> this) >>>> >> >>> is that one would have to publish more than just the percentiles >>>> from >>>> >> >>> each worker to be able to compute the final percentiles across >>>> all >>>> >> >>> workers. >>>> >> >>> >>>> >> >>> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu <[email protected]> >>>> wrote: >>>> >> >>> > >>>> >> >>> > Hi everyone, >>>> >> >>> > >>>> >> >>> > I am looking to add percentile metrics (p50, p90 etc) to my >>>> beam job but I only find Counter, Gauge and Distribution metrics. I >>>> understand that I can calculate percentile metrics in my job itself and use >>>> Gauge to emit, however this is not an easy approach. On the other hand, >>>> Distribution metrics sounds like the one to go to according to its >>>> documentation: "A metric that reports information about the distribution of >>>> reported values.”, however it seems that it is intended for SUM, COUNT, >>>> MIN, MAX. >>>> >> >>> > >>>> >> >>> > The question(s) are: >>>> >> >>> > >>>> >> >>> > 1. is Distribution metric only intended for sum, count, min, >>>> max? >>>> >> >>> > 2. If Yes, can the documentation be updated to be more >>>> specific? >>>> >> >>> > 3. Can we add percentiles metric support, such as Histogram, >>>> with configurable list of percentiles to emit? >>>> >> >>> > >>>> >> >>> > Best, >>>> >> >>> > Ke >>>> >> >> >>>> >> >> >>>> >>>
