Hi Alex,

It is great to know you are working on the metrics. Do you have any
concerns about adding a Histogram metric type to the Samza runner itself
for now, so we can start using it before a generic histogram metric is
introduced in the Metrics class?

Best,
Ke
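For concreteness, a rough sketch of what such a runner-local histogram
could look like, delegating to Samza's SamzaHistogram (mentioned later in
this thread), which records values and emits the requested percentiles as
gauges. The SamzaHistogram constructor and update(long) signature used
here are assumptions, not verified API:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.samza.metrics.MetricsRegistry;
    import org.apache.samza.metrics.SamzaHistogram;

    // A rough sketch, not an existing Beam or Samza API: a runner-local
    // histogram metric delegating to Samza's SamzaHistogram. The constructor
    // and update(long) signatures are assumptions.
    public class SamzaRunnerHistogram {
      private final SamzaHistogram histogram;

      public SamzaRunnerHistogram(MetricsRegistry registry, String group, String name) {
        // Percentiles to emit; a real implementation would make these configurable.
        List<Double> percentiles = Arrays.asList(50.0, 90.0, 99.0);
        this.histogram = new SamzaHistogram(registry, group, name, percentiles);
      }

      public void update(long value) {
        histogram.update(value);
      }
    }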
> On Aug 18, 2020, at 12:57 AM, Gleb Kanterov <[email protected]> wrote:
>
> Hi Alex,
>
> I'm not sure about restoring a histogram, because the use case I had in
> the past used percentiles. As I understand it, you can approximate a
> histogram if you know the percentiles and the total count. E.g. 5% of
> values fall into the [P95, +INF) bucket, another 5% into [P90, P95), etc.
> I don't understand the paper well enough to say how it's going to work if
> the given bucket boundaries happen to include a small number of values. I
> guess it's a similar kind of trade-off to the one we face when choosing
> boundaries if we want to get percentiles from histogram buckets. I see
> moment sketch primarily as a method intended to approximate percentiles,
> not histogram buckets.
>
> /Gleb
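To make the approximation Gleb describes concrete, here is a minimal
sketch of recovering approximate bucket counts from known percentiles plus
a total count. This is a hypothetical helper, not part of any Beam API; it
assumes quantiles[i] holds the value at percentile i * 100 / (n - 1), so
quantiles[0] is the min (P0) and quantiles[n - 1] the max (P100):

    // Minimal sketch: approximate histogram bucket counts from evenly
    // spaced percentile estimates plus a total count. Hypothetical helper,
    // not a Beam API.
    public class BucketsFromPercentiles {

      // Fraction of values below v, by linear interpolation between the
      // known percentile values; each segment covers 1/(n-1) of the mass.
      static double fractionBelow(double[] quantiles, double v) {
        int n = quantiles.length;
        if (v <= quantiles[0]) return 0.0;
        if (v >= quantiles[n - 1]) return 1.0;
        for (int i = 0; i < n - 1; i++) {
          if (v < quantiles[i + 1]) {
            double t = (v - quantiles[i]) / (quantiles[i + 1] - quantiles[i]);
            return (i + t) / (n - 1);
          }
        }
        return 1.0;
      }

      // Approximate count of values falling in [lo, hi).
      static long bucketCount(double[] quantiles, long totalCount, double lo, double hi) {
        double frac = fractionBelow(quantiles, hi) - fractionBelow(quantiles, lo);
        return Math.round(frac * totalCount);
      }
    }

As Gleb notes, the estimate degrades when a bucket boundary falls in a
region covered by only a few percentile points.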
> On Tue, Aug 18, 2020 at 2:13 AM Alex Amato <[email protected]> wrote:
>
> Hi Gleb and Luke,
>
> I was reading through the paper, blog, and GitHub repo you linked to. One
> thing I can't figure out is whether it's possible to use the Moment
> Sketch to restore an original histogram.
> Given bucket boundaries: B0, B1, B2, B3, ...
> Can we obtain the counts for the number of values inserted into each of
> the ranges: [-INF, B0), ..., [Bi, Bi+1), ...?
> (This is a requirement I need.)
>
> Not to be confused with the percentile/threshold based queries discussed
> in the blog.
>
> Luke, were you suggesting collecting both and sending both over the Fn
> API wire? I.e., collecting both the variables to represent the Histogram
> as suggested in https://s.apache.org/beam-histogram-metrics, in addition
> to the moment sketch variables
> (https://blog.acolyer.org/2018/10/31/moment-based-quantile-sketches-for-efficient-high-cardinality-aggregation-queries/).
> I believe that would be feasible, as we would still retain the Histogram
> data. I don't think we can restore the Histograms with just the Sketch,
> if that was the suggestion. Please let me know if I misunderstood.
>
> If that's correct, I can write up the benefits and drawbacks I see for
> both approaches.
>
> On Mon, Aug 17, 2020 at 9:23 AM Luke Cwik <[email protected]> wrote:
>
> That is an interesting suggestion, to change to using a sketch.
>
> I believe having one metric URN that represents all this information
> grouped together would make sense, instead of attempting to aggregate
> several metrics together. The underlying implementation of using
> sum/count/max/min would stay the same, but we would want a single object
> that abstracts this complexity away for users as well.
>
> On Mon, Aug 17, 2020 at 3:42 AM Gleb Kanterov <[email protected]> wrote:
>
> I didn't see the proposal by Alex before today. I want to add a few more
> cents from my side.
>
> There is a paper, Moment-based quantile sketches for efficient high
> cardinality aggregation queries [1]. TL;DR: for some N (around 10-20,
> depending on accuracy) we need to collect SUM(log(X)) ... SUM(log^N(X)),
> COUNT(X), SUM(X), SUM(X^2) ... SUM(X^N), MAX(X), MIN(X). Given the
> aggregated numbers, it uses a solver based on Chebyshev polynomials to
> get quantile estimates, and there is already a Java implementation of it
> on GitHub [2].
>
> This way we can express quantiles using existing metric types in Beam;
> it can already be done without SDK or runner changes. It can fit nicely
> into existing runners and can be abstracted over if needed. I think this
> is also one of the best implementations: it has a < 1% error rate for
> 200 bytes of storage, and it is quite efficient to compute. Did we
> consider using that?
>
> [1]: https://blog.acolyer.org/2018/10/31/moment-based-quantile-sketches-for-efficient-high-cardinality-aggregation-queries/
> [2]: https://github.com/stanford-futuredata/msketch
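The per-worker state Gleb describes can be kept as plain sums, so each
field maps onto existing Beam sum/count/min/max metric types. A minimal
sketch follows (a hypothetical class, not the msketch library's API;
recovering quantiles from this state still requires the
Chebyshev-polynomial solver from [2]):

    // Minimal sketch of the moment-sketch state described above: every
    // field is a sum, count, min, or max, so each is mergeable across
    // workers with existing Beam metric types. Hypothetical class, not the
    // msketch library's API.
    public class MomentSketchState {
      private final int order;          // N, around 10-20 depending on accuracy
      private final double[] powerSums; // powerSums[i] = SUM(X^i); i = 0 is COUNT
      private final double[] logSums;   // logSums[i] = SUM(log^i(X))
      private double min = Double.POSITIVE_INFINITY;
      private double max = Double.NEGATIVE_INFINITY;

      public MomentSketchState(int order) {
        this.order = order;
        this.powerSums = new double[order + 1];
        this.logSums = new double[order + 1];
      }

      public void add(double x) {
        double p = 1.0, lp = 1.0;
        double logX = Math.log(x); // assumes x > 0 for the log moments
        for (int i = 0; i <= order; i++) {
          powerSums[i] += p;
          logSums[i] += lp;
          p *= x;
          lp *= logX;
        }
        min = Math.min(min, x);
        max = Math.max(max, x);
      }

      // Merging two sketches is element-wise addition plus min/max, which
      // is exactly the aggregation existing runners already know how to do.
      public void merge(MomentSketchState other) {
        for (int i = 0; i <= order; i++) {
          powerSums[i] += other.powerSums[i];
          logSums[i] += other.logSums[i];
        }
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
      }
    }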
> On Sat, Aug 15, 2020 at 6:15 AM Alex Amato <[email protected]> wrote:
>
> The distinction here is that even though these metrics come from user
> space, we still gave them specific URNs, which imply they have a specific
> format, with specific labels, etc.
>
> That is, we won't be packaging them into a USER_HISTOGRAM URN. That URN
> would have fewer expectations about its format. Today the USER_COUNTER
> URN just expects labels like (TRANSFORM, NAME, NAMESPACE).
>
> We didn't decide on making a private API, but rather an API available to
> user code for populating metrics with specific labels and specific URNs.
> The same API could pretty much be used for a user USER_HISTOGRAM, with a
> default URN chosen. That's how I see it in my head at the moment.
>
> On Fri, Aug 14, 2020 at 8:52 PM Robert Bradshaw <[email protected]> wrote:
>
> On Fri, Aug 14, 2020 at 7:35 PM Alex Amato <[email protected]> wrote:
> >
> > I am only tackling the specific metrics covered in
> > https://s.apache.org/beam-gcp-debuggability (for the Python SDK first,
> > then Java): collecting the latency of IO API RPCs and storing it in a
> > histogram.
> >
> > User histogram metrics are unfunded, as far as I know. But you should
> > be able to extend what I do for that project to the user metric use
> > case. I agree, it won't be much more work to support that. I designed
> > the histogram with the user histogram case in mind.
>
> From the portability point of view, all metrics generated in user code
> (and SDK-side IOs are "user code") are user metrics. But regardless of
> how things are named, once we have histogram metrics crossing the Fn API
> boundary, all the infrastructure will be in place. (At least, the plan as
> I understand it shouldn't use private APIs accessible only to the various
> IOs but not to other SDK-level code.)
>
> > On Fri, Aug 14, 2020 at 5:47 PM Robert Bradshaw <[email protected]> wrote:
> >>
> >> Once histograms are implemented in the SDK(s) (Alex, you're tackling
> >> this, right?) it shouldn't be much work to update the Samza worker
> >> code to publish these via the Samza runner APIs (in parallel with
> >> Alex's work to do the same on Dataflow).
> >>
> >> On Fri, Aug 14, 2020 at 5:35 PM Alex Amato <[email protected]> wrote:
> >> >
> >> > No one has any plans to work on adding a generic histogram metric
> >> > at the moment.
> >> >
> >> > But I will be actively working on adding it for a specific set of
> >> > metrics in the next quarter or so:
> >> > https://s.apache.org/beam-gcp-debuggability
> >> >
> >> > After that work, one could take a look at my PRs for reference to
> >> > create new metrics using the same histogram. One may wish to
> >> > implement the UserHistogram use case and use that in the Samza
> >> > Runner.
> >> >
> >> > On Fri, Aug 14, 2020 at 5:25 PM Ke Wu <[email protected]> wrote:
> >> >>
> >> >> Thank you Robert and Alex. I am not running a Beam job in Google
> >> >> Cloud but with the Samza runner, so I am wondering if there is any
> >> >> ETA for adding the Histogram metric to the Metrics class, so it
> >> >> can be mapped to the SamzaHistogram metric for the actual
> >> >> emitting.
> >> >>
> >> >> Best,
> >> >> Ke
> >> >>
> >> >> On Aug 14, 2020, at 4:44 PM, Alex Amato <[email protected]> wrote:
> >> >>
> >> >> One of the plans for the histogram data is to send it to Google
> >> >> Monitoring to compute estimates of percentiles. This is done using
> >> >> the bucket counts and bucket boundaries.
> >> >>
> >> >> Here is a description of roughly how it's calculated:
> >> >> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
> >> >>
> >> >> This is a non-exact estimate, but plotting the estimated
> >> >> percentiles over time is often easier to understand and
> >> >> sufficient. (An alternative is a heatmap chart representing
> >> >> histograms over time, i.e. a histogram for each window of time.)
> >> >>
> >> >> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw <[email protected]> wrote:
> >> >>>
> >> >>> You may be interested in the proposed histogram metrics:
> >> >>> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
> >> >>>
> >> >>> I think it'd be reasonable to add percentiles as its own metric
> >> >>> type as well. The tricky bit (though there are lots of resources
> >> >>> on this) is that one would have to publish more than just the
> >> >>> percentiles from each worker to be able to compute the final
> >> >>> percentiles across all workers.
> >> >>>
> >> >>> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu <[email protected]> wrote:
> >> >>> >
> >> >>> > Hi everyone,
> >> >>> >
> >> >>> > I am looking to add percentile metrics (p50, p90, etc.) to my
> >> >>> > Beam job, but I only find Counter, Gauge, and Distribution
> >> >>> > metrics. I understand that I can calculate percentile metrics
> >> >>> > in my job itself and use a Gauge to emit them, however this is
> >> >>> > not an easy approach. On the other hand, the Distribution
> >> >>> > metric sounds like the one to go to, according to its
> >> >>> > documentation: "A metric that reports information about the
> >> >>> > distribution of reported values." However, it seems that it is
> >> >>> > intended for SUM, COUNT, MIN, MAX.
> >> >>> >
> >> >>> > The question(s) are:
> >> >>> >
> >> >>> > 1. Is the Distribution metric only intended for sum, count,
> >> >>> >    min, max?
> >> >>> > 2. If yes, can the documentation be updated to be more
> >> >>> >    specific?
> >> >>> > 3. Can we add percentile metric support, such as a Histogram,
> >> >>> >    with a configurable list of percentiles to emit?
> >> >>> >
> >> >>> > Best,
> >> >>> > Ke
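For reference, the bucket-based percentile estimation Alex describes above
boils down to linear interpolation within the bucket containing the target
rank. A minimal sketch along the lines of the linked StackOverflow answer
(a hypothetical helper, not Google Monitoring's actual implementation):

    // Minimal sketch: estimate a percentile from bucket counts and bucket
    // boundaries by linear interpolation within the containing bucket.
    // Hypothetical helper, not Google Monitoring's implementation.
    public class BucketPercentile {
      // counts[i] holds the count for [boundaries[i], boundaries[i + 1]);
      // counts.length == boundaries.length - 1.
      static double estimate(long[] counts, double[] boundaries, double percentile) {
        long total = 0;
        for (long c : counts) total += c;
        double target = total * percentile / 100.0;

        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
          if (seen + counts[i] >= target && counts[i] > 0) {
            // Assume values are spread uniformly within the bucket.
            double frac = (target - seen) / counts[i];
            return boundaries[i] + frac * (boundaries[i + 1] - boundaries[i]);
          }
          seen += counts[i];
        }
        return boundaries[boundaries.length - 1];
      }
    }

Note that, as Robert points out, each worker has to publish its full
bucket counts (not its locally computed percentiles) so the counts can be
summed across workers before this estimate is applied.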

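And for reference on Ke's original question, this is what exists in the
Beam Java SDK today: a Distribution is aggregated as SUM/COUNT/MIN/MAX of
the reported values only, with no percentile support, which is the gap
discussed in this thread. For example:

    import org.apache.beam.sdk.metrics.Distribution;
    import org.apache.beam.sdk.metrics.Metrics;
    import org.apache.beam.sdk.transforms.DoFn;

    // Existing Beam API: Distribution tracks sum/count/min/max of reported
    // values. The DoFn and metric names here are illustrative only.
    public class MeasureLatencyFn extends DoFn<Long, Long> {
      private final Distribution latencyMs =
          Metrics.distribution(MeasureLatencyFn.class, "rpc_latency_ms");

      @ProcessElement
      public void processElement(@Element Long latency, OutputReceiver<Long> out) {
        latencyMs.update(latency); // aggregated as sum/count/min/max only
        out.output(latency);
      }
    }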