I am only tackling the specific metrics covered in https://s.apache.org/beam-gcp-debuggability (for the Python SDK first, then Java): collecting the latency of IO API RPCs and storing it in a histogram.
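For context, the histogram being discussed just records each value into one of a set of fixed buckets. A minimal plain-Python sketch of that idea (a hypothetical `LatencyHistogram` with made-up millisecond boundaries, not the actual Beam implementation or its bucket scheme):

```python
import bisect

class LatencyHistogram:
    """Minimal fixed-boundary histogram sketch (illustrative only).

    Bucket i counts values <= boundaries[i]; values above the last
    boundary land in a trailing overflow bucket.
    """

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)       # bucket upper bounds
        self.counts = [0] * (len(boundaries) + 1)  # +1 overflow bucket
        self.total = 0

    def record(self, value):
        # bisect_left returns the index of the first boundary >= value,
        # which is exactly the bucket this value belongs in.
        self.counts[bisect.bisect_left(self.boundaries, value)] += 1
        self.total += 1

h = LatencyHistogram([5, 10, 25, 50, 100, 250])  # hypothetical ms boundaries
for latency_ms in [3, 7, 7, 42, 300]:
    h.record(latency_ms)
print(h.counts)  # → [1, 2, 0, 1, 0, 0, 1]; last entry is the >250 ms bucket
```

Storing only bucket counts (plus sum and count) keeps the metric cheap to update and to ship off the worker, at the cost of making downstream percentiles estimates rather than exact values.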
User histogram metrics are unfunded, as far as I know. But you should be able to extend what I do for that project to the user metric use case. I agree, it won't be much more work to support that. I designed the histogram with the user histogram case in mind.

On Fri, Aug 14, 2020 at 5:47 PM Robert Bradshaw <[email protected]> wrote:
> Once histograms are implemented in the SDK(s) (Alex, you're tackling
> this, right?) it shouldn't be much work to update the Samza worker code
> to publish these via the Samza runner APIs (in parallel with Alex's
> work to do the same on Dataflow).
>
> On Fri, Aug 14, 2020 at 5:35 PM Alex Amato <[email protected]> wrote:
> >
> > No one currently has plans to work on adding a generic histogram
> > metric.
> >
> > But I will be actively working on adding it for a specific set of
> > metrics in the next quarter or so:
> > https://s.apache.org/beam-gcp-debuggability
> >
> > After that work, one could take a look at my PRs for reference to
> > create new metrics using the same histogram. One may wish to implement
> > the UserHistogram use case and use that in the Samza runner.
> >
> > On Fri, Aug 14, 2020 at 5:25 PM Ke Wu <[email protected]> wrote:
> >>
> >> Thank you Robert and Alex. I am not running a Beam job in Google
> >> Cloud but with the Samza runner, so I am wondering if there is any
> >> ETA for adding the Histogram metric to the Metrics class so it can be
> >> mapped to the SamzaHistogram metric for the actual emitting.
> >>
> >> Best,
> >> Ke
> >>
> >> On Aug 14, 2020, at 4:44 PM, Alex Amato <[email protected]> wrote:
> >>
> >> One of the plans for the histogram data is to send it to Google
> >> Monitoring to compute estimates of percentiles. This is done using
> >> the bucket counts and bucket boundaries.
> >>
> >> Here is a description of roughly how it's calculated:
> >> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
> >> This is a non-exact estimate, but plotting the estimated percentiles
> >> over time is often easier to understand and sufficient. (An
> >> alternative is a heatmap chart representing histograms over time,
> >> i.e. a histogram for each window of time.)
> >>
> >> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw <[email protected]> wrote:
> >>>
> >>> You may be interested in the proposed histogram metrics:
> >>> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
> >>>
> >>> I think it'd be reasonable to add percentiles as their own metric
> >>> type as well. The tricky bit (though there are lots of resources on
> >>> this) is that each worker would have to publish more than just its
> >>> own percentiles for the final percentiles across all workers to be
> >>> computable.
> >>>
> >>> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu <[email protected]> wrote:
> >>> >
> >>> > Hi everyone,
> >>> >
> >>> > I am looking to add percentile metrics (p50, p90, etc.) to my Beam
> >>> > job, but I only find the Counter, Gauge, and Distribution metrics.
> >>> > I understand that I can calculate percentiles in the job itself
> >>> > and use a Gauge to emit them, but this is not an easy approach. On
> >>> > the other hand, the Distribution metric sounds like the one to use
> >>> > according to its documentation ("A metric that reports information
> >>> > about the distribution of reported values."); however, it seems to
> >>> > be intended only for SUM, COUNT, MIN, MAX.
> >>> >
> >>> > The questions are:
> >>> >
> >>> > 1. Is the Distribution metric only intended for sum, count, min,
> >>> > max?
> >>> > 2. If yes, can the documentation be updated to be more specific?
> >>> > 3. Can we add percentile metric support, such as a Histogram, with
> >>> > a configurable list of percentiles to emit?
> >>> >
> >>> > Best,
> >>> > Ke
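Two technical points from the thread above can be sketched in a few lines of plain Python: histograms with shared boundaries merge across workers exactly by summing bucket counts (whereas per-worker percentiles cannot simply be averaged, which is Robert's "tricky bit"), and a percentile can then be estimated from the merged counts by interpolating within a bucket, roughly as the Stack Overflow answer linked above describes. This is an illustrative sketch, not Beam or Google Monitoring code:

```python
def merge_histograms(*count_lists):
    """Merge per-worker histograms that share bucket boundaries by
    summing counts; global percentiles are estimated once from the
    merged counts."""
    return [sum(col) for col in zip(*count_lists)]

def estimate_percentile(boundaries, counts, pct):
    """Estimate a percentile from bucket counts, where counts[i] covers
    (boundaries[i-1], boundaries[i]]. Values are assumed uniformly
    spread within each bucket, which is why this is a non-exact
    estimate."""
    total = sum(counts)
    target = pct / 100.0 * total          # rank of the desired value
    cumulative = 0
    for i, count in enumerate(counts):
        if count and cumulative + count >= target:
            lower = boundaries[i - 1] if i > 0 else 0
            upper = boundaries[i]
            # Linear interpolation inside the bucket that crosses the rank.
            return lower + (upper - lower) * (target - cumulative) / count
        cumulative += count
    return float(boundaries[-1])

# Two workers with very different latency profiles, same boundaries.
boundaries = [10, 20, 50]                 # hypothetical bucket upper bounds
worker_a = [90, 10, 0]                    # mostly fast requests
worker_b = [0, 10, 90]                    # mostly slow requests
merged = merge_histograms(worker_a, worker_b)
print(merged)                                       # → [90, 20, 90]
print(estimate_percentile(boundaries, merged, 50))  # → 15.0 (global p50)
```

Note that the true global p50 here could never be recovered from worker A's p50 and worker B's p50 alone; publishing the bucket counts is what makes cross-worker aggregation possible.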
