Re: Percentile metrics in Beam

Robert Bradshaw Fri, 14 Aug 2020 20:52:47 -0700

On Fri, Aug 14, 2020 at 7:35 PM Alex Amato <ajam...@google.com> wrote:
>
> I am only tackling the specific metrics covered in
> https://s.apache.org/beam-gcp-debuggability (for the Python SDK first, then
> Java): collecting the latency of IO API RPCs and storing it in a histogram.
>
> User histogram metrics are unfunded, as far as I know. But you should be able 
> to extend what I do for that project to the user metric use case. I agree, it 
> won't be much more work to support that. I designed the histogram with the 
> user histogram case in mind.


From the portability point of view, all metrics generated in user
code (and SDK-side IOs are "user code") are user metrics. But
regardless of how things are named, once we have histogram metrics
crossing the FnAPI boundary all the infrastructure will be in place.
(At least the plan as I understand it shouldn't use private APIs
accessible only by the various IOs but not other SDK-level code.)

> On Fri, Aug 14, 2020 at 5:47 PM Robert Bradshaw <rober...@google.com> wrote:
>>
>> Once histograms are implemented in the SDK(s) (Alex, you're tackling
>> this, right?) it shouldn't be much work to update the Samza worker code
>> to publish these via the Samza runner APIs (in parallel with Alex's
>> work to do the same on Dataflow).
>>
>> On Fri, Aug 14, 2020 at 5:35 PM Alex Amato <ajam...@google.com> wrote:
>> >
>> > No one currently has any plans to work on adding a generic histogram
>> > metric.
>> >
>> > But I will be actively working on adding it for a specific set of metrics 
>> > in the next quarter or so
>> > https://s.apache.org/beam-gcp-debuggability
>> >
>> > After that work, one could take a look at my PRs for reference to create 
>> > new metrics using the same histogram. One may wish to implement the 
>> > UserHistogram use case and use that in the Samza Runner.
>> >
>> >
>> >
>> >
>> > On Fri, Aug 14, 2020 at 5:25 PM Ke Wu <ke.wu...@gmail.com> wrote:
>> >>
>> >> Thank you Robert and Alex. I am not running a Beam job in Google Cloud
>> >> but with the Samza Runner, so I am wondering if there is any ETA for
>> >> adding Histogram metrics to the Metrics class so that they can be mapped
>> >> to the SamzaHistogram metric for the actual emitting.
>> >>
>> >> Best,
>> >> Ke
>> >>
>> >> On Aug 14, 2020, at 4:44 PM, Alex Amato <ajam...@google.com> wrote:
>> >>
>> >> One of the plans to use the histogram data is to send it to Google 
>> >> Monitoring to compute estimates of percentiles. This is done using the 
>> >> bucket counts and bucket boundaries.
>> >>
>> >> Here is a rough description of how it's calculated:
>> >> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
>> >> This is an inexact estimate, but plotting the estimated percentiles over
>> >> time is often easier to understand and sufficient.
>> >> (An alternative is a heatmap chart representing histograms over time. 
>> >> I.e. a histogram for each window of time).
>> >>
>> >>
>> >> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw <rober...@google.com> 
>> >> wrote:
>> >>>
>> >>> You may be interested in the proposed histogram metrics:
>> >>> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
>> >>>
>> >>> I think it'd be reasonable to add percentiles as its own metric type
>> >>> as well. The tricky bit (though there are lots of resources on this)
>> >>> is that one would have to publish more than just the percentiles from
>> >>> each worker to be able to compute the final percentiles across all
>> >>> workers.
>> >>>
>> >>> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu <ke.wu...@gmail.com> wrote:
>> >>> >
>> >>> > Hi everyone,
>> >>> >
>> >>> > I am looking to add percentile metrics (p50, p90, etc.) to my Beam job,
>> >>> > but I only find Counter, Gauge and Distribution metrics. I understand
>> >>> > that I can calculate percentile metrics in my job itself and use Gauge
>> >>> > to emit them; however, this is not an easy approach. On the other hand,
>> >>> > the Distribution metric sounds like the one to use according to its
>> >>> > documentation: "A metric that reports information about the
>> >>> > distribution of reported values." However, it seems that it is
>> >>> > intended only for SUM, COUNT, MIN, MAX.
>> >>> >
>> >>> > The question(s) are:
>> >>> >
>> >>> > 1. Is the Distribution metric only intended for sum, count, min, max?
>> >>> > 2. If yes, can the documentation be updated to be more specific?
>> >>> > 3. Can we add percentile metric support, such as a Histogram, with a
>> >>> > configurable list of percentiles to emit?
>> >>> >
>> >>> > Best,
>> >>> > Ke
>> >>
>> >>
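
Two points from the thread can be illustrated together: percentiles can be
estimated from bucket counts and bucket boundaries, and each worker has to
publish more than its own percentiles for a correct result across workers.
The following is a minimal sketch in plain Python, not Beam code and not the
algorithm Google Monitoring actually uses. The function names
(merge_counts, estimate_percentile) are hypothetical, and the sketch assumes
that all workers share the same finite bucket boundaries and that values are
spread uniformly within each bucket.

```python
from typing import List, Sequence


def merge_counts(per_worker_counts: Sequence[Sequence[int]]) -> List[int]:
    """Sum bucket counts element-wise across workers.

    Assumption: every worker uses the same bucket boundaries, so bucket i
    means the same thing on every worker and the counts can simply be added.
    """
    return [sum(column) for column in zip(*per_worker_counts)]


def estimate_percentile(
    boundaries: Sequence[float], counts: Sequence[int], p: float
) -> float:
    """Estimate the p-th percentile (0 <= p <= 100) from a histogram.

    Bucket i covers the range boundaries[i] to boundaries[i + 1]. The
    estimate finds the bucket containing the target rank and interpolates
    linearly inside it, which is exact only if values are uniformly spread
    within that bucket (an assumption of this sketch).
    """
    if len(counts) != len(boundaries) - 1:
        raise ValueError("need exactly one count per bucket")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")

    target = total * p / 100.0  # rank of the requested percentile
    seen = 0  # number of values in the buckets already passed
    for i, count in enumerate(counts):
        if count > 0 and seen + count >= target:
            fraction = (target - seen) / count
            low, high = boundaries[i], boundaries[i + 1]
            return low + fraction * (high - low)
        seen += count
    return float(boundaries[-1])


if __name__ == "__main__":
    # Hypothetical latency buckets (ms) and counts from two workers.
    boundaries = [0, 10, 20, 40]
    worker_a = [80, 20, 0]
    worker_b = [0, 2, 8]

    merged = merge_counts([worker_a, worker_b])
    print("p50 from merged histogram:",
          estimate_percentile(boundaries, merged, 50))

    # Averaging per-worker percentiles gives a different (misleading) number.
    naive = (estimate_percentile(boundaries, worker_a, 50)
             + estimate_percentile(boundaries, worker_b, 50)) / 2
    print("average of per-worker p50s:", naive)
```

In this made-up data set, worker A handles ten times as many values as
worker B, so the merged p50 stays close to worker A's p50, while the naive
average of the two per-worker p50s lands much higher. Shipping bucket counts
avoids that problem because counts, unlike percentiles, can be added.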
