Re: Percentile metrics in Beam

Gleb Kanterov Mon, 17 Aug 2020 03:42:59 -0700

I didn't see Alex's proposal before today. I want to add my two cents.


There is a paper, "Moment-based quantile sketches for efficient high
cardinality aggregation queries" [1]. The TL;DR is that for some N (around
10-20, depending on accuracy) we need to collect SUM(log(X)) ...
SUM(log^N(X)), COUNT(X), SUM(X), SUM(X^2) ... SUM(X^N), MAX(X), MIN(X).
Given the aggregated numbers, it uses a solver for Chebyshev polynomials to
get the quantile, and there is already a Java implementation of it on
GitHub [2].

This way we can express quantiles using existing metric types in Beam, which
can already be done without SDK or runner changes. It fits nicely into
existing runners and can be abstracted over if needed. I think this is also
one of the best implementations: it has < 1% error rate for 200 bytes of
storage, and is quite efficient to compute. Did we consider using that?
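
To make the aggregation side concrete, here is a minimal Python sketch. It
is not taken from the paper or from the msketch repository: the MomentSketch
class, its field names, and the default k=10 are hypothetical and only
illustrate the idea. It shows that every field is a plain sum, min, or max,
so per-worker values merge by addition. The solver step that turns the
moments into a quantile is deliberately omitted.

```python
import math


class MomentSketch:
    """Hypothetical container for the aggregates listed above."""

    def __init__(self, k=10):
        self.k = k                    # highest moment order (assumed default)
        self.count = 0                # COUNT(X)
        self.min = math.inf           # MIN(X)
        self.max = -math.inf          # MAX(X)
        self.power_sums = [0.0] * k   # power_sums[i] = SUM(X^(i+1))
        self.log_sums = [0.0] * k     # log_sums[i]   = SUM(log(X)^(i+1))

    def add(self, x):
        """Fold one observation into the running aggregates."""
        if x <= 0:
            raise ValueError("log moments require x > 0")
        self.count += 1
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        log_x = math.log(x)
        for i in range(self.k):
            self.power_sums[i] += x ** (i + 1)
            self.log_sums[i] += log_x ** (i + 1)

    def merge(self, other):
        """Combine two sketches; only sums, min and max are needed."""
        if other.k != self.k:
            raise ValueError("sketches must use the same k")
        merged = MomentSketch(self.k)
        merged.count = self.count + other.count
        merged.min = min(self.min, other.min)
        merged.max = max(self.max, other.max)
        merged.power_sums = [a + b for a, b in zip(self.power_sums, other.power_sums)]
        merged.log_sums = [a + b for a, b in zip(self.log_sums, other.log_sums)]
        return merged
```

Each worker would keep one such object per metric and the runner would only
need to add up the sums and take the min and max across workers. Whether
Beam's existing counter types can hold these floating-point sums directly is
a separate question that this sketch does not answer.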

[1]:
https://blog.acolyer.org/2018/10/31/moment-based-quantile-sketches-for-efficient-high-cardinality-aggregation-queries/
[2]: https://github.com/stanford-futuredata/msketch

On Sat, Aug 15, 2020 at 6:15 AM Alex Amato <[email protected]> wrote:

> The distinction here is that even though these metrics come from user
> space, we still gave them specific URNs, which imply they have a specific
> format, with specific labels, etc.
>
> That is, we won't be packaging them into a USER_HISTOGRAM urn. That URN
> would have less expectation for its format. Today the USER_COUNTER just
> expects labels like (TRANSFORM, NAME, NAMESPACE).
>
> We didn't decide on making a private API. But rather an API available to
> user code for populating metrics with specific labels, and specific URNs.
> The same API could pretty much be used for USER_HISTOGRAM, with a
> default URN chosen.
> That's how I see it in my head at the moment.
>
>
> On Fri, Aug 14, 2020 at 8:52 PM Robert Bradshaw <[email protected]>
> wrote:
>
>> On Fri, Aug 14, 2020 at 7:35 PM Alex Amato <[email protected]> wrote:
>> >
>> > I am only tackling the specific metrics covered in the doc below (for
>> the Python SDK first, then Java): collecting the latency of IO API RPCs
>> and storing it in a histogram.
>> > https://s.apache.org/beam-gcp-debuggability
>> >
>> > User histogram metrics are unfunded, as far as I know. But you should
>> be able to extend what I do for that project to the user metric use case. I
>> agree, it won't be much more work to support that. I designed the histogram
>> with the user histogram case in mind.
>>
>> From the portability point of view, all metrics generated in users
>> code (and SDK-side IOs are "user code") are user metrics. But
>> regardless of how things are named, once we have histogram metrics
>> crossing the FnAPI boundary all the infrastructure will be in place.
>> (At least the plan as I understand it shouldn't use private APIs
>> accessible only by the various IOs but not other SDK-level code.)
>>
>> > On Fri, Aug 14, 2020 at 5:47 PM Robert Bradshaw <[email protected]>
>> wrote:
>> >>
>> >> Once histograms are implemented in the SDK(s) (Alex, you're tackling
>> >> this, right?) it shouldn't be much work to update the Samza worker code
>> >> to publish these via the Samza runner APIs (in parallel with Alex's
>> >> work to do the same on Dataflow).
>> >>
>> >> On Fri, Aug 14, 2020 at 5:35 PM Alex Amato <[email protected]> wrote:
>> >> >
>> >> > No one has any plans currently to work on adding a generic histogram
>> metric, at the moment.
>> >> >
>> >> > But I will be actively working on adding it for a specific set of
>> metrics in the next quarter or so
>> >> > https://s.apache.org/beam-gcp-debuggability
>> >> >
>> >> > After that work, one could take a look at my PRs for reference to
>> create new metrics using the same histogram. One may wish to implement the
>> UserHistogram use case and use that in the Samza Runner
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Aug 14, 2020 at 5:25 PM Ke Wu <[email protected]> wrote:
>> >> >>
>> >> >> Thank you Robert and Alex. I am not running a Beam job in Google
>> Cloud but with Samza Runner, so I am wondering if there is any ETA to add
>> the Histogram metrics in Metrics class so it can be mapped to the
>> SamzaHistogram metric to the actual emitting.
>> >> >>
>> >> >> Best,
>> >> >> Ke
>> >> >>
>> >> >> On Aug 14, 2020, at 4:44 PM, Alex Amato <[email protected]> wrote:
>> >> >>
>> >> >> One of the plans to use the histogram data is to send it to Google
>> Monitoring to compute estimates of percentiles. This is done using the
>> bucket counts and bucket boundaries.
>> >> >>
>> >> >> Here is a description of roughly how it's calculated.
>> >> >>
>> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
>> >> >> This is a non exact estimate. But plotting the estimated
>> percentiles over time is often easier to understand and sufficient.
>> >> >> (An alternative is a heatmap chart representing histograms over
>> time. I.e. a histogram for each window of time).
>> >> >>
>> >> >>
>> >> >> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw <
>> [email protected]> wrote:
>> >> >>>
>> >> >>> You may be interested in the proposed histogram metrics:
>> >> >>>
>> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
>> >> >>>
>> >> >>> I think it'd be reasonable to add percentiles as its own metric
>> type
>> >> >>> as well. The tricky bit (though there are lots of resources on
>> this)
>> >> >>> is that one would have to publish more than just the percentiles
>> from
>> >> >>> each worker to be able to compute the final percentiles across all
>> >> >>> workers.
>> >> >>>
>> >> >>> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu <[email protected]> wrote:
>> >> >>> >
>> >> >>> > Hi everyone,
>> >> >>> >
>> >> >>> > I am looking to add percentile metrics (p50, p90 etc) to my Beam
>> job but I only find Counter, Gauge and Distribution metrics. I understand
>> that I can calculate percentile metrics in my job itself and use Gauge to
>> emit, however this is not an easy approach. On the other hand, Distribution
>> metrics sounds like the one to go to according to its documentation: "A
>> metric that reports information about the distribution of reported
>> values.", however it seems that it is intended for SUM, COUNT, MIN, MAX.
>> >> >>> >
>> >> >>> > The question(s) are:
>> >> >>> >
>> >> >>> > 1. is Distribution metric only intended for sum, count, min, max?
>> >> >>> > 2. If Yes, can the documentation be updated to be more specific?
>> >> >>> > 3. Can we add percentiles metric support, such as Histogram,
>> with configurable list of percentiles to emit?
>> >> >>> >
>> >> >>> > Best,
>> >> >>> > Ke
>> >> >>
>> >> >>
>>
>
