I am only tackling the specific metrics covered in https://s.apache.org/beam-gcp-debuggability (for the Python SDK first, then Java): collecting the latency of IO API RPCs and storing it in a histogram.
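For context, the histogram being discussed just records each value into one of a set of fixed buckets. A minimal plain-Python sketch of that idea (a hypothetical `LatencyHistogram` with made-up millisecond boundaries, not the actual Beam implementation or its bucket scheme):

```python
import bisect

class LatencyHistogram:
    """Minimal fixed-boundary histogram sketch (illustrative only).

    Bucket i counts values <= boundaries[i]; values above the last
    boundary land in a trailing overflow bucket.
    """

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)       # bucket upper bounds
        self.counts = [0] * (len(boundaries) + 1)  # +1 overflow bucket
        self.total = 0

    def record(self, value):
        # bisect_left returns the index of the first boundary >= value,
        # which is exactly the bucket this value belongs in.
        self.counts[bisect.bisect_left(self.boundaries, value)] += 1
        self.total += 1

h = LatencyHistogram([5, 10, 25, 50, 100, 250])  # hypothetical ms boundaries
for latency_ms in [3, 7, 7, 42, 300]:
    h.record(latency_ms)
print(h.counts)  # → [1, 2, 0, 1, 0, 0, 1]; last entry is the >250 ms bucket
```

Storing only bucket counts (plus sum and count) keeps the metric cheap to update and to ship off the worker, at the cost of making downstream percentiles estimates rather than exact values.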
User histogram metrics are unfunded, as far as I know. But you should be able to extend what I do for that project to the user metric use case. I agree, it won't be much more work to support that. I designed the histogram with the user histogram case in mind.

On Fri, Aug 14, 2020 at 5:47 PM Robert Bradshaw <[email protected]> wrote:
> Once histograms are implemented in the SDK(s) (Alex, you're tackling
> this, right?) it shouldn't be much work to update the Samza worker code
> to publish these via the Samza runner APIs (in parallel with Alex's
> work to do the same on Dataflow).
>
> On Fri, Aug 14, 2020 at 5:35 PM Alex Amato <[email protected]> wrote:
> >
> > No one currently has plans to work on adding a generic histogram
> > metric.
> >
> > But I will be actively working on adding it for a specific set of
> > metrics in the next quarter or so:
> > https://s.apache.org/beam-gcp-debuggability
> >
> > After that work, one could take a look at my PRs for reference to
> > create new metrics using the same histogram. One may wish to implement
> > the UserHistogram use case and use that in the Samza runner.
> >
> > On Fri, Aug 14, 2020 at 5:25 PM Ke Wu <[email protected]> wrote:
> >>
> >> Thank you Robert and Alex. I am not running a Beam job in Google
> >> Cloud but with the Samza runner, so I am wondering if there is any
> >> ETA for adding the Histogram metric to the Metrics class so it can be
> >> mapped to the SamzaHistogram metric for the actual emitting.
> >>
> >> Best,
> >> Ke
> >>
> >> On Aug 14, 2020, at 4:44 PM, Alex Amato <[email protected]> wrote:
> >>
> >> One of the plans for the histogram data is to send it to Google
> >> Monitoring to compute estimates of percentiles. This is done using
> >> the bucket counts and bucket boundaries.
> >>
> >> Here is a description of roughly how it's calculated:
> >> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
> >> This is a non-exact estimate, but plotting the estimated percentiles
> >> over time is often easier to understand and sufficient. (An
> >> alternative is a heatmap chart representing histograms over time,
> >> i.e. a histogram for each window of time.)
> >>
> >> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw <[email protected]> wrote:
> >>>
> >>> You may be interested in the proposed histogram metrics:
> >>> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
> >>>
> >>> I think it'd be reasonable to add percentiles as their own metric
> >>> type as well. The tricky bit (though there are lots of resources on
> >>> this) is that each worker would have to publish more than just its
> >>> own percentiles for the final percentiles across all workers to be
> >>> computable.
> >>>
> >>> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu <[email protected]> wrote:
> >>> >
> >>> > Hi everyone,
> >>> >
> >>> > I am looking to add percentile metrics (p50, p90, etc.) to my Beam
> >>> > job, but I only find the Counter, Gauge, and Distribution metrics.
> >>> > I understand that I can calculate percentiles in the job itself
> >>> > and use a Gauge to emit them, but this is not an easy approach. On
> >>> > the other hand, the Distribution metric sounds like the one to use
> >>> > according to its documentation ("A metric that reports information
> >>> > about the distribution of reported values."); however, it seems to
> >>> > be intended only for SUM, COUNT, MIN, MAX.
> >>> >
> >>> > The questions are:
> >>> >
> >>> > 1. Is the Distribution metric only intended for sum, count, min,
> >>> > max?
> >>> > 2. If yes, can the documentation be updated to be more specific?
> >>> > 3. Can we add percentile metric support, such as a Histogram, with
> >>> > a configurable list of percentiles to emit?
> >>> >
> >>> > Best,
> >>> > Ke
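Two technical points from the thread above can be sketched in a few lines of plain Python: histograms with shared boundaries merge across workers exactly by summing bucket counts (whereas per-worker percentiles cannot simply be averaged, which is Robert's "tricky bit"), and a percentile can then be estimated from the merged counts by interpolating within a bucket, roughly as the Stack Overflow answer linked above describes. This is an illustrative sketch, not Beam or Google Monitoring code:

```python
def merge_histograms(*count_lists):
    """Merge per-worker histograms that share bucket boundaries by
    summing counts; global percentiles are estimated once from the
    merged counts."""
    return [sum(col) for col in zip(*count_lists)]

def estimate_percentile(boundaries, counts, pct):
    """Estimate a percentile from bucket counts, where counts[i] covers
    (boundaries[i-1], boundaries[i]]. Values are assumed uniformly
    spread within each bucket, which is why this is a non-exact
    estimate."""
    total = sum(counts)
    target = pct / 100.0 * total          # rank of the desired value
    cumulative = 0
    for i, count in enumerate(counts):
        if count and cumulative + count >= target:
            lower = boundaries[i - 1] if i > 0 else 0
            upper = boundaries[i]
            # Linear interpolation inside the bucket that crosses the rank.
            return lower + (upper - lower) * (target - cumulative) / count
        cumulative += count
    return float(boundaries[-1])

# Two workers with very different latency profiles, same boundaries.
boundaries = [10, 20, 50]                 # hypothetical bucket upper bounds
worker_a = [90, 10, 0]                    # mostly fast requests
worker_b = [0, 10, 90]                    # mostly slow requests
merged = merge_histograms(worker_a, worker_b)
print(merged)                                       # → [90, 20, 90]
print(estimate_percentile(boundaries, merged, 50))  # → 15.0 (global p50)
```

Note that the true global p50 here could never be recovered from worker A's p50 and worker B's p50 alone; publishing the bucket counts is what makes cross-worker aggregation possible.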
