The distinction here is that even though these metrics come from user space, we still gave them specific URNs, which imply they have a specific format, with specific labels, etc.
That is, we won't be packaging them into a USER_HISTOGRAM URN, which would
carry fewer expectations about its format. Today, USER_COUNTER just expects
labels like TRANSFORM, NAME, and NAMESPACE.

We didn't decide to make a private API, but rather an API available to user
code for populating metrics with specific labels and specific URNs. Pretty
much the same API could be used for a user USER_HISTOGRAM, with a default URN
chosen. That's how I see it in my head at the moment.

On Fri, Aug 14, 2020 at 8:52 PM Robert Bradshaw <rober...@google.com> wrote:

> On Fri, Aug 14, 2020 at 7:35 PM Alex Amato <ajam...@google.com> wrote:
> >
> > I am only tackling the specific metrics covered in (for the Python SDK
> > first, then Java). To collect latency of IO API RPCs, and store it in a
> > histogram.
> > https://s.apache.org/beam-gcp-debuggability
> >
> > User histogram metrics are unfunded, as far as I know. But you should be
> > able to extend what I do for that project to the user metric use case. I
> > agree, it won't be much more work to support that. I designed the
> > histogram with the user histogram case in mind.
>
> From the portability point of view, all metrics generated in user
> code (and SDK-side IOs are "user code") are user metrics. But
> regardless of how things are named, once we have histogram metrics
> crossing the FnAPI boundary all the infrastructure will be in place.
> (At least the plan as I understand it shouldn't use private APIs
> accessible only by the various IOs but not other SDK-level code.)
>
> > On Fri, Aug 14, 2020 at 5:47 PM Robert Bradshaw <rober...@google.com>
> > wrote:
> >>
> >> Once histograms are implemented in the SDK(s) (Alex, you're tackling
> >> this, right?) it shouldn't be much work to update the Samza worker code
> >> to publish these via the Samza runner APIs (in parallel with Alex's
> >> work to do the same on Dataflow).
> >>
> >> On Fri, Aug 14, 2020 at 5:35 PM Alex Amato <ajam...@google.com> wrote:
> >> >
> >> > No one currently has any plans to work on adding a generic histogram
> >> > metric.
> >> >
> >> > But I will be actively working on adding it for a specific set of
> >> > metrics in the next quarter or so:
> >> > https://s.apache.org/beam-gcp-debuggability
> >> >
> >> > After that work, one could take a look at my PRs for reference to
> >> > create new metrics using the same histogram. One may wish to implement
> >> > the UserHistogram use case and use that in the Samza Runner.
> >> >
> >> > On Fri, Aug 14, 2020 at 5:25 PM Ke Wu <ke.wu...@gmail.com> wrote:
> >> >>
> >> >> Thank you, Robert and Alex. I am not running a Beam job in Google
> >> >> Cloud but with the Samza Runner, so I am wondering if there is any
> >> >> ETA for adding Histogram metrics to the Metrics class so that they
> >> >> can be mapped to the SamzaHistogram metric for the actual emitting.
> >> >>
> >> >> Best,
> >> >> Ke
> >> >>
> >> >> On Aug 14, 2020, at 4:44 PM, Alex Amato <ajam...@google.com> wrote:
> >> >>
> >> >> One of the plans to use the histogram data is to send it to Google
> >> >> Monitoring to compute estimates of percentiles. This is done using
> >> >> the bucket counts and bucket boundaries.
> >> >>
> >> >> Here is a description of roughly how it's calculated:
> >> >> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
> >> >> This is an inexact estimate. But plotting the estimated percentiles
> >> >> over time is often easier to understand and sufficient.
> >> >> (An alternative is a heatmap chart representing histograms over
> >> >> time, i.e. a histogram for each window of time.)
> >> >>
> >> >>
> >> >> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw <rober...@google.com>
> >> >> wrote:
> >> >>>
> >> >>> You may be interested in the proposed histogram metrics:
> >> >>> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
> >> >>>
> >> >>> I think it'd be reasonable to add percentiles as its own metric type
> >> >>> as well. The tricky bit (though there are lots of resources on this)
> >> >>> is that one would have to publish more than just the percentiles
> >> >>> from each worker to be able to compute the final percentiles across
> >> >>> all workers.
> >> >>>
> >> >>> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu <ke.wu...@gmail.com> wrote:
> >> >>> >
> >> >>> > Hi everyone,
> >> >>> >
> >> >>> > I am looking to add percentile metrics (p50, p90, etc.) to my Beam
> >> >>> > job, but I only find Counter, Gauge and Distribution metrics. I
> >> >>> > understand that I can calculate percentile metrics in my job
> >> >>> > itself and use a Gauge to emit them; however, this is not an easy
> >> >>> > approach. On the other hand, the Distribution metric sounds like
> >> >>> > the one to go to according to its documentation: "A metric that
> >> >>> > reports information about the distribution of reported values."
> >> >>> > However, it seems that it is intended for SUM, COUNT, MIN, MAX.
> >> >>> >
> >> >>> > The questions are:
> >> >>> >
> >> >>> > 1. Is the Distribution metric only intended for sum, count, min,
> >> >>> >    max?
> >> >>> > 2. If yes, can the documentation be updated to be more specific?
> >> >>> > 3. Can we add percentile metric support, such as a Histogram, with
> >> >>> >    a configurable list of percentiles to emit?
> >> >>> >
> >> >>> > Best,
> >> >>> > Ke
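
To make two points from the thread concrete (percentiles estimated from bucket counts and bucket boundaries, and workers needing to publish more than their own percentiles), here is a minimal Python sketch. It is not the Beam API or the design in the linked documents: the class `BucketHistogram` and its methods `record`, `merge` and `estimate_percentile` are hypothetical names, the bucket boundaries are made up, and linear interpolation inside a bucket is an assumed estimation rule.

```python
from bisect import bisect_right
from typing import List, Sequence


class BucketHistogram:
    """Hypothetical fixed-boundary histogram; not a Beam class."""

    def __init__(self, boundaries: Sequence[float]) -> None:
        if len(boundaries) < 2 or list(boundaries) != sorted(set(boundaries)):
            raise ValueError("need at least 2 strictly increasing boundaries")
        self.boundaries: List[float] = list(boundaries)
        # Bucket i covers [boundaries[i], boundaries[i + 1]).
        self.counts: List[int] = [0] * (len(boundaries) - 1)

    def record(self, value: float) -> None:
        # Assumption: out-of-range values are clamped into the edge buckets.
        i = bisect_right(self.boundaries, value) - 1
        i = min(max(i, 0), len(self.counts) - 1)
        self.counts[i] += 1

    def merge(self, other: "BucketHistogram") -> "BucketHistogram":
        # Bucket counts add up across workers; percentiles do not.
        if other.boundaries != self.boundaries:
            raise ValueError("histograms must share bucket boundaries")
        merged = BucketHistogram(self.boundaries)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        return merged

    def estimate_percentile(self, p: float) -> float:
        if not 0 <= p <= 100:
            raise ValueError("p must be between 0 and 100")
        total = sum(self.counts)
        if total == 0:
            raise ValueError("histogram is empty")
        target = p * total / 100.0  # rank of the requested percentile
        seen = 0
        for i, count in enumerate(self.counts):
            if count and seen + count >= target:
                lo, hi = self.boundaries[i], self.boundaries[i + 1]
                # Assumed rule: values are spread evenly inside the bucket.
                return lo + (hi - lo) * (target - seen) / count
            seen += count
        return self.boundaries[-1]


# Two workers report latencies (ms) using the same made-up boundaries.
worker_a = BucketHistogram([0, 10, 20, 40])
for latency in [1, 2, 3, 4, 5, 6, 7, 8]:
    worker_a.record(latency)

worker_b = BucketHistogram([0, 10, 20, 40])
for latency in [25, 35]:
    worker_b.record(latency)

merged = worker_a.merge(worker_b)
merged_p50 = merged.estimate_percentile(50)
averaged_p50 = (worker_a.estimate_percentile(50)
                + worker_b.estimate_percentile(50)) / 2
print(merged.counts, merged_p50, averaged_p50)
```

In this sketch, averaging the two per-worker p50 estimates gives a different answer from estimating p50 on the merged bucket counts, which is why each worker would publish its counts rather than only its percentiles. The estimate is only as precise as the bucket widths allow.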