I can definitely share a PR with my changes for the Distribution metric soon.
But there is something that I wanted to discuss first. As per the original design doc for metrics api, http://s.apache.org/beam-metrics-api, it seems like the Distribution metric interface was only intended to have a single method to update the distribution- *update(long value)*. This made the Distribution Metric easily extensible. The current Distribution metric interface [1], however, exposes another method- *void update(long sum, long count, long min, long max).* This call in turn calls the update method in the DistributionCell[2] which directly sets the values in the DistributionData[3] class. DistributionData would contain the logic to compute percentiles but the other *update *method in the user facing Distribution api makes it hard to add anything new. Our user's mostly use the *update(value)* method to update the distribution over updating the individual values- sum, min, max... 1. https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Distribution.java 2. https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/DistributionCell.java#L65 3. https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/DistributionData.java#L46 On Fri, Sep 17, 2021 at 11:33 AM Ke Wu <ke.wu...@gmail.com> wrote: > +1 would love to see a PR/Proposal out. > > This is a highly demanding feature our users at LinkedIn are asking for as > well. > > On Sep 17, 2021, at 10:56, Pablo Estrada <pabl...@google.com> wrote: > > > Thanks for working on this! > In the past, we have avoided adding complex metrics because metrics tend > to be aggregated in the control path rather than the data path - and we > worried about overwhelming the metrics backends - however users have in the > past asked for more information in the distribution metric itself. I think > it makes sense to provide more info, while allowing runners to report as > much as they see fit. I'd love to see a proposal / PR for this. > > fyi @Robert Bradshaw <rober...@google.com> > > On Wed, Sep 15, 2021 at 10:37 AM Ajo Thomas <ajo.thoma...@gmail.com> > wrote: > >> Thanks for the response, Alexey and Ke. >> Agree with your point to introduce a new metric type (say Percentiles) >> instead of altering the Distribution metric type to ensure compatibility >> across runners and sdks. >> I am currently working on a prototype to add this new metric type to the >> metrics API and testing it with samza runner. I can share a design doc with >> the community with possible solutions very soon. >> >> Thanks >> Ajo >> >> On Wed, Sep 15, 2021 at 9:26 AM Alexey Romanenko < >> aromanenko....@gmail.com> wrote: >> >>> I agree with Ke Wu in the way that we need to keep compatibility across >>> all runners and the same metrics. So, it seems that it would be better to >>> create another metric type in this case. >>> >>> Also, to discuss it in details, I’d recommend to create a design >>> document with possible solutions and examples. >>> >>> — >>> Alexey >>> >>> On 14 Sep 2021, at 19:04, Ke Wu <ke.wu...@gmail.com> wrote: >>> >>> I prefer adding a new metrics type instead of enhancing the existing >>> Distribution [1] to support percentiles etc in order to ensure better >>> compatibility. >>> >>> @Luke @Kyle what are your thoughts on this? >>> >>> Best, >>> Ke >>> >>> [1] >>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Distribution.java >>> >>> >>> On Sep 7, 2021, at 1:28 PM, Ajo Thomas <ajo.thoma...@gmail.com> wrote: >>> >>> Hi All, >>> >>> I am working on adding support for some additional distribution metrics >>> like std dev, percentiles to the Metrics API. The runner of interest here >>> is Samza runner. I wanted to get the opinion of fellow beam devs on this. >>> >>> One way to do this would be to make changes to the existing Distribution >>> metric: >>> - Add additional metrics to Distribution metric- custom percentiles, std >>> dev, mean. Use Dropwizard Histogram under the hood in DistributionData to >>> track the distribution of the data. >>> - This also means changes to accompanying classes like DistributionData, >>> DistributionResult which might involve runner specific changes. >>> >>> Is this an acceptable change or would you suggest something else? Is the >>> Distribution metric only intended to track the metrics that it is currently >>> tracking- sum, min, max, count? >>> >>> Thanks >>> Ajo >>> >>> >>> >>>