I can definitely share a PR with my changes for the Distribution metric
soon.

But there is something that I wanted to discuss first. As per the original
design doc for metrics api, http://s.apache.org/beam-metrics-api, it seems
like the Distribution metric interface was only intended to have a single
method to update the distribution- *update(long value)*. This made the
Distribution Metric easily extensible.
The current Distribution metric interface [1], however, exposes another
method- *void update(long sum, long count, long min, long max).* This call
in turn calls the update method in the DistributionCell[2] which directly
sets the values in the DistributionData[3] class.
DistributionData would contain the logic to compute percentiles but the
other *update *method in the user facing Distribution api makes it hard to
add anything new. Our user's mostly use the *update(value)* method to
update the distribution over updating the individual values- sum, min,
max...

1.
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Distribution.java
2.
https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/DistributionCell.java#L65
3.
https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/DistributionData.java#L46

On Fri, Sep 17, 2021 at 11:33 AM Ke Wu <ke.wu...@gmail.com> wrote:

> +1 would love to see a PR/Proposal out.
>
> This is a highly demanding feature our users at LinkedIn are asking for as
> well.
>
> On Sep 17, 2021, at 10:56, Pablo Estrada <pabl...@google.com> wrote:
>
> 
> Thanks for working on this!
> In the past, we have avoided adding complex metrics because metrics tend
> to be aggregated in the control path rather than the data path - and we
> worried about overwhelming the metrics backends - however users have in the
> past asked for more information in the distribution metric itself. I think
> it makes sense to provide more info, while allowing runners to report as
> much as they see fit. I'd love to see a proposal / PR for this.
>
> fyi @Robert Bradshaw <rober...@google.com>
>
> On Wed, Sep 15, 2021 at 10:37 AM Ajo Thomas <ajo.thoma...@gmail.com>
> wrote:
>
>> Thanks for the response, Alexey and Ke.
>> Agree with your point to introduce a new metric type (say Percentiles)
>> instead of altering the Distribution metric type to ensure compatibility
>> across runners and sdks.
>> I am currently working on a prototype to add this new metric type to the
>> metrics API and testing it with samza runner. I can share a design doc with
>> the community with possible solutions very soon.
>>
>> Thanks
>> Ajo
>>
>> On Wed, Sep 15, 2021 at 9:26 AM Alexey Romanenko <
>> aromanenko....@gmail.com> wrote:
>>
>>> I agree with Ke Wu in the way that we need to keep compatibility across
>>> all runners and the same metrics. So, it seems that it would be better to
>>> create another metric type in this case.
>>>
>>> Also, to discuss it in details, I’d recommend to create a design
>>> document with possible solutions and examples.
>>>
>>> —
>>> Alexey
>>>
>>> On 14 Sep 2021, at 19:04, Ke Wu <ke.wu...@gmail.com> wrote:
>>>
>>> I prefer adding a new metrics type instead of enhancing the existing
>>> Distribution [1] to support percentiles etc in order to ensure better
>>> compatibility.
>>>
>>> @Luke @Kyle what are your thoughts on this?
>>>
>>> Best,
>>> Ke
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Distribution.java
>>>
>>>
>>> On Sep 7, 2021, at 1:28 PM, Ajo Thomas <ajo.thoma...@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> I am working on adding support for some additional distribution metrics
>>> like std dev, percentiles to the Metrics API. The runner of interest here
>>> is Samza runner. I wanted to get the opinion of fellow beam devs on this.
>>>
>>> One way to do this would be to make changes to the existing Distribution
>>> metric:
>>> - Add additional metrics to Distribution metric- custom percentiles, std
>>> dev, mean. Use Dropwizard Histogram under the hood in DistributionData to
>>> track the distribution of the data.
>>> - This also means changes to accompanying classes like DistributionData,
>>> DistributionResult which might involve runner specific changes.
>>>
>>> Is this an acceptable change or would you suggest something else? Is the
>>> Distribution metric only intended to track the metrics that it is currently
>>> tracking- sum, min, max, count?
>>>
>>> Thanks
>>> Ajo
>>>
>>>
>>>
>>>

Reply via email to