Re: beam flink-runner distribution implementation

Richard Moorhead Thu, 19 Nov 2020 17:48:12 -0800

FYI, the linked design has already been implemented: a histogram already
exists within Beam as a Distribution.

Histograms represent a set of frequencies whereas a Gauge usually
represents a single value.

The Gauge API, however, allows a single instance of any type to be
reported, and Beam's Flink runner uses this to encapsulate the
DistributionResult.
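
To make the point concrete, here is a minimal sketch of the pattern I am
describing. Everything in it is a hypothetical stand-in (the local Gauge
interface, DistResult, and describe are my own names, not Beam or Flink
classes); it only shows how a gauge that carries a composite object looks
to a consumer that expects a single number.

```java
public class GaugeWrappingSketch {

    /** Hypothetical stand-in for a generic gauge: one value of any type. */
    interface Gauge<T> {
        T getValue();
    }

    /** Hypothetical stand-in for a distribution result (precomputed fields). */
    static final class DistResult {
        final long sum;
        final long count;
        final long min;
        final long max;

        DistResult(long sum, long count, long min, long max) {
            this.sum = sum;
            this.count = count;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Hypothetical consumer step (an assumption, not a real reporter):
     * a numeric gauge value maps to one number, anything else is opaque.
     */
    static String describe(Gauge<?> gauge) {
        Object value = gauge.getValue();
        if (value instanceof Number) {
            return "numeric:" + value;
        }
        return "opaque:" + value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Gauge<Long> plain = () -> 42L;
        Gauge<DistResult> wrapped = () -> new DistResult(10, 2, 3, 7);
        System.out.println(describe(plain));
        System.out.println(describe(wrapped));
    }
}
```

A consumer written against numeric gauges has no generic way to unpack the
wrapped object, which is the kind of mismatch I mean.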

I assume there was a reason for this implementation as opposed to a
histogram. It does appear that the Flink runner implementation precomputes
the statistics usually associated with a histogram (e.g. mean, min, max,
sum), whereas most metrics APIs (e.g. Dropwizard) preserve the underlying
values that form the histogram. Flink's metrics API does this as well: the
histogram has a 'reservoir' and the statistics are computed at reporting
time.
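
As a rough illustration of the difference, the sketch below contrasts the
two approaches. Summary and Reservoir are hypothetical names of my own, not
Beam, Flink, or Dropwizard classes, and the reservoir here simply keeps
every value, whereas real reservoirs typically sample or bound their
contents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DistributionVsHistogram {

    /** Hypothetical precomputed summary: only sum, count, min, max survive. */
    static final class Summary {
        long sum = 0;
        long count = 0;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;

        void update(long v) {
            sum += v;
            count++;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }

        double mean() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    /** Hypothetical reservoir: keeps the values, derives statistics on demand. */
    static final class Reservoir {
        private final List<Long> values = new ArrayList<>();

        void update(long v) {
            values.add(v);
        }

        /** Nearest-rank quantile for 0 < q <= 1, computed at reporting time. */
        long quantile(double q) {
            if (values.isEmpty()) {
                throw new IllegalStateException("no values recorded");
            }
            List<Long> sorted = new ArrayList<>(values);
            Collections.sort(sorted);
            int rank = (int) Math.ceil(q * sorted.size());
            return sorted.get(Math.max(rank, 1) - 1);
        }
    }

    public static void main(String[] args) {
        Summary summary = new Summary();
        Reservoir reservoir = new Reservoir();
        for (long v : new long[] {1, 2, 3, 4, 100}) {
            summary.update(v);
            reservoir.update(v);
        }
        System.out.println("mean=" + summary.mean());
        System.out.println("median=" + reservoir.quantile(0.5));
    }
}
```

The summary can report a mean but cannot recover a median or any other
percentile after the fact; the reservoir can, because the underlying values
are still available at reporting time.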

I'd first like to understand why Beam's Flink runner implementation went
this route and then, if possible, explore the straightforward alternative.

In our use case we rely on Flink's Dropwizard reporter, and the type
mismatch between the Beam Flink runner's Distribution (wrapped in a Gauge)
and a Histogram causes problems for us internally.

On 2020/11/19 20:58:26, Kyle Weaver <[email protected]> wrote:
> What are the advantages of using a Histogram instead of a Gauge?
>
> Also, check out this design doc for adding histogram metrics to Beam if you
> haven't already: http://s.apache.org/beam-metrics-api (Not sure what the
> current status is.)
>
> On Wed, Nov 18, 2020 at 1:37 PM Richard Moorhead <[email protected]>
> wrote:
>
> > Beam's DistributionResult is implemented as a Gauge within the Flink
> > runner. Can someone explain the rationale behind this? Would a PR to
> > utilize a Histogram be acceptable?
