FYI, the linked design has already been implemented. Beam already implements the histogram as a Distribution.
Histograms represent a set of frequencies, whereas a Gauge usually represents a single value. The Gauge API, however, allows a single instance of any type to be reported, and Beam's Flink runner uses this to encapsulate the DistributionResult. I assume there was a reason for this implementation as opposed to a histogram.

It does appear that the Flink runner implementation precomputes the statistics usually associated with a histogram (e.g. mean, min, max, sum), whereas most metrics APIs (e.g. Dropwizard) preserve the underlying values that form the histogram. Flink's metrics API does the same: the histogram has a 'reservoir' and the statistics are computed at reporting time.

I'd first like to understand why Beam's Flink runner implementation went this route and then, if possible, explore the straightforward alternative. Our use case is such that we utilize Flink's Dropwizard reporter, and the type mismatch introduced by the Beam Flink runner's Distribution being wrapped in a Gauge causes problems for us internally.

On 2020/11/19 20:58:26, Kyle Weaver <[email protected]> wrote:
> What are the advantages of using a Histogram instead of a Gauge?
>
> Also, check out this design doc for adding histogram metrics to Beam if you
> haven't already: http://s.apache.org/beam-metrics-api (Not sure what the
> current status is.)
>
> On Wed, Nov 18, 2020 at 1:37 PM Richard Moorhead <[email protected]>
> wrote:
> > Beam's DistributionResult is implemented as a Gauge within the Flink
> > runner. Can someone explain the rationale behind this? Would a PR to
> > utilize a Histogram be acceptable?
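
To make the distinction discussed in this thread concrete, here is a minimal sketch in plain Java with no Beam, Flink, or Dropwizard dependencies. The class names SummaryStats and ReservoirHistogram are hypothetical and only illustrate the two shapes; they are not the actual Beam or Flink implementations, and a real reservoir would normally bound or sample the values it keeps rather than store all of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a precomputed summary: only aggregates are kept,
// so the individual values cannot be recovered at reporting time.
final class SummaryStats {
    private long count;
    private long sum;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    void update(long value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    long count() { return count; }
    long sum() { return sum; }
    long min() { return min; }
    long max() { return max; }
    double mean() { return count == 0 ? 0.0 : (double) sum / count; }
}

// Hypothetical stand-in for a reservoir-backed histogram: values are kept and
// statistics are computed only when a report is requested.
// Assumption: the reservoir is unbounded here purely to keep the sketch short.
final class ReservoirHistogram {
    private final List<Long> values = new ArrayList<>();

    void update(long value) {
        values.add(value);
    }

    // Nearest-rank quantile over the stored values, for q in [0, 1].
    long quantile(double q) {
        if (values.isEmpty()) {
            throw new IllegalStateException("no values recorded");
        }
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

With the same updates fed to both, the summary can report min, max, sum, count and mean, but only the reservoir can answer a question such as "what is the median?", which is the kind of statistic a histogram-aware reporter typically expects to read.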
