On Wed, Nov 13, 2019 at 10:56 AM Maximilian Michels <m...@apache.org> wrote:
>
> > Are you referring specifically to?
> > * beam:metric:element_count:v1
> > * beam:metric:pardo_execution_time:start_bundle_msecs:v1
> > * beam:metric:pardo_execution_time:process_bundle_msecs:v1
> > * beam:metric:pardo_execution_time:finish_bundle_msecs:v1
> > * beam:metric:ptransform_execution_time:total_msecs:v1
>
> Yes.
>
> > Would the gauge be grouped per element or per bundle?
>
> Per bundle. These are reported when the bundle finishes.
>
> > If grouped at the bundle level the metrics are arbitrary to the user since 
> > the bundle size is chosen by the runner.
>
> Not necessarily, because the bundle size is typically fixed (at least in
> the Flink Runner). In any case, it provides information about how much
> activity occurred in a bundle, which is useful to know.
>
> > There is also a very significant overhead for tracking low level metrics
>
> I can't imagine tracking a per-bundle element count or execution time is
> that expensive. Maybe I'm wrong.
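Luke describes a sampling approach to execution-time metrics. Its goal is to keep the per-element hot path cheap. The processing thread only records which state it is in, and a separate sampler periodically attributes the elapsed interval to that state. The Python sketch below is a minimal illustration of that idea. `StateSampler`, its methods, the state names, and the 10 ms period are hypothetical; this is not Beam's actual implementation. Sampling is driven by hand rather than by a background thread, so the result is deterministic.

```python
from collections import defaultdict


class StateSampler:
    """Hypothetical sketch of sampling-based execution-time tracking."""

    def __init__(self):
        self.current_state = None
        self.msecs_by_state = defaultdict(int)

    def enter(self, state):
        # Hot path: no clock reads, just a reference assignment.
        self.current_state = state

    def sample(self, elapsed_msecs):
        # In a real system a background thread would call this every
        # `elapsed_msecs` milliseconds; here it is invoked manually.
        if self.current_state is not None:
            self.msecs_by_state[self.current_state] += elapsed_msecs


sampler = StateSampler()
sampler.enter("start_bundle")
sampler.sample(10)
sampler.enter("process")
for _ in range(5):
    sampler.sample(10)
sampler.enter("finish_bundle")
sampler.sample(10)
print(dict(sampler.msecs_by_state))
# {'start_bundle': 10, 'process': 50, 'finish_bundle': 10}
```

The trade-off is accuracy. If the thread enters and leaves a state between two samples, that state is never charged. The per-state totals are therefore estimates, and their resolution is the sampling period.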

These are element counts and execution times per operation (e.g. per
DoFn). FWIW, process_bundle_msecs is misnamed; it should be
"process_element" or just "process", as it refers to the time spent in
that method. beam:metric:ptransform_execution_time:total_msecs:v1
seems redundant with the sum of the others (unless it includes
setup/teardown, which seem to be missing as separate values?).

I think what you want is new metrics associated with the bundle +
executable stage as a whole. Distribution metrics would make the most
sense here. (Gauge metrics would just report the value of whatever
bundle finished last...) I don't know how they'd be named; perhaps
they'd be labeled with the full set of transforms that the stage
contains (which is, of course, not stable)?
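The Python sketch below makes the gauge-versus-distribution point concrete. It applies three aggregation semantics to per-bundle element counts, reported as each bundle finishes. The cell classes and the bundle sizes are illustrative assumptions, not Beam's metric cell implementations. The gauge also omits the timestamp that a real gauge carries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SumCell:
    # Counter semantics, as with "beam:metrics:sum_int_64": a running total.
    value: int = 0

    def update(self, v: int) -> None:
        self.value += v


@dataclass
class LatestCell:
    # Gauge semantics, as with "beam:metrics:latest_int_64": only the most
    # recently reported value survives (timestamps omitted for brevity).
    value: Optional[int] = None

    def update(self, v: int) -> None:
        self.value = v


@dataclass
class DistributionCell:
    # Distribution semantics: count, sum, min and max of all reported values.
    count: int = 0
    total: int = 0
    min_value: Optional[int] = None
    max_value: Optional[int] = None

    def update(self, v: int) -> None:
        self.count += 1
        self.total += v
        self.min_value = v if self.min_value is None else min(self.min_value, v)
        self.max_value = v if self.max_value is None else max(self.max_value, v)


# Illustrative per-bundle element counts for one executable stage.
bundle_element_counts = [100, 3, 100, 100]

counter, gauge, dist = SumCell(), LatestCell(), DistributionCell()
for n in bundle_element_counts:
    for cell in (counter, gauge, dist):
        cell.update(n)

print(counter.value)  # 303
print(gauge.value)    # 100
print(dist.count, dist.total, dist.min_value, dist.max_value)  # 4 303 3 100
print(dist.total / dist.count)  # 75.75
```

The gauge reports only the final bundle's 100 and hides the 3-element bundle entirely. The distribution keeps the min, max, and mean across all bundles, which is what you need to judge how bundle activity varies.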

> On 13.11.19 18:58, Luke Cwik wrote:
> > Are you referring specifically to?
> > * beam:metric:element_count:v1
> > * beam:metric:pardo_execution_time:start_bundle_msecs:v1
> > * beam:metric:pardo_execution_time:process_bundle_msecs:v1
> > * beam:metric:pardo_execution_time:finish_bundle_msecs:v1
> > * beam:metric:ptransform_execution_time:total_msecs:v1
> >
> > Would the gauge be grouped per element or per bundle?
> > If grouped at the bundle level the metrics are arbitrary to the user
> > since the bundle size is chosen by the runner.
> > If grouped at the element level then only a few of the metrics make sense:
> > * element_count becomes number of outputs per input element
> > * process_bundle_msecs becomes amount of time to process a single input
> > element (does this still apply to elements that can be split?)
> >
> > There is also a very significant overhead for tracking low level metrics
> > in great detail which is why timing is done through a sampling
> > technique. I'm sure if we could do it cheaply then it would make sense
> > to get those metrics. This is also a place where we want each SDK to
> > implement these metrics, so complexity may slow SDK authors down in
> > developing them.
> >
> >
> > On Wed, Nov 13, 2019 at 5:13 AM Maximilian Michels <m...@apache.org
> > <mailto:m...@apache.org>> wrote:
> >
> >     Hi,
> >
> >     We have a series of builtin PTransform/PCollection metrics:
> >     https://github.com/apache/beam/blob/808cb35018cd228a59b152234b655948da2455fa/model/pipeline/src/main/proto/metrics.proto#L74
> >
> >     Why are those of type counter ("beam:metrics:sum_int_64")? I think
> >     the better default type for most users would be gauge
> >     ("beam:metrics:latest_int_64").
> >
> >     I understand that counters are useful because they retain the sum
> >     of all reported values, but for getting an idea about the deviation
> >     of a metric, gauges could be more useful.
> >
> >     Perhaps we could make this configurable?
> >
> >     Thanks,
> >     Max
> >
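Max suggests making the metric type configurable. One way that could look: the runner keeps folding per-bundle reports, and the type URN chosen for a metric selects the fold. This is a hypothetical Python sketch. `COMBINERS` and `aggregate` are illustrative names, and it uses only the two URNs quoted in this thread.

```python
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical: map each metric type URN (as quoted in this thread) to the
# binary function used to fold a new report into the accumulated value.
COMBINERS: Dict[str, Callable[[int, int], int]] = {
    "beam:metrics:sum_int_64": lambda acc, value: acc + value,
    "beam:metrics:latest_int_64": lambda acc, value: value,
}


def aggregate(reports: List[int], type_urn: str) -> int:
    """Fold per-bundle reports, in arrival order, using the configured type.

    Raises KeyError for an unknown URN and TypeError if `reports` is empty.
    """
    return reduce(COMBINERS[type_urn], reports)


reports = [100, 3, 100]
print(aggregate(reports, "beam:metrics:sum_int_64"))     # 203
print(aggregate(reports, "beam:metrics:latest_int_64"))  # 100
# Same reports, different arrival order: only "latest" changes.
print(aggregate([100, 100, 3], "beam:metrics:latest_int_64"))  # 3
```

One design consequence follows. The sum does not depend on report order, but "latest" does. When reports from parallel workers are merged, a real gauge therefore needs a timestamp (or some other ordering) to decide which value wins.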
