+1 to making the IO metrics (e.g. producers, consumers) available as part
of the Beam pipeline metrics tree for debugging and visibility.

As has already been mentioned, many IO clients already have a metrics
mechanism in place, so in those cases I think it would be beneficial to
mirror their metrics under the relevant subtree of the Beam metrics tree.
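
To make "mirroring under a subtree" a bit more concrete, here is a rough
sketch in plain Java. It deliberately does not use the Beam Metrics API:
MirroredMetrics, mirror() and subtree() are hypothetical names, the
slash-separated path layout is only an assumption about how such a tree
could be keyed, and the metric values are made up.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not Beam API: a flat stand-in for a metrics tree,
 * keyed by slash-separated paths such as "step/io-client/metric-name".
 */
class MirroredMetrics {
    private final Map<String, Double> tree = new TreeMap<>();

    /** Copies each client metric to "<step>/<client>/<metric name>"; later calls overwrite. */
    void mirror(String step, String client, Map<String, Double> clientMetrics) {
        for (Map.Entry<String, Double> e : clientMetrics.entrySet()) {
            tree.put(step + "/" + client + "/" + e.getKey(), e.getValue());
        }
    }

    /** Returns the metrics whose path starts with the given prefix, sorted by path. */
    Map<String, Double> subtree(String prefix) {
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, Double> e : tree.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up snapshot of what an IO client might report about itself.
        Map<String, Double> consumerMetrics = new TreeMap<>();
        consumerMetrics.put("records-lag-max", 1200.0);
        consumerMetrics.put("records-consumed-rate", 350.5);

        MirroredMetrics metrics = new MirroredMetrics();
        metrics.mirror("ReadFromKafka", "kafka-consumer", consumerMetrics);

        // Print everything mirrored under the step's subtree.
        metrics.subtree("ReadFromKafka/")
               .forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

The point is only that the client's own metric names are kept as-is and
namespaced under the step that owns the client, so nothing is redefined
on the Beam side.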

On Wed, Feb 15, 2017 at 12:04 AM Amit Sela <[email protected]> wrote:

> I think this is a great discussion and I'd like to respond to some of the
> points raised here, and raise some of my own.
>
> First of all, I think we should be careful here not to cross boundaries. IOs
> naturally have many metrics, and Beam should avoid "taking over" those. IO
> metrics should focus on what's relevant to the Pipeline: input/output rate
> and backlog (for UnboundedSources, where it is reported in bytes, though for
> monitoring purposes we might want to consider the number of messages).
>
> I don't agree that we should skip doing this in Sources/Sinks and go
> directly to SplittableDoFn: the IO API is familiar and well known, and as
> long as we keep it, it should be treated as a first-class citizen.
>
> As for enable/disable - if IOs focus on pipeline-related metrics I think
> we should be fine, though this could also vary between runners.
>
> Finally, considering "split-metrics" is interesting: on one hand it
> affects the pipeline directly (unbalanced partitions in Kafka may cause
> backlog), but on the other it sits on that fine line of responsibilities
> (Kafka monitoring would probably be able to tell you that partitions are
> not balanced).
>
> My 2 cents, cheers!
>
> On Tue, Feb 14, 2017 at 8:46 PM Raghu Angadi <[email protected]> wrote:
>
> > On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers <[email protected]>
> > wrote:
> >
> > >
> > > > * I also think there are data source specific metrics that a given IO
> > > > will want to expose (ie, things like kafka backlog for a topic.)
> >
> >
> > UnboundedSource has an API for backlog. It is better for Beam/runners to
> > handle backlog as well.
> > Of course there will be some source-specific metrics too (errors, I/O ops,
> > etc.).
> >
>
