I think this is a great discussion. I'd like to respond to some of the points raised here and raise a few of my own.
First of all, I think we should be careful not to cross boundaries here. IOs naturally have many metrics, and Beam should avoid "taking over" those. IO metrics should focus on what's relevant to the pipeline: input/output rate and backlog (for UnboundedSources, backlog is already reported in bytes, but for monitoring purposes we might want to consider the number of messages).

I don't agree that we should skip investing in Sources/Sinks and go directly to SplittableDoFn. The IO API is familiar and well known, and as long as we keep it, it should be treated as a first-class citizen.

As for enabling/disabling metrics: if IOs focus on pipeline-related metrics, I think we should be fine, though this could also vary between runners.

Finally, "split metrics" are interesting because, on the one hand, they affect the pipeline directly (unbalanced Kafka partitions may cause backlog), but this is exactly the fine line of responsibilities (Kafka monitoring would probably be able to tell you that partitions are not balanced).

My 2 cents, cheers!

On Tue, Feb 14, 2017 at 8:46 PM Raghu Angadi <[email protected]> wrote:

> On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers <[email protected]> wrote:
>
> > * I also think there are data source specific metrics that a given IO will
> > want to expose (ie, things like kafka backlog for a topic.)
>
> UnboundedSource has API for backlog. It is better for beam/runners to
> handle backlog as well.
> Of course there will be some source specific metrics too (errors, i/o ops
> etc).
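To make the backlog and split-metrics points a bit more concrete, here is a minimal sketch of how pipeline-level signals could be derived from the byte counts an unbounded reader already tracks (Beam's `UnboundedReader` reports backlog in bytes, e.g. via `getSplitBacklogBytes()`). Everything in it is hypothetical: `BacklogSketch`, `estimateBacklogMessages`, `imbalanceRatio` and the `UNKNOWN` sentinel are illustrative names, not Beam APIs. The message estimate assumes record sizes are roughly stable over time.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical helper, NOT part of Beam: derives pipeline-level backlog
// signals from byte counts that a source reader already tracks.
public class BacklogSketch {

    // Sentinel meaning "backlog cannot be estimated" (a choice made for this sketch).
    public static final long UNKNOWN = -1L;

    // Converts a byte backlog into an approximate message count using the
    // average size of the records read so far (assumes stable record sizes).
    public static long estimateBacklogMessages(long backlogBytes, long bytesRead, long recordsRead) {
        if (backlogBytes < 0 || bytesRead <= 0 || recordsRead <= 0) {
            return UNKNOWN;
        }
        double avgRecordBytes = (double) bytesRead / recordsRead;
        return Math.round(backlogBytes / avgRecordBytes);
    }

    // Ratio of the largest per-split backlog to the mean per-split backlog.
    // 1.0 means perfectly balanced; large values point at a hot split
    // (e.g. an unbalanced Kafka partition). Returns NaN when there is no backlog.
    public static double imbalanceRatio(Collection<Long> splitBacklogBytes) {
        long max = 0;
        long total = 0;
        for (long b : splitBacklogBytes) {
            max = Math.max(max, b);
            total += b;
        }
        if (splitBacklogBytes.isEmpty() || total == 0) {
            return Double.NaN;
        }
        double mean = (double) total / splitBacklogBytes.size();
        return max / mean;
    }

    public static void main(String[] args) {
        // 5 records totalling 500 bytes read so far => ~100 bytes per record.
        System.out.println(estimateBacklogMessages(1_000, 500, 5)); // prints 10
        // Three splits, one holding most of the backlog.
        System.out.println(imbalanceRatio(List.of(100L, 100L, 400L))); // prints 2.0
    }
}
```

The imbalance ratio is where the fine line shows up: a runner could surface it as a generic pipeline signal, while explaining *why* a partition is hot is still a job for Kafka's own monitoring.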
