Hi Ismaël,

You've raised some great points. Please see my comments inline.

On Tue, Feb 14, 2017 at 3:37 PM Ismaël Mejía <[email protected]> wrote:

> Hello,
>
> The new metrics API allows us to integrate some basic metrics into the Beam
> IOs. I have been following some discussions about this on JIRAs/PRs, and I
> think it is important to discuss the subject here so we can have more
> awareness and obtain ideas from the community.
>
> First I want to thank Ben for his work on the metrics API, and Aviem for
> his ongoing work on metrics for IOs (e.g. KafkaIO) that made me aware of
> this subject.
>
> There are some basic ideas to discuss e.g.
>
> - What are the responsibilities of Beam IOs in terms of Metrics
> (considering the fact that the actual IOs, server + client, usually provide
> their own)?
>

While it is true that many IOs provide their own metrics, I think that Beam
should expose IO metrics because:

1. Metrics that help in understanding the performance of a pipeline which
   uses an IO may not be covered by the IO itself.
2. Users may not be able to set up integrations with the IO's metrics to view
   them effectively (and correlate them to a specific Beam pipeline), but
   still want to investigate their pipeline's performance.

> - What metrics are relevant to the pipeline (or some particular IOs)? Kafka
> backlog for one could point that a pipeline is behind ingestion rate.

I think it depends on the IO, but there is probably overlap in some of the
metrics, so a guideline might be written for this.
I listed what I thought should be reported for KafkaIO in the following JIRA:
https://issues.apache.org/jira/browse/BEAM-1398
Feel free to add more metrics you think are important to report.

>
> - Should metrics be calculated on IOs by default or no?
> - If metrics are defined by default does it make sense to allow users to
> disable them?
>

IIUC, your concern is that metrics will add overhead to the pipeline, and
that pipelines which are highly sensitive to this will be hampered?
In any case, I think that yes, metrics calculation should be configurable
(enable/disable).
In the Spark runner, for example, the metrics sink feature (not the metrics
calculation itself, but the sinks to send them to) is configurable in the
pipeline options.

> Well these are just some questions around the subject so we can create a
> common set of practices to include metrics in the IOs and eventually
> improve the transform guide with this. What do you think about this? Do you
> have other questions/ideas?
>
> Thanks,
> Ismaël
>
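
To make the enable/disable idea above a bit more concrete, here is a minimal sketch in plain Python. It does not use the Beam metrics API: `CounterRegistry`, `read_records`, and the metric names `elements_read` and `bytes_read` are all hypothetical, made up only for illustration. The assumption is simply that a single flag turns every metric update into a cheap no-op.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class CounterRegistry:
    """Hypothetical stand-in for a metrics backend (not a Beam class)."""

    enabled: bool = True
    counts: Dict[str, int] = field(default_factory=dict)

    def inc(self, name: str, amount: int = 1) -> None:
        # With metrics disabled, an update costs one branch and nothing else.
        if not self.enabled:
            return
        self.counts[name] = self.counts.get(name, 0) + amount


def read_records(records: Iterable[bytes], metrics: CounterRegistry) -> List[bytes]:
    """Toy 'source' that reports element and byte counts as it reads."""
    out = []
    for record in records:
        metrics.inc("elements_read")
        metrics.inc("bytes_read", len(record))
        out.append(record)
    return out


# Same read path, once with metrics enabled and once with them disabled.
with_metrics = CounterRegistry(enabled=True)
read_records([b"a", b"bcd"], with_metrics)

without_metrics = CounterRegistry(enabled=False)
read_records([b"a", b"bcd"], without_metrics)
```

In this sketch the IO code is identical in both cases and only the flag differs, which is roughly what a pipeline option for enabling or disabling IO metrics would look like from the IO author's side.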
