Hi Aviem,

Agree with your comments, they are pretty close to my previous ones.
Regards
JB

On Feb 14, 2017, at 12:04, Aviem Zur <[email protected]> wrote:
>Hi Ismaël,
>
>You've raised some great points.
>Please see my comments inline.
>
>On Tue, Feb 14, 2017 at 3:37 PM Ismaël Mejía <[email protected]> wrote:
>
>> Hello,
>>
>> The new metrics API allows us to integrate some basic metrics into the
>> Beam IOs. I have been following some discussions about this on JIRAs/PRs,
>> and I think it is important to discuss the subject here so we can have
>> more awareness and obtain ideas from the community.
>>
>> First I want to thank Ben for his work on the metrics API, and Aviem for
>> his ongoing work on metrics for IOs (e.g. KafkaIO) that made me aware of
>> this subject.
>>
>> There are some basic ideas to discuss, e.g.
>>
>> - What are the responsibilities of Beam IOs in terms of Metrics
>> (considering the fact that the actual IOs, server + client, usually
>> provide their own)?
>>
>
>While it is true that many IOs provide their own metrics, I think that
>Beam should expose IO metrics because:
>
>1. Metrics which help in understanding the performance of a pipeline
>   which uses an IO may not be covered by the IO.
>2. Users may not be able to set up integrations with the IO's metrics to
>   view them effectively (and correlate them to a specific Beam pipeline),
>   but still want to investigate their pipeline's performance.
>
>
>> - What metrics are relevant to the pipeline (or some particular IOs)?
>> Kafka backlog, for one, could indicate that a pipeline is behind the
>> ingestion rate.
>
>
>I think it depends on the IO, but there is probably overlap in some of
>the metrics, so a guideline might be written for this.
>I listed what I thought should be reported for KafkaIO in the following
>JIRA: https://issues.apache.org/jira/browse/BEAM-1398
>Feel free to add more metrics you think are important to report.
>
>
>>
>> - Should metrics be calculated on IOs by default or not?
>> - If metrics are defined by default, does it make sense to allow users
>> to disable them?
>>
>
>IIUC, your concern is that metrics will add overhead to the pipeline, and
>pipelines which are highly sensitive to this will be hampered?
>In any case I think that yes, metrics calculation should be configurable
>(enable/disable).
>In the Spark runner, for example, the Metrics sink feature (not the
>metrics calculation itself, but the sinks to send them to) is configurable
>in the pipeline options.
>
>
>> Well, these are just some questions around the subject so we can create
>> a common set of practices to include metrics in the IOs and eventually
>> improve the transform guide with this. What do you think about this?
>> Do you have other questions/ideas?
>>
>> Thanks,
>> Ismaël
>>
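
The enable/disable idea discussed above could be sketched roughly as follows. This is only an illustration in plain Python, not Beam code: `IOMetrics`, `read_records`, the `kafka_io` namespace and the metric names are all hypothetical and are not part of the Beam metrics API. It just shows that a disabled metrics holder can reduce the per-record cost to a single flag check.

```python
from collections import Counter


class IOMetrics:
    """Hypothetical per-IO metrics holder (an assumption, not a Beam class)."""

    def __init__(self, namespace, enabled=True):
        self.namespace = namespace
        self.enabled = enabled
        self._counters = Counter()

    def inc(self, name, amount=1):
        # When metrics are disabled this is a no-op, so the read loop
        # pays only for one attribute check per call.
        if self.enabled:
            self._counters[f"{self.namespace}.{name}"] += amount

    def snapshot(self):
        # Return a plain dict copy of the current counter values.
        return dict(self._counters)


def read_records(records, metrics):
    """Stand-in for an IO read loop that reports element and byte counts."""
    for record in records:
        metrics.inc("elements_read")
        metrics.inc("bytes_read", len(record))
        yield record


# Same input read twice: once with metrics enabled, once disabled.
on = IOMetrics("kafka_io", enabled=True)
off = IOMetrics("kafka_io", enabled=False)
data = [b"ab", b"cde"]
list(read_records(data, on))
list(read_records(data, off))
print(on.snapshot())
print(off.snapshot())
```

In a real pipeline the flag would presumably come from the pipeline options, as with the Spark runner's metrics sink mentioned above.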
