Thanks for starting this conversation, Ismaël! I too have been thinking
that we'll need some general approach to metrics for IO in the near future.

Two general thoughts:

1. Before making the metrics configurable, I think it would be worthwhile
to see if we can find the right set of metrics that provide useful
information about IO without affecting performance and have these always
on. Monitoring information like this is most useful when a pipeline is
behaving unexpectedly, and it is hard to predict when that will happen and
turn the metrics on ahead of time.

2. I think focusing on metrics about source splitting and such is the wrong
level from a user perspective. A user shouldn't need to understand how
sources split and what that means. Instead, we should report higher-level
metrics such as how many bytes of input have been processed, how many bytes
remain (if that is known), etc.
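
To make point 2 a bit more concrete, here is a rough plain-JDK sketch of
the kind of user-facing numbers I mean. `IoProgress` and its methods are
hypothetical names made up for illustration; they are not part of Beam,
and a real IO would presumably report these values through the metrics
API rather than a hand-rolled class. The only assumption is that a
source can say how many bytes it has read and, sometimes, how many bytes
there are in total.

```java
import java.util.OptionalLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper (not a Beam class): user-facing read progress. */
final class IoProgress {
    // LongAdder keeps concurrent increments cheap, which matters if the
    // counter is meant to be always on.
    private final LongAdder bytesProcessed = new LongAdder();

    // Total input size in bytes; a negative value means "unknown".
    private final long totalBytes;

    IoProgress(long totalBytes) {
        this.totalBytes = totalBytes;
    }

    /** For sources that cannot estimate their size (e.g. unbounded ones). */
    static IoProgress ofUnknownSize() {
        return new IoProgress(-1);
    }

    /** Called by the reader after it consumes n bytes of input. */
    void recordBytes(long n) {
        bytesProcessed.add(n);
    }

    long bytesProcessed() {
        return bytesProcessed.sum();
    }

    /** Empty when the total is unknown; never negative otherwise. */
    OptionalLong bytesRemaining() {
        if (totalBytes < 0) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Math.max(0, totalBytes - bytesProcessed.sum()));
    }
}
```

Nothing here mentions splits or bundles: a user only sees "bytes
processed" and, when the source knows its size, "bytes remaining".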

Ideally, metrics about splitting can be reported by the runner in a general
manner. If they're useful when developing a source, maybe that is what the
configuration should control (a setting indicating that you're developing a
source and want these more detailed metrics).

Maybe it would help to pick one or two IOs that you're looking at and talk
about proposed metrics? That might focus the discussion on what metrics
make sense to users and how expensive they might be to report.

On Tue, Feb 14, 2017 at 8:29 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Aviem
>
> Agree with your comments, it's pretty close to my previous ones.
>
> Regards
> JB
>
> On Feb 14, 2017, 12:04, at 12:04, Aviem Zur <aviem...@gmail.com> wrote:
> >Hi Ismaël,
> >
> >You've raised some great points.
> >Please see my comments inline.
> >
> >On Tue, Feb 14, 2017 at 3:37 PM Ismaël Mejía <ieme...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> The new metrics API allows us to integrate some basic metrics into
> >the Beam
> >> IOs. I have been following some discussions about this on JIRAs/PRs,
> >and I
> >> think it is important to discuss the subject here so we can have more
> >> awareness and obtain ideas from the community.
> >>
> >> First I want to thank Ben for his work on the metrics API, and Aviem
> >> for his ongoing work on metrics for IOs (e.g. KafkaIO) that made me
> >> aware of this subject.
> >>
> >> There are some basic ideas to discuss e.g.
> >>
> >> - What are the responsibilities of Beam IOs in terms of Metrics
> >> (considering the fact that the actual IOs, server + client, usually
> >provide
> >> their own)?
> >>
> >
> >While it is true that many IOs provide their own metrics, I think that
> >Beam
> >should expose IO metrics because:
> >
> >1. Metrics which help in understanding the performance of a pipeline
> >   which uses an IO may not be covered by the IO.
> >2. Users may not be able to set up integrations with the IO's metrics to
> >   view them effectively (and correlate them to a specific Beam
> >   pipeline), but still want to investigate their pipeline's performance.
> >
> >
> >> - What metrics are relevant to the pipeline (or some particular IOs)?
> >> Kafka backlog, for one, could indicate that a pipeline is falling
> >> behind the ingestion rate.
> >
> >
> >I think it depends on the IO, but there is probably overlap in some of
> >the
> >metrics so a guideline might be written for this.
> >I listed what I thought should be reported for KafkaIO in the following
> >JIRA: https://issues.apache.org/jira/browse/BEAM-1398
> >Feel free to add more metrics you think are important to report.
> >
> >
> >>
> >>
> >> - Should metrics be calculated on IOs by default or not?
> >> - If metrics are defined by default does it make sense to allow users
> >to
> >> disable them?
> >>
> >
> >IIUC, your concern is that metrics will add overhead to the pipeline,
> >and
> >pipelines which are highly sensitive to this will be hampered?
> >In any case I think that yes, metrics calculation should be
> >configurable
> >(Enable/disable).
> >In the Spark runner, for example, the metrics sink feature (not the
> >metrics calculation itself, but the sinks to send them to) is
> >configurable in the pipeline options.
> >
> >
> >> Well these are just some questions around the subject so we can
> >create a
> >> common set of practices to include metrics in the IOs and eventually
> >> improve the transform guide with this. What do you think about this?
> >Do you
> >> have other questions/ideas?
> >>
> >> Thanks,
> >> Ismaël
> >>
>
