JB, I think you raise a good point. As a user, if I'm sold on pipeline
portability, I would expect the deal to include metrics as well.
Otherwise, being able to port a pipeline to a different execution engine at
the cost of "flying blind" might not be as tempting.

If we wish to give users a consistent set of metrics (which is totally +1
IMHO), even before we decide on what such a metrics set should include,
there seem to be at least 2 ways to achieve this:

   1. Hard spec, via API: have the SDK define metric specs for runners/IOs
   to implement
   2. Soft spec, via documentation: specify the expected metrics in the docs;
   some of the metrics could be "must have" while others "nice to have", and
   let runners implement (interpret?) what they can, and how they can.

Naturally, each of these has its pros and cons.

Hard spec is great for consistency, but may make things harder on the
implementors due to the effort required to align the internals of each
execution engine with the SDK (e.g., attempted vs. committed metrics, as
discussed in a separate thread here on the dev list).
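
To make option 1 a bit more concrete, here is a rough, hypothetical sketch
(made-up names, not an actual Beam API) of what an SDK-defined metric spec for
runners/IOs could look like: the SDK pins down the names, types and semantics,
and every runner/IO is expected to report exactly those:

    // Hypothetical sketch only, not an actual Beam API.
    public interface IOMetricsSpec {

      /** Elements read from the source ("attempted" semantics). */
      Counter elementsRead();

      /** Elements successfully written to the sink ("committed" semantics). */
      Counter elementsWritten();

      /** Current source backlog, in elements. */
      Gauge backlogElements();

      /** Minimal contracts a runner must back with its own metrics system. */
      interface Counter { void inc(long n); }
      interface Gauge { void set(long value); }
    }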

Soft spec is enforced by documentation rather than by code and leaves
consistency up to discipline (meh...), but may make it easier on implementors
to get things done, as they have more freedom.
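
For illustration of option 2 (hypothetical names again), a runner could back a
documented "must have" metric with whatever mechanism it already has at hand,
e.g. a plain Dropwizard-style counter registered under the documented name:

    // Hypothetical sketch only: a runner honouring a documented "must have"
    // metric name using its existing metrics machinery.
    import com.codahale.metrics.Counter;
    import com.codahale.metrics.MetricRegistry;

    public class SoftSpecMetrics {
      private static final MetricRegistry REGISTRY = new MetricRegistry();

      // The documented name is the only contract; the backing is up to the runner.
      private static final Counter ELEMENTS_READ =
          REGISTRY.counter("beam.io.elements_read");

      public static void recordRead(long n) {
        ELEMENTS_READ.inc(n);
      }
    }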

-Stas

On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Amit,
>
> my point is: how do we provide metrics today to end users and how can they
> use them to monitor a running pipeline?
>
> Clearly the runner is involved, but, it should behave the same way for
> all runners. Let me take an example.
> On my ecosystem, I'm using both Flink and Spark with Beam, some
> pipelines on each. I would like to get the metrics for all pipelines to
> my monitoring backend. If I can "poll" from the execution engine metric
> backend to my system, that's acceptable, but it's extra overhead.
> Having a generic metric reporting layer would allow us to have a more
> common way. If the user doesn't provide any reporting sink, then we use
> the execution backend metric layer. If provided, we use the reporting sink.
>
> About your question: you are right, it's possible to update a collector
> or appender without impacting anything else.
>
> Regards
> JB
>
> On 02/17/2017 10:38 PM, Amit Sela wrote:
> > @JB I think what you're suggesting is that Beam should provide a "Metrics
> > Reporting" API as well, and I used to think like you, but the more I
> > thought of that the more I tend to disagree now.
> >
> > The SDK is for users to author pipelines, so Metrics are for user-defined
> > metrics (in contrast to runner metrics).
> >
> > The Runner API is supposed to help different backends to integrate with
> > Beam to allow users to execute those pipelines on their favourite
> backend. I
> > believe the Runner API has to provide restrictions/demands that are just
> > enough so a runner could execute a Beam pipeline as best it can, and I'm
> > afraid that this would demand runner authors to do work that is
> unnecessary.
> > This is also sort of "crossing the line" into the runner's domain and
> > "telling it how to do" instead of what, and I don't think we want that.
> >
> > I do believe however that runner's should integrate the Metrics into
> their
> > own metrics reporting system - but that's for the runner author to
> decide.
> > Stas did this for the Spark runner because Spark doesn't report back
> > user-defined Accumulators (Spark's Aggregators) to its Metrics system.
> >
> > On a curious note though, did you use an OSGi service per event-type ? so
> > you can upgrade specific event-handlers without taking down the entire
> > reporter ? but that's really unrelated to this thread :-) .
> >
> >
> >
> > On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers
> <bchamb...@google.com.invalid>
> > wrote:
> >
> >> I don't think it is possible for there to be a general mechanism for
> >> pushing metrics out during the execution of a pipeline. The Metrics API
> >> suggests that metrics should be reported as values across all attempts
> and
> >> values across only successful attempts. The latter requires runner
> >> involvement to ensure that a given metric value is atomically
> incremented
> >> (or checkpointed) when the bundle it was reported in is committed.
> >>
> >> Aviem has already implemented Metrics support for the Spark runner. I am
> >> working on support for the Dataflow runner.
> >>
> >> On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> >> wrote:
> >>
> >> Hi guys,
> >>
> >> As I'm back from vacation, I'm back on this topic ;)
> >>
> >> It's a great discussion, and I think about the Metric IO coverage, it's
> >> good.
> >>
> >> However, there's a point that we discussed very fast in the thread and I
> >> think it's an important one (maybe more important than the provided
> >> metrics actually in term of roadmap ;)).
> >>
> >> Assuming we have pipelines, PTransforms, IOs, ... using the Metric API,
> >> how do we expose the metrics for the end-users ?
> >>
> >> A first approach would be to bind a JMX MBean server by the pipeline and
> >> expose the metrics via MBeans. I don't think it's a good idea for the
> >> following reasons:
> >> 1. It's not easy to know where the pipeline is actually executed, and
> >> so, not easy to find the MBean server URI.
> >> 2. For the same reason, we can have port binding error.
> >> 3. Even if it could work for unbounded/streaming pipelines (as they are
> >> always "running"), it's not really applicable for bounded/batch
> >> pipelines as their lifetime is "limited" ;)
> >>
> >> So, I think the "push" approach is better: during the execution, a
> >> pipeline "internally" collects and pushes the metric to a backend.
> >> The "push" could a kind of sink. For instance, the metric "records" can
> >> be sent to a Kafka topic, or directly to Elasticsearch or whatever.
> >> The metric backend can deal with alerting, reporting, etc.
> >>
> >> Basically, we have to define two things:
> >> 1. The "appender" where the metrics have to be sent (and the
> >> corresponding configuration to connect, like Kafka or Elasticsearch
> >> location)
> >> 2. The format of the metric data (for instance, json format).
> >>
> >> In Apache Karaf, I created something similar named Decanter:
> >>
> >>
> >>
> http://blog.nanthrax.net/2015/07/monitoring-and-alerting-with-apache-karaf-decanter/
> >>
> >> http://karaf.apache.org/manual/decanter/latest-1/
> >>
> >> Decanter provides collectors that harvest the metrics (like JMX MBean
> >> attributes, log messages, ...). Basically, for Beam, the collector would
> >> directly be the Metric API used by the pipeline parts.
> >> Then, the metric records are sent to a dispatcher which sends the metric
> >> records to an appender. The appenders store or send the metric records
> >> to a backend (Elasticsearch, Cassandra, Kafka, JMS, Redis, ...).
> >>
> >> I think it would make sense to provide the configuration and Metric
> >> "appender" via the pipeline options.
> >> As it's not really runner specific, it could be part of the metric API
> >> (or SPI in that case).
> >>
> >> WDYT ?
> >>
> >> Regards
> >> JB
> >>
> >> On 02/15/2017 09:22 AM, Stas Levin wrote:
> >>> +1 to making the IO metrics (e.g. producers, consumers) available as
> part
> >>> of the Beam pipeline metrics tree for debugging and visibility.
> >>>
> >>> As it has already been mentioned, many IO clients have a metrics
> >> mechanism
> >>> in place, so in these cases I think it could be beneficial to mirror
> >> their
> >>> metrics under the relevant subtree of the Beam metrics tree.
> >>>
> >>> On Wed, Feb 15, 2017 at 12:04 AM Amit Sela <amitsel...@gmail.com>
> wrote:
> >>>
> >>>> I think this is a great discussion and I'd like to relate to some of
> the
> >>>> points raised here, and raise some of my own.
> >>>>
> >>>> First of all I think we should be careful here not to cross
> boundaries.
> >> IOs
> >>>> naturally have many metrics, and Beam should avoid "taking over"
> those.
> >> IO
> >>>> metrics should focus on what's relevant to the Pipeline: input/output
> >> rate,
> >>>> backlog (for UnboundedSources, which exists in bytes but for
> monitoring
> >>>> purposes we might want to consider #messages).
> >>>>
> >>>> I don't agree that we should not invest in doing this in Sources/Sinks
> >> and
> >>>> going directly to SplittableDoFn because the IO API is familiar and
> >> known,
> >>>> and as long as we keep it, it should be treated as a first class citizen.
> >>>>
> >>>> As for enable/disable - if IOs consider focusing on pipeline-related
> >>>> metrics I think we should be fine, though this could also change
> between
> >>>> runners as well.
> >>>>
> >>>> Finally, considering "split-metrics" is interesting because on one
> hand
> >> it
> >>>> affects the pipeline directly (unbalanced partitions in Kafka that may
> >>>> cause backlog) but this is that fine-line of responsibilities (Kafka
> >>>> monitoring would probably be able to tell you that partitions are not
> >>>> balanced).
> >>>>
> >>>> My 2 cents, cheers!
> >>>>
> >>>> On Tue, Feb 14, 2017 at 8:46 PM Raghu Angadi
> <rang...@google.com.invalid
> >>>
> >>>> wrote:
> >>>>
> >>>>> On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers
> >>>> <bchamb...@google.com.invalid
> >>>>>>
> >>>>> wrote:
> >>>>>
> >>>>>>
> >>>>>>> * I also think there are data source specific metrics that a given
> IO
> >>>>>> will
> >>>>>>> want to expose (ie, things like kafka backlog for a topic.)
> >>>>>
> >>>>>
> >>>>> UnboundedSource has API for backlog. It is better for beam/runners to
> >>>>> handle backlog as well.
> >>>>> Of course there will be some source specific metrics too (errors, i/o
> >> ops
> >>>>> etc).
> >>>>>
> >>>>
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
