First issue with "push" metrics plugin - what if the runner's underlying reporting mechanism is "pull" ? Codahale ScheduledReporter will sample the values every X and send to ... So any runner using a "pull-like" would use an adapter ?
On Sat, Feb 18, 2017 at 6:27 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Ben,
>
> ok, it's what I thought. Thanks for the clarification.
>
> +1 for the plugin-like "push" API (it's what I have in mind too ;)).
> I will start a PoC for discussion next week.
>
> Regards
> JB
>
> On 02/18/2017 05:17 PM, Ben Chambers wrote:
> > The runner can already report metrics during pipeline execution, so it
> > is usable for monitoring.
> >
> > The pipeline result can be used to query metrics during pipeline
> > execution, so a first version of reporting to other systems is to
> > periodically pull metrics from the runner with that API.
> >
> > We may eventually want to provide a plugin-like API to get the runner
> > to push metrics more directly to other metrics stores. This layer needs
> > some thought since it has to handle the complexity of
> > attempted/committed metrics to be consistent with the model.
> >
> > On Sat, Feb 18, 2017, 5:44 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> > Hi Amit,
> >
> > before Beam, I didn't worry about portability ;) So I used the Spark
> > approach.
> >
> > But now, as a Beam user, I would expect a generic way to deal with
> > metrics whatever the runner may be.
> >
> > Today, you are right: I'm using the solution provided by the execution
> > engine. That's the current approach and it works fine. And it's up to
> > me to leverage it (for instance, Accumulators) with my own system.
> >
> > My thought is more to provide a generic way. It's only a discussion
> > for now ;)
> >
> > Regards
> > JB
> >
> > On 02/18/2017 02:38 PM, Amit Sela wrote:
> >> On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré
> >> <j...@nanthrax.net> wrote:
> >>
> >>> Hi Amit,
> >>>
> >>> my point is: how do we provide metrics today to end users, and how
> >>> can they use them to monitor a running pipeline?
> >>>
> >>> Clearly the runner is involved, but it should behave the same way
> >>> for all runners. Let me take an example.
> >>> In my ecosystem, I'm using both Flink and Spark with Beam, some
> >>> pipelines on each. I would like to get the metrics for all pipelines
> >>> into my monitoring backend. If I can "poll" from the execution
> >>> engine's metrics backend into my system that's acceptable, but it
> >>> adds overhead.
> >>> Having a generic metrics reporting layer would give us a more common
> >>> way. If the user doesn't provide any reporting sink, then we use the
> >>> execution backend's metrics layer. If provided, we use the reporting
> >>> sink.
> >>>
> >> How did you do it before Beam? I think that for Spark you reported
> >> its native metrics via a Codahale Reporter and Accumulators were
> >> visible in the UI, and the Spark runner took it a step further to
> >> make it all visible via Codahale. Assuming Flink does something
> >> similar, it all belongs to runner setup/configuration.
> >>
> >>> About your question: you are right, it's possible to update a
> >>> collector or appender without impacting anything else.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 02/17/2017 10:38 PM, Amit Sela wrote:
> >>>> @JB I think what you're suggesting is that Beam should provide a
> >>>> "Metrics Reporting" API as well, and I used to think like you, but
> >>>> the more I thought about it the more I tend to disagree now.
> >>>>
> >>>> The SDK is for users to author pipelines, so Metrics are for
> >>>> user-defined metrics (in contrast to runner metrics).
> >>>>
> >>>> The Runner API is supposed to help different backends integrate
> >>>> with Beam to allow users to execute those pipelines on their
> >>>> favourite backend. I believe the Runner API has to provide
> >>>> restrictions/demands that are just enough so a runner can execute a
> >>>> Beam pipeline as best it can, and I'm afraid that this would demand
> >>>> runner authors to do work that is unnecessary.
> >>>> This is also sort of "crossing the line" into the runner's domain
> >>>> and "telling it how to do it" instead of what, and I don't think we
> >>>> want that.
> >>>>
> >>>> I do believe however that runners should integrate the Metrics into
> >>>> their own metrics reporting system - but that's for the runner
> >>>> author to decide. Stas did this for the Spark runner because Spark
> >>>> doesn't report back user-defined Accumulators (Spark's Aggregators)
> >>>> to its Metrics system.
> >>>>
> >>>> On a curious note though, did you use an OSGi service per
> >>>> event-type, so you can upgrade specific event-handlers without
> >>>> taking down the entire reporter? But that's really unrelated to
> >>>> this thread :-) .
> >>>>
> >>>> On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers
> >>>> <bchamb...@google.com.invalid> wrote:
> >>>>
> >>>>> I don't think it is possible for there to be a general mechanism
> >>>>> for pushing metrics out during the execution of a pipeline. The
> >>>>> Metrics API suggests that metrics should be reported as values
> >>>>> across all attempts and values across only successful attempts.
> >>>>> The latter requires runner involvement to ensure that a given
> >>>>> metric value is atomically incremented (or checkpointed) when the
> >>>>> bundle it was reported in is committed.
> >>>>>
> >>>>> Aviem has already implemented Metrics support for the Spark
> >>>>> runner. I am working on support for the Dataflow runner.
> >>>>>
> >>>>> On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré
> >>>>> <j...@nanthrax.net> wrote:
> >>>>>
> >>>>> Hi guys,
> >>>>>
> >>>>> As I'm back from vacation, I'm back on this topic ;)
> >>>>>
> >>>>> It's a great discussion, and regarding the IO metrics coverage, I
> >>>>> think it's good.
> >>>>>
> >>>>> However, there's a point that we discussed very quickly in the
> >>>>> thread and I think it's an important one (maybe more important, in
> >>>>> terms of roadmap, than the metrics actually provided ;)).
> >>>>>
> >>>>> Assuming we have pipelines, PTransforms, IOs, ... using the
> >>>>> Metrics API, how do we expose the metrics to the end users?
> >>>>>
> >>>>> A first approach would be to bind a JMX MBean server to the
> >>>>> pipeline and expose the metrics via MBeans. I don't think it's a
> >>>>> good idea for the following reasons:
> >>>>> 1. It's not easy to know where the pipeline is actually executed,
> >>>>> and so, not easy to find the MBean server URI.
> >>>>> 2. For the same reason, we can have port binding errors.
> >>>>> 3. While it could work for unbounded/streaming pipelines (as they
> >>>>> are always "running"), it's not really applicable to bounded/batch
> >>>>> pipelines as their lifetime is "limited" ;)
> >>>>>
> >>>>> So, I think the "push" approach is better: during the execution, a
> >>>>> pipeline "internally" collects and pushes the metrics to a backend.
> >>>>> The "push" could be a kind of sink. For instance, the metric
> >>>>> "records" can be sent to a Kafka topic, or directly to
> >>>>> Elasticsearch or whatever.
> >>>>> The metrics backend can deal with alerting, reporting, etc.
> >>>>>
> >>>>> Basically, we have to define two things:
> >>>>> 1. The "appender" where the metrics have to be sent (and the
> >>>>> corresponding configuration to connect, like the Kafka or
> >>>>> Elasticsearch location)
> >>>>> 2. The format of the metric data (for instance, JSON format).
> >>>>>
> >>>>> In Apache Karaf, I created something similar named Decanter:
> >>>>>
> >>>>> http://blog.nanthrax.net/2015/07/monitoring-and-alerting-with-apache-karaf-decanter/
> >>>>> http://karaf.apache.org/manual/decanter/latest-1/
> >>>>>
> >>>>> Decanter provides collectors that harvest the metrics (like JMX
> >>>>> MBean attributes, log messages, ...). Basically, for Beam, it
> >>>>> would be directly the Metrics API used by pipeline parts.
> >>>>> Then, the metric records are sent to a dispatcher which sends them
> >>>>> to an appender. The appenders store or send the metric records to
> >>>>> a backend (Elasticsearch, Cassandra, Kafka, JMS, Redis, ...).
> >>>>>
> >>>>> I think it would make sense to provide the configuration and
> >>>>> metrics "appender" via the pipeline options.
> >>>>> As it's not really runner specific, it could be part of the
> >>>>> Metrics API (or SPI in that case).
> >>>>>
> >>>>> WDYT ?
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>> On 02/15/2017 09:22 AM, Stas Levin wrote:
> >>>>>> +1 to making the IO metrics (e.g. producers, consumers) available
> >>>>>> as part of the Beam pipeline metrics tree for debugging and
> >>>>>> visibility.
> >>>>>>
> >>>>>> As has already been mentioned, many IO clients have a metrics
> >>>>>> mechanism in place, so in these cases I think it could be
> >>>>>> beneficial to mirror their metrics under the relevant subtree of
> >>>>>> the Beam metrics tree.
> >>>>>>
> >>>>>> On Wed, Feb 15, 2017 at 12:04 AM Amit Sela <amitsel...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> I think this is a great discussion and I'd like to relate to
> >>>>>>> some of the points raised here, and raise some of my own.
> >>>>>>>
> >>>>>>> First of all, I think we should be careful here not to cross
> >>>>>>> boundaries. IOs naturally have many metrics, and Beam should
> >>>>>>> avoid "taking over" those. IO metrics should focus on what's
> >>>>>>> relevant to the pipeline: input/output rate, backlog (for
> >>>>>>> UnboundedSources, which exists in bytes, but for monitoring
> >>>>>>> purposes we might want to consider #messages).
> >>>>>>>
> >>>>>>> I don't agree that we should skip doing this in Sources/Sinks
> >>>>>>> and go directly to SplittableDoFn, because the IO API is
> >>>>>>> familiar and known, and as long as we keep it, it should be
> >>>>>>> treated as a first-class citizen.
> >>>>>>>
> >>>>>>> As for enable/disable - if IOs focus on pipeline-related metrics
> >>>>>>> I think we should be fine, though this could also vary between
> >>>>>>> runners.
> >>>>>>>
> >>>>>>> Finally, considering "split-metrics" is interesting because on
> >>>>>>> one hand it affects the pipeline directly (unbalanced partitions
> >>>>>>> in Kafka may cause backlog), but this is that fine line of
> >>>>>>> responsibilities (Kafka monitoring would probably be able to
> >>>>>>> tell you that partitions are not balanced).
> >>>>>>>
> >>>>>>> My 2 cents, cheers!
> >>>>>>>
> >>>>>>> On Tue, Feb 14, 2017 at 8:46 PM Raghu Angadi
> >>>>>>> <rang...@google.com.invalid> wrote:
> >>>>>>>
> >>>>>>>> On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers
> >>>>>>>> <bchamb...@google.com.invalid> wrote:
> >>>>>>>>
> >>>>>>>>>> * I also think there are data-source-specific metrics that a
> >>>>>>>>>> given IO will want to expose (i.e., things like Kafka backlog
> >>>>>>>>>> for a topic.)
> >>>>>>>>
> >>>>>>>> UnboundedSource has an API for backlog. It is better for
> >>>>>>>> Beam/runners to handle backlog as well.
> >>>>>>>> Of course there will be some source-specific metrics too
> >>>>>>>> (errors, I/O ops, etc.).
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
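For reference, a rough sketch of the "poll first, push later" approach Ben describes above: periodically pull metrics from the runner via PipelineResult and hand the records to a pluggable appender, in the spirit of the Decanter-style proposal. The MetricsAppender interface here is hypothetical, not an existing Beam SPI; the query calls follow the current Metrics API, and note that committed() may be unsupported on some runners (the attempted/committed complexity Ben mentions), so only attempted values are forwarded:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.MetricsFilter;

public class MetricsPoller {

  // Hypothetical appender SPI: implementations could write to Kafka,
  // Elasticsearch, JMS, Redis, ...
  public interface MetricsAppender {
    void append(String step, String namespace, String name, long attempted);
  }

  public static void start(final PipelineResult result,
                           final MetricsAppender appender,
                           long periodSeconds) {
    ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();
    executor.scheduleAtFixedRate(() -> {
      // Query all counters; a MetricsFilter could scope this to a namespace.
      MetricQueryResults metrics =
          result.metrics().queryMetrics(MetricsFilter.builder().build());
      for (MetricResult<Long> counter : metrics.counters()) {
        // attempted() is always available; committed() may throw
        // UnsupportedOperationException on some runners.
        appender.append(counter.step(), counter.name().namespace(),
            counter.name().name(), counter.attempted());
      }
    }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
  }
}

Wiring the appender choice through the pipeline options, as JB suggests, would then just mean resolving a MetricsAppender implementation from an option before calling start().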