On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Amit,
>
> my point is: how do we provide metrics today to end users, and how can
> they use them to monitor a running pipeline?
>
> Clearly the runner is involved, but it should behave the same way for
> all runners. Let me take an example.
> In my ecosystem, I'm using both Flink and Spark with Beam, some
> pipelines on each. I would like to get the metrics for all pipelines
> into my monitoring backend. If I can "poll" from the execution engine's
> metric backend into my system, that's acceptable, but it's an overhead
> of work.
> Having a generic metric reporting layer would give us a more common
> way. If the user doesn't provide any reporting sink, then we use the
> execution backend's metric layer. If provided, we use the reporting sink.
> How did you do it before Beam?

I recall that for Spark you reported its native metrics via the Codahale
Reporter and Accumulators were visible in the UI, and the Spark runner
took it a step further to make it all visible via Codahale. Assuming
Flink does something similar, it all belongs to runner setup/configuration.

> About your question: you are right, it's possible to update a collector
> or appender without impacting anything else.
>
> Regards
> JB
>
> On 02/17/2017 10:38 PM, Amit Sela wrote:
> > @JB I think what you're suggesting is that Beam should provide a "Metrics
> > Reporting" API as well, and I used to think like you, but the more I
> > thought about it, the more I tend to disagree now.
> >
> > The SDK is for users to author pipelines, so Metrics are for user-defined
> > metrics (in contrast to runner metrics).
> >
> > The Runner API is supposed to help different backends integrate with
> > Beam to allow users to execute those pipelines on their favourite
> > backend. I believe the Runner API has to provide restrictions/demands
> > that are just enough for a runner to execute a Beam pipeline as best it
> > can, and I'm afraid this would demand work from runner authors that is
> > unnecessary. It is also sort of "crossing the line" into the runner's
> > domain and "telling it how to do" instead of what, and I don't think we
> > want that.
> >
> > I do believe, however, that runners should integrate the Metrics into
> > their own metrics reporting system - but that's for the runner author to
> > decide. Stas did this for the Spark runner because Spark doesn't report
> > back user-defined Accumulators (Spark's Aggregators) to its Metrics system.
> >
> > On a curious note though, did you use an OSGi service per event type, so
> > you can upgrade specific event handlers without taking down the entire
> > reporter? But that's really unrelated to this thread :-)
> >
> > On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers <bchamb...@google.com.invalid>
> > wrote:
> >
> >> I don't think it is possible for there to be a general mechanism for
> >> pushing metrics out during the execution of a pipeline. The Metrics API
> >> suggests that metrics should be reported as values across all attempts
> >> and values across only successful attempts. The latter requires runner
> >> involvement to ensure that a given metric value is atomically
> >> incremented (or checkpointed) when the bundle it was reported in is
> >> committed.
> >>
> >> Aviem has already implemented Metrics support for the Spark runner. I am
> >> working on support for the Dataflow runner.
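[For concreteness, this is roughly what the attempted/committed
distinction Ben describes looks like from the user side with the Java
SDK's Metrics API. The reporting calls exist as shown; the query-side
method names are as of the SDK at the time, so treat the details as a
sketch rather than a contract:

    import org.apache.beam.sdk.PipelineResult;
    import org.apache.beam.sdk.metrics.Counter;
    import org.apache.beam.sdk.metrics.MetricNameFilter;
    import org.apache.beam.sdk.metrics.MetricQueryResults;
    import org.apache.beam.sdk.metrics.MetricResult;
    import org.apache.beam.sdk.metrics.Metrics;
    import org.apache.beam.sdk.metrics.MetricsFilter;
    import org.apache.beam.sdk.transforms.DoFn;

    /** Reporting side: a user-defined counter, incremented inside a DoFn. */
    class CountFn extends DoFn<String, String> {
      private final Counter elements = Metrics.counter(CountFn.class, "elements");

      @ProcessElement
      public void processElement(ProcessContext c) {
        elements.inc();          // runs on every attempt of a bundle, retries included
        c.output(c.element());
      }
    }

    /** Query side: where attempted vs. committed shows up. */
    class MetricsQuery {
      static void print(PipelineResult result) {
        MetricQueryResults metrics = result.metrics().queryMetrics(
            MetricsFilter.builder()
                .addNameFilter(MetricNameFilter.named(CountFn.class, "elements"))
                .build());
        for (MetricResult<Long> counter : metrics.counters()) {
          // attempted(): value across all attempts; committed(): value from
          // successfully committed bundles only (runner support permitting).
          System.out.println(counter.attempted() + " / " + counter.committed());
        }
      }
    }

The committed value is exactly the part a generic push mechanism can't
provide on its own, since only the runner knows when a bundle commits.]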
> >> On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> >> wrote:
> >>
> >> Hi guys,
> >>
> >> As I'm back from vacation, I'm back on this topic ;)
> >>
> >> It's a great discussion, and I think the metric coverage of the IOs is
> >> good.
> >>
> >> However, there's a point that we discussed very quickly in the thread,
> >> and I think it's an important one (maybe more important, in terms of
> >> roadmap, than the metrics actually provided ;)).
> >>
> >> Assuming we have pipelines, PTransforms, IOs, ... using the Metric API,
> >> how do we expose the metrics to end users?
> >>
> >> A first approach would be to bind a JMX MBean server per pipeline and
> >> expose the metrics via MBeans. I don't think it's a good idea, for the
> >> following reasons:
> >> 1. It's not easy to know where the pipeline is actually executed, and
> >> so, not easy to find the MBean server URI.
> >> 2. For the same reason, we can have port binding errors.
> >> 3. While it could work for unbounded/streaming pipelines (as they are
> >> always "running"), it's not really applicable to bounded/batch
> >> pipelines, as their lifetime is "limited" ;)
> >>
> >> So, I think the "push" approach is better: during execution, a
> >> pipeline "internally" collects and pushes the metrics to a backend.
> >> The "push" could be a kind of sink. For instance, the metric "records"
> >> can be sent to a Kafka topic, or directly to Elasticsearch, or whatever.
> >> The metric backend can deal with alerting, reporting, etc.
> >>
> >> Basically, we have to define two things:
> >> 1. The "appender" the metrics have to be sent to (and the
> >> corresponding configuration to connect, like the Kafka or Elasticsearch
> >> location).
> >> 2. The format of the metric data (for instance, JSON).
> >>
> >> In Apache Karaf, I created something similar named Decanter:
> >>
> >> http://blog.nanthrax.net/2015/07/monitoring-and-alerting-with-apache-karaf-decanter/
> >> http://karaf.apache.org/manual/decanter/latest-1/
> >>
> >> Decanter provides collectors that harvest the metrics (like JMX MBean
> >> attributes, log messages, ...). For Beam, the collector would basically
> >> be the Metric API used directly by the pipeline parts.
> >> The metric records are then sent to a dispatcher, which sends them to
> >> an appender. The appenders store or send the metric records to a
> >> backend (Elasticsearch, Cassandra, Kafka, JMS, Redis, ...).
> >>
> >> I think it would make sense to provide the configuration and the Metric
> >> "appender" via the pipeline options.
> >> As it's not really runner specific, it could be part of the Metric API
> >> (or SPI in that case).
> >>
> >> WDYT ?
> >>
> >> Regards
> >> JB
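[To make the proposed "appender + format" split concrete, here is a
minimal sketch. Every name below is hypothetical -- none of these types
exist in Beam; only PipelineOptions does:

    import org.apache.beam.sdk.options.PipelineOptions;

    /** Hypothetical: one harvested metric value, ready to serialize (e.g. as JSON). */
    class MetricRecord {
      String name;         // fully-qualified metric name
      long timestampMs;    // when the value was harvested
      Object value;        // counter / gauge / distribution snapshot
    }

    /** Hypothetical "appender" SPI: where records get pushed during execution. */
    interface MetricsAppender {
      void append(Iterable<MetricRecord> records);  // e.g. produce to a Kafka topic
    }

    /** Hypothetical options, so the appender is chosen per pipeline, not per runner. */
    interface MetricsReportingOptions extends PipelineOptions {
      Class<? extends MetricsAppender> getMetricsAppender();
      void setMetricsAppender(Class<? extends MetricsAppender> appender);
    }

A runner (or a generic layer above it) would instantiate the configured
appender and push metric records to it periodically; if none is
configured, it would fall back to the execution engine's own metric
backend, as JB suggests.]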
> >> On 02/15/2017 09:22 AM, Stas Levin wrote:
> >>> +1 to making the IO metrics (e.g. producers, consumers) available as
> >>> part of the Beam pipeline metrics tree for debugging and visibility.
> >>>
> >>> As has already been mentioned, many IO clients have a metrics
> >>> mechanism in place, so in these cases I think it could be beneficial
> >>> to mirror their metrics under the relevant subtree of the Beam
> >>> metrics tree.
> >>>
> >>> On Wed, Feb 15, 2017 at 12:04 AM Amit Sela <amitsel...@gmail.com> wrote:
> >>>
> >>>> I think this is a great discussion and I'd like to relate to some of
> >>>> the points raised here, and raise some of my own.
> >>>>
> >>>> First of all, I think we should be careful here not to cross
> >>>> boundaries. IOs naturally have many metrics, and Beam should avoid
> >>>> "taking over" those. IO metrics should focus on what's relevant to
> >>>> the pipeline: input/output rate, and backlog (which exists in bytes
> >>>> for UnboundedSources, though for monitoring purposes we might want
> >>>> to consider #messages).
> >>>>
> >>>> I don't agree that we should skip doing this in Sources/Sinks and go
> >>>> directly to SplittableDoFn, because the IO API is familiar and
> >>>> known, and as long as we keep it, it should be treated as a
> >>>> first-class citizen.
> >>>>
> >>>> As for enable/disable - if IOs focus on pipeline-related metrics, I
> >>>> think we should be fine, though this could also change between
> >>>> runners.
> >>>>
> >>>> Finally, considering "split metrics" is interesting, because on one
> >>>> hand a split affects the pipeline directly (unbalanced partitions in
> >>>> Kafka may cause backlog), but this is that fine line of
> >>>> responsibilities (Kafka monitoring would probably be able to tell
> >>>> you that partitions are not balanced).
> >>>>
> >>>> My 2 cents, cheers!
> >>>>
> >>>> On Tue, Feb 14, 2017 at 8:46 PM Raghu Angadi <rang...@google.com.invalid>
> >>>> wrote:
> >>>>
> >>>>> On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers
> >>>>> <bchamb...@google.com.invalid> wrote:
> >>>>>
> >>>>>> * I also think there are data source specific metrics that a given
> >>>>>> IO will want to expose (i.e., things like Kafka backlog for a topic).
> >>>>>
> >>>>> UnboundedSource has an API for backlog. It is better for
> >>>>> Beam/runners to handle backlog as well.
> >>>>> Of course there will be some source-specific metrics too (errors,
> >>>>> I/O ops, etc.).
> >>>>>
> >>>>
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
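[The backlog API Raghu mentions is
UnboundedSource.UnboundedReader.getSplitBacklogBytes() in the Java SDK.
A minimal sketch of how a source might implement it -- the
Kafka-flavoured offset bookkeeping is illustrative only:

    import org.apache.beam.sdk.io.UnboundedSource;

    abstract class SketchKafkaReader<T> extends UnboundedSource.UnboundedReader<T> {
      private long headOffsetBytes;     // hypothetical: head of the stream, from the broker
      private long consumedOffsetBytes; // hypothetical: last position this reader consumed

      @Override
      public long getSplitBacklogBytes() {
        // Estimate of the bytes this split has yet to read; runners can
        // surface or act on it. Return BACKLOG_UNKNOWN (-1) when the
        // source cannot estimate it.
        long backlog = headOffsetBytes - consumedOffsetBytes;
        return backlog >= 0 ? backlog : BACKLOG_UNKNOWN;
      }
    }

A runner that handles backlog generically, as Raghu suggests, only needs
this one hook rather than a per-source metric.]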