On Sat, Feb 18, 2017 at 1:19 PM Stas Levin <stasle...@apache.org> wrote:
> JB, I think you raise a good point. As a user, if I'm sold on pipeline
> portability, I would expect the deal to include metrics as well.
> Otherwise, being able to port a pipeline to a different execution engine
> at the cost of "flying blind" might not be as tempting.
> This is a good example of where to draw the line of Beam's
> responsibilities - we can take it further and say we also orchestrate
> clusters, installations, etc. - I (and that might just be me) believe the
> line should be drawn at the Metrics API.
>
> If we wish to give users a consistent set of metrics (which is totally +1
> IMHO), even before we decide on what such a metrics set should include,
> there seem to be at least 2 ways to achieve this:
>
>    1. Hard spec, via API: have the SDK define metric specs for runners/IOs
>    to implement
>    2. Soft spec, via documentation: specify the expected metrics in docs,
>    some of the metrics could be "must have" while others could be "nice to
>    have", and let runners implement (interpret?) what they can, and how
>    they can.
>
> Naturally, each of these has its pros and cons.
>
> Hard spec is great for consistency, but may make things harder on the
> implementors due to the effort required to align the internals of each
> execution engine to the SDK (e.g., attempted vs. committed metrics, as
> discussed in a separate thread here on the dev list).
>
> Soft spec is enforced by doc rather than by code and leaves consistency
> up to discipline (meh...), but may make it easier on implementors to get
> things done as they have more freedom.
>
> -Stas
>
> On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi Amit,
> >
> > My point is: how do we provide metrics today to end users, and how can
> > they use them to monitor a running pipeline?
> >
> > Clearly the runner is involved, but it should behave the same way for
> > all runners. Let me take an example.
> > In my ecosystem, I'm using both Flink and Spark with Beam, some
> > pipelines on each. I would like to get the metrics for all pipelines
> > into my monitoring backend. If I can "poll" from the execution engine's
> > metric backend into my system that's acceptable, but it adds overhead.
> > Having a generic metric reporting layer would give us a more common
> > way. If the user doesn't provide any reporting sink, then we use the
> > execution backend's metric layer. If provided, we use the reporting
> > sink.
> >
> > About your question: you are right, it's possible to update a collector
> > or appender without impacting anything else.
> >
> > Regards
> > JB
> >
> > On 02/17/2017 10:38 PM, Amit Sela wrote:
> > > @JB I think what you're suggesting is that Beam should provide a
> > > "Metrics Reporting" API as well, and I used to think like you, but
> > > the more I thought about it, the more I tend to disagree now.
> > >
> > > The SDK is for users to author pipelines, so Metrics are for
> > > user-defined metrics (in contrast to runner metrics).
> > >
> > > The Runner API is supposed to help different backends integrate with
> > > Beam to allow users to execute those pipelines on their favourite
> > > backend. I believe the Runner API has to provide restrictions/demands
> > > that are just enough so a runner could execute a Beam pipeline as
> > > best it can, and I'm afraid that this would demand runner authors to
> > > do work that is unnecessary.
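> > >
> > > (For concreteness, "user-defined metrics" here means the SDK's
> > > Metrics API. A minimal sketch of how a user might count malformed
> > > records in a DoFn - the class and metric names are made up:
> > >
> > >   import org.apache.beam.sdk.metrics.Counter;
> > >   import org.apache.beam.sdk.metrics.Metrics;
> > >   import org.apache.beam.sdk.transforms.DoFn;
> > >
> > >   public class ParseFn extends DoFn<String, String> {
> > >     // declared once per DoFn; the runner aggregates it across bundles
> > >     private final Counter malformed =
> > >         Metrics.counter(ParseFn.class, "malformedRecords");
> > >
> > >     @ProcessElement
> > >     public void processElement(ProcessContext c) {
> > >       if (c.element().isEmpty()) {
> > >         malformed.inc();  // user-defined metric, surfaced by the runner
> > >         return;
> > >       }
> > >       c.output(c.element());
> > >     }
> > >   }
> > >
> > > The SDK only defines how such a metric is declared and updated, not
> > > how it is reported.)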
> > > This is also sort of "crossing the line" into the runner's domain,
> > > "telling it how to do things" instead of what to do, and I don't
> > > think we want that.
> > >
> > > I do believe however that runners should integrate the Metrics into
> > > their own metrics reporting system - but that's for the runner author
> > > to decide. Stas did this for the Spark runner because Spark doesn't
> > > report back user-defined Accumulators (Spark's Aggregators) to its
> > > Metrics system.
> > >
> > > On a curious note though, did you use an OSGi service per event type,
> > > so you can upgrade specific event handlers without taking down the
> > > entire reporter? But that's really unrelated to this thread :-) .
> > >
> > > On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers
> > > <bchamb...@google.com.invalid> wrote:
> > >
> > >> I don't think it is possible for there to be a general mechanism for
> > >> pushing metrics out during the execution of a pipeline. The Metrics
> > >> API suggests that metrics should be reported as values across all
> > >> attempts and values across only successful attempts. The latter
> > >> requires runner involvement to ensure that a given metric value is
> > >> atomically incremented (or checkpointed) when the bundle it was
> > >> reported in is committed.
> > >>
> > >> Aviem has already implemented Metrics support for the Spark runner.
> > >> I am working on support for the Dataflow runner.
> > >>
> > >> On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré
> > >> <j...@nanthrax.net> wrote:
> > >>
> > >> Hi guys,
> > >>
> > >> As I'm back from vacation, I'm back on this topic ;)
> > >>
> > >> It's a great discussion, and regarding the Metric IO coverage, I
> > >> think it's good.
> > >>
> > >> However, there's a point that we discussed very quickly in the
> > >> thread and I think it's an important one (maybe more important, in
> > >> terms of roadmap, than the provided metrics actually ;)).
> > >>
> > >> Assuming we have pipelines, PTransforms, IOs, ... using the Metric
> > >> API, how do we expose the metrics to the end users?
> > >>
> > >> A first approach would be to bind a JMX MBean server to the pipeline
> > >> and expose the metrics via MBeans. I don't think it's a good idea
> > >> for the following reasons:
> > >> 1. It's not easy to know where the pipeline is actually executed,
> > >> and so, not easy to find the MBean server URI.
> > >> 2. For the same reason, we can have port binding errors.
> > >> 3. While it could work for unbounded/streaming pipelines (as they
> > >> are always "running"), it's not really applicable for bounded/batch
> > >> pipelines as their lifetime is "limited" ;)
> > >>
> > >> So, I think the "push" approach is better: during the execution, a
> > >> pipeline "internally" collects and pushes the metrics to a backend.
> > >> The "push" could be a kind of sink. For instance, the metric
> > >> "records" can be sent to a Kafka topic, or directly to Elasticsearch
> > >> or whatever. The metric backend can deal with alerting, reporting,
> > >> etc.
> > >>
> > >> Basically, we have to define two things:
> > >> 1. The "appender" where the metrics have to be sent (and the
> > >> corresponding configuration to connect, like Kafka or Elasticsearch
> > >> location)
> > >> 2. The format of the metric data (for instance, JSON format).
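> > >>
> > >> Just to illustrate the idea, a minimal sketch of such an SPI - the
> > >> names are hypothetical, not a proposal of the exact API:
> > >>
> > >>   import java.util.Map;
> > >>
> > >>   // Hypothetical appender SPI: implementations push metric records
> > >>   // to a backend (Kafka, Elasticsearch, ...).
> > >>   public interface MetricsAppender {
> > >>     // connection configuration, e.g. Kafka brokers or Elasticsearch
> > >>     // location, taken from the pipeline options
> > >>     void configure(Map<String, String> options);
> > >>
> > >>     // called by the dispatcher with a metric record, serialized
> > >>     // e.g. as JSON
> > >>     void append(String metricRecordJson);
> > >>   }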
> > >>
> > >> In Apache Karaf, I created something similar named Decanter:
> > >>
> > >> http://blog.nanthrax.net/2015/07/monitoring-and-alerting-with-apache-karaf-decanter/
> > >>
> > >> http://karaf.apache.org/manual/decanter/latest-1/
> > >>
> > >> Decanter provides collectors that harvest the metrics (like JMX
> > >> MBean attributes, log messages, ...). Basically, for Beam, the
> > >> collector would be the Metric API used directly by the pipeline
> > >> parts.
> > >> Then, the metric records are sent to a dispatcher, which sends them
> > >> to an appender. The appenders store or send the metric records to a
> > >> backend (Elasticsearch, Cassandra, Kafka, JMS, Redis, ...).
> > >>
> > >> I think it would make sense to provide the configuration and Metric
> > >> "appender" via the pipeline options.
> > >> As it's not really runner specific, it could be part of the metric
> > >> API (or SPI in that case).
> > >>
> > >> WDYT ?
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> On 02/15/2017 09:22 AM, Stas Levin wrote:
> > >>> +1 to making the IO metrics (e.g. producers, consumers) available
> > >>> as part of the Beam pipeline metrics tree for debugging and
> > >>> visibility.
> > >>>
> > >>> As it has already been mentioned, many IO clients have a metrics
> > >>> mechanism in place, so in these cases I think it could be
> > >>> beneficial to mirror their metrics under the relevant subtree of
> > >>> the Beam metrics tree.
> > >>>
> > >>> On Wed, Feb 15, 2017 at 12:04 AM Amit Sela <amitsel...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> I think this is a great discussion and I'd like to relate to some
> > >>>> of the points raised here, and raise some of my own.
> > >>>>
> > >>>> First of all, I think we should be careful here not to cross
> > >>>> boundaries. IOs naturally have many metrics, and Beam should
> > >>>> avoid "taking over" those. IO metrics should focus on what's
> > >>>> relevant to the pipeline: input/output rate, backlog (for
> > >>>> UnboundedSources this exists in bytes, but for monitoring
> > >>>> purposes we might want to consider #messages).
> > >>>>
> > >>>> I don't agree that we should not invest in doing this in
> > >>>> Sources/Sinks and go directly to SplittableDoFn, because the IO
> > >>>> API is familiar and known, and as long as we keep it, it should
> > >>>> be treated as a first-class citizen.
> > >>>>
> > >>>> As for enable/disable - if IOs focus on pipeline-related metrics
> > >>>> I think we should be fine, though this could also change between
> > >>>> runners as well.
> > >>>>
> > >>>> Finally, considering "split metrics" is interesting because on
> > >>>> one hand it affects the pipeline directly (unbalanced partitions
> > >>>> in Kafka may cause backlog), but this is that fine line of
> > >>>> responsibilities (Kafka monitoring would probably be able to tell
> > >>>> you that partitions are not balanced).
> > >>>>
> > >>>> My 2 cents, cheers!
> > >>>>
> > >>>> On Tue, Feb 14, 2017 at 8:46 PM Raghu Angadi
> > >>>> <rang...@google.com.invalid> wrote:
> > >>>>
> > >>>>> On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers
> > >>>>> <bchamb...@google.com.invalid> wrote:
> > >>>>>
> > >>>>>> * I also think there are data source specific metrics that a
> > >>>>>> given IO will want to expose (i.e., things like Kafka backlog
> > >>>>>> for a topic.)
> > >>>>>
> > >>>>> UnboundedSource has an API for backlog.
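> > >>>>> Concretely, the backlog hooks on UnboundedSource.UnboundedReader
> > >>>>> look roughly like this (sources override them to report their
> > >>>>> backlog, and runners can surface the values as metrics):
> > >>>>>
> > >>>>>   // backlog bytes for this reader's split, or BACKLOG_UNKNOWN
> > >>>>>   // if the source cannot estimate it
> > >>>>>   public long getSplitBacklogBytes() { return BACKLOG_UNKNOWN; }
> > >>>>>
> > >>>>>   // backlog across all splits of the source, if known
> > >>>>>   public long getTotalBacklogBytes() { return BACKLOG_UNKNOWN; }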
> > >>>>> It is better for Beam/runners to handle backlog as well.
> > >>>>> Of course there will be some source-specific metrics too
> > >>>>> (errors, I/O ops, etc.).
> > >>>>
> > >>>
> > >>
> > >> --
> > >> Jean-Baptiste Onofré
> > >> jbono...@apache.org
> > >> http://blog.nanthrax.net
> > >> Talend - http://www.talend.com
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >