Re: Metrics for Beam IOs.
Hi Ismaël,

You did a great summary and put the emphasis on the valid points. I fully agree with you that we should provide something agnostic for end users to get the metrics. Let's see what the others think. I'm ready to start a PoC to bring part of Decanter into Beam ;)

Thanks again
Regards
JB

On 02/22/2017 07:55 PM, Ismaël Mejía wrote:
Hello,

Thanks everyone for giving your points of view. I was waiting to see how the conversation evolved before summarizing it and continuing on the open points.

Points where mostly everybody agrees (please correct me if somebody still disagrees):

- Default metrics should not affect performance, since they are calculated by default. (In that case disabling them matters less, and we can evaluate making this configurable per IO in the future.)
- There is a clear distinction between metrics that are useful for the user and metrics that are useful for the runner. The user metrics are the most important for user experience (we will focus on these).
- We should support metrics for IOs in both APIs: Source/Sink-based IOs and SDF.
- IO metrics should focus on what's relevant to the pipeline. Relevant metrics should be discussed on a per-IO basis. (It is hard to generalize, but we will probably make progress faster by creating metrics for each IO and then consolidating the common ones.)

Points where consensus is not yet achieved:

- Should IOs expose metrics that are useful for the runners? And if so, how? I think this is important but not relevant to the current discussion, so we should probably open a different conversation for it. My only comment is that we must prevent interference from mixing runner-oriented and user-oriented metrics (probably with a namespace).
- Where is the frontier of Beam's responsibilities for metrics? Should we have a runner-agnostic way to collect metrics (different from result polling)?
We can offer a plugin-like system to push metrics into given sinks; JB proposed an approach similar to Karaf's Decanter. There is also the issue of pull-based metrics like those of Codahale.

As a user I think having something like what JB proposed is nice; even a REST service to query information about pipelines in a runner-agnostic way would make me happy too. But again, it is up to the community to decide how we can implement this and whether it should be part of Beam.

What do you guys think about the pending issues? Did I miss something else?

Ismaël

On Sat, Feb 18, 2017 at 9:02 PM, Jean-Baptiste Onofré wrote:
Yes, agree Ben.

More than the collected metrics, my question is how to "expose"/"push" those metrics.

Imagine I have a pipeline executed using the Spark runner on a Spark cluster. Now I change this pipeline to use the Dataflow runner on the Google Cloud Dataflow service. I have to completely change the way I get metrics.

Optionally, if I'm able to define something like --metric-appender=foo.cfg containing, in addition to the execution-engine-specific layer, a target where I can push the metrics (like Elasticsearch or Kafka), I can implement a simple and generic way of harvesting metrics.

It's totally fine to have some metrics specific to Spark when using the Spark runner and others specific to Dataflow when using the Dataflow runner; my point is more: how can I send the metrics to my global monitoring/reporting layer? Somehow, we can see the metrics as a meta side output of the pipeline, sent to a target sink.

I don't want to change or bypass the execution-engine specifics; I mean provide a way for the user to target his system (optional).

Regards
JB

On 02/18/2017 06:15 PM, Ben Chambers wrote:
The question is how much of metrics has to be in the runner and how much can be shared. So far we share the API the user uses to report metrics; this is the most important part since it is required for pipelines to be portable.
The next piece that could be shared is something related to reporting. But implementing logically correct metrics requires the execution engine to be involved, since it depends on how and which bundles are retried. What I'm not sure about is how much can be shared and/or made available for these runners vs. how much is tied to the execution engine.

On Sat, Feb 18, 2017, 8:52 AM Jean-Baptiste Onofré wrote:
For Spark, I fully agree. My point is more when the execution engine or runner doesn't provide anything, or we have to provide a generic way of harvesting/pushing metrics. It could at least be a documentation point. Actually, I'm evaluating the monitoring capabilities of the different runners.

Regards
JB

On 02/18/2017 05:47 PM, Amit Sela wrote:
That's what I don't understand - why would we want that? Taking on responsibilities in the "stack" should have a good reason. Someone choosing to run Beam on Spark/Flink/Apex would have to take care of installing those clusters, right? perhaps providing them a
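JB's --metric-appender idea above can be sketched as a small pluggable sink layer: the pipeline keeps publishing metrics the same way on any runner, and a configured appender decides where they go. This is a hypothetical illustration; MetricsSink, InMemorySink, and fromOption are invented names, not part of the Beam or Decanter APIs, and a real appender would push to Elasticsearch or Kafka rather than a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a generic, runner-agnostic metrics appender (all names invented).
public class MetricAppenderSketch {

    /** A target where harvested metric values can be pushed. */
    interface MetricsSink {
        void publish(String pipeline, String metric, long value);
    }

    /** In-memory sink, standing in for an Elasticsearch or Kafka appender. */
    static class InMemorySink implements MetricsSink {
        final Map<String, Long> values = new LinkedHashMap<>();
        @Override
        public void publish(String pipeline, String metric, long value) {
            values.put(pipeline + "/" + metric, value);
        }
    }

    /** Picks a sink from an option value such as "--metric-appender=memory". */
    static MetricsSink fromOption(String appender, InMemorySink memory) {
        switch (appender) {
            case "memory":  return memory;
            case "console": return (p, m, v) -> System.out.println(p + "/" + m + "=" + v);
            default: throw new IllegalArgumentException("unknown appender: " + appender);
        }
    }

    public static void main(String[] args) {
        InMemorySink memory = new InMemorySink();
        MetricsSink sink = fromOption("memory", memory);
        sink.publish("wordcount", "elementsRead", 42L);
        System.out.println(memory.values); // {wordcount/elementsRead=42}
    }
}
```

The point of the indirection is exactly what JB describes: switching from the Spark runner to Dataflow would change the execution engine but not the appender configuration.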
Re: Metrics for Beam IOs.
Hello,

Thanks everyone for giving your points of view. I was waiting to see how the conversation evolved before summarizing it and continuing on the open points.

Points where mostly everybody agrees (please correct me if somebody still disagrees):

- Default metrics should not affect performance, since they are calculated by default. (In that case disabling them matters less, and we can evaluate making this configurable per IO in the future.)
- There is a clear distinction between metrics that are useful for the user and metrics that are useful for the runner. The user metrics are the most important for user experience (we will focus on these).
- We should support metrics for IOs in both APIs: Source/Sink-based IOs and SDF.
- IO metrics should focus on what's relevant to the pipeline. Relevant metrics should be discussed on a per-IO basis. (It is hard to generalize, but we will probably make progress faster by creating metrics for each IO and then consolidating the common ones.)

Points where consensus is not yet achieved:

- Should IOs expose metrics that are useful for the runners? And if so, how? I think this is important but not relevant to the current discussion, so we should probably open a different conversation for it. My only comment is that we must prevent interference from mixing runner-oriented and user-oriented metrics (probably with a namespace).
- Where is the frontier of Beam's responsibilities for metrics? Should we have a runner-agnostic way to collect metrics (different from result polling)? We can offer a plugin-like system to push metrics into given sinks; JB proposed an approach similar to Karaf's Decanter. There is also the issue of pull-based metrics like those of Codahale.
As a user I think having something like what JB proposed is nice; even a REST service to query information about pipelines in a runner-agnostic way would make me happy too. But again, it is up to the community to decide how we can implement this and whether it should be part of Beam.

What do you guys think about the pending issues? Did I miss something else?

Ismaël
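The namespace idea from the summary (keeping runner-oriented and user-oriented metrics from interfering) could look something like the following sketch. The MetricKey layout and the "beam:runner:" prefix are assumptions made for illustration, not Beam's actual naming scheme.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: metrics carry a namespace, and runner-internal metrics live under
// a reserved prefix so user-facing queries can filter them out.
public class MetricNamespaceSketch {

    static final String RUNNER_PREFIX = "beam:runner:"; // invented convention

    static final class MetricKey {
        final String namespace;
        final String name;
        MetricKey(String namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }
        boolean isRunnerMetric() { return namespace.startsWith(RUNNER_PREFIX); }
    }

    /** Returns only the user-oriented metrics. */
    static List<MetricKey> userMetrics(List<MetricKey> all) {
        return all.stream().filter(k -> !k.isRunnerMetric()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MetricKey> all = Arrays.asList(
            new MetricKey("org.example.KafkaIO", "recordsRead"),
            new MetricKey("beam:runner:spark", "bundlesRetried"));
        System.out.println(userMetrics(all).get(0).name); // recordsRead
    }
}
```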
Re: Metrics for Beam IOs.
For Spark, I fully agree. My point is more when the execution engine or runner doesn't provide anything, or we have to provide a generic way of harvesting/pushing metrics.

It could at least be a documentation point. Actually, I'm evaluating the monitoring capabilities of the different runners.

Regards
JB

On 02/18/2017 05:47 PM, Amit Sela wrote:
That's what I don't understand - why would we want that? Taking on responsibilities in the "stack" should have a good reason. Someone choosing to run Beam on Spark/Flink/Apex would have to take care of installing those clusters, right? perhaps providing them a resilient underlying FS? And if he wants, setup monitoring (which even with the API proposed he'd have to do). I just don't see why it should be a part of the runner and/or Metrics API.

On Sat, Feb 18, 2017 at 6:35 PM Jean-Baptiste Onofré wrote:
Good point. In Decanter, it's what I named a "scheduled collector". So, yes, the adapter will periodically harvest metrics to push.

Regards
JB

On 02/18/2017 05:30 PM, Amit Sela wrote:
First issue with a "push" metrics plugin - what if the runner's underlying reporting mechanism is "pull"? Codahale ScheduledReporter will sample the values every X and send to ... So any runner using a "pull-like" mechanism would use an adapter?

On Sat, Feb 18, 2017 at 6:27 PM Jean-Baptiste Onofré wrote:
Hi Ben,

OK, it's what I thought. Thanks for the clarification.

+1 for the plugin-like "push" API (it's what I have in mind too ;)). I will start a PoC for discussion next week.

Regards
JB

On 02/18/2017 05:17 PM, Ben Chambers wrote:
The runner can already report metrics during pipeline execution, so it is usable for monitoring.

The pipeline result can be used to query metrics during pipeline execution, so a first version of reporting to other systems is to periodically pull metrics from the runner with that API.

We may eventually want to provide a plugin-like API to get the runner to push metrics more directly to other metrics stores. This layer needs some thought since it has to handle the complexity of attempted/committed metrics to be consistent with the model.

On Sat, Feb 18, 2017, 5:44 AM Jean-Baptiste Onofré wrote:
Hi Amit,

Before Beam, I didn't mind about portability ;) So I used the Spark approach.

But now, as a Beam user, I would expect a generic way to deal with metrics, whatever the runner may be.

Today, you are right: I'm using the solution provided by the execution engine. That's the current approach and it works fine. And it's up to me to leverage it (for instance Accumulators) with my own system. My thought is more to provide a generic way. It's only a discussion for now ;)

Regards
JB

On 02/18/2017 02:38 PM, Amit Sela wrote:
On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré wrote:
Hi Amit,

My point is: how do we provide metrics today to the end user, and how can they use them to monitor a running pipeline?

Clearly the runner is involved, but it should behave the same way for all runners. Let me take an example. In my ecosystem, I'm using both Flink and Spark with Beam, some pipelines on each. I would like to get the metrics for all pipelines into my monitoring backend. If I can "poll" from the execution engine's metric backend to my system, that's acceptable, but it's an overhead of work. Having a generic metric reporting layer would allow us to have a more common way: if the user doesn't provide any reporting sink, we use the execution backend's metric layer; if provided, we use the reporting sink.

How did you do it before Beam? I think that for Spark you reported its native metrics via a Codahale Reporter, and Accumulators were visible in the UI, and the Spark runner took it a step further to make it all visible via Codahale. Assuming Flink does something similar, it all belongs to runner setup/configuration.

About your question: you are right, it's possible to update a collector or appender without impacting anything else.

Regards
JB

On 02/17/2017 10:38 PM, Amit Sela wrote:
@JB I think what you're suggesting is that Beam should provide a "Metrics Reporting" API as well, and I used to think like you, but the more I thought of it, the more I tend to disagree now.

The SDK is for users to author pipelines, so Metrics are for user-defined metrics (in contrast to runner metrics).

The Runner API is supposed to help different backends integrate with Beam to allow users to execute those pipelines on their favourite backend. I believe the Runner API has to provide restrictions/demands that are just enough so a runner could execute a Beam pipeline as best it can, and I'm afraid that this would demand runner authors to do work that is unnecessary. This is also sort of "crossing the line" into the runner's domain and "telling it how to do" instead of what, and I don't think we want that. I do believe however that
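Ben's "first version of reporting" (periodically pulling metric values from the runner via the pipeline result and forwarding them to an external store) can be sketched as follows. The Supplier stands in for querying a PipelineResult and the Consumer for a real backend; the names and the changed-values-only policy are illustrative assumptions, not Beam code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Sketch: poll the runner's current metric values and push changes to a sink.
public class MetricsPoller {

    private final Supplier<Map<String, Long>> queryMetrics; // stands in for result polling
    private final Consumer<Map<String, Long>> sink;         // e.g. push to Elasticsearch
    private final Map<String, Long> lastSeen = new HashMap<>();

    MetricsPoller(Supplier<Map<String, Long>> queryMetrics, Consumer<Map<String, Long>> sink) {
        this.queryMetrics = queryMetrics;
        this.sink = sink;
    }

    /** One polling cycle: forward only metrics whose value changed. */
    void pollOnce() {
        Map<String, Long> current = queryMetrics.get();
        Map<String, Long> changed = new HashMap<>();
        current.forEach((name, value) -> {
            if (!value.equals(lastSeen.get(name))) changed.put(name, value);
        });
        lastSeen.putAll(changed);
        if (!changed.isEmpty()) sink.accept(changed);
    }

    public static void main(String[] args) {
        Map<String, Long> runnerState = new HashMap<>();
        Map<String, Long> pushed = new HashMap<>();
        MetricsPoller poller = new MetricsPoller(() -> new HashMap<>(runnerState), pushed::putAll);

        runnerState.put("recordsRead", 10L);
        poller.pollOnce();          // first cycle pushes the new value
        poller.pollOnce();          // unchanged value is not re-pushed
        System.out.println(pushed); // {recordsRead=10}
    }
}
```

In a real deployment pollOnce would be driven by a ScheduledExecutorService at a fixed interval rather than called by hand.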
Re: Metrics for Beam IOs.
That's what I don't understand - why would we want that? Taking on responsibilities in the "stack" should have a good reason. Someone choosing to run Beam on Spark/Flink/Apex would have to take care of installing those clusters, right? perhaps providing them a resilient underlying FS? And if he wants, setup monitoring (which even with the API proposed he'd have to do). I just don't see why it should be a part of the runner and/or Metrics API.
Re: Metrics for Beam IOs.
Is there a way to leverage runners' existing metrics sinks? As stated by Amit & Stas, the Spark runner uses Spark's metrics sink to report Beam's aggregators and metrics. Other runners may also have a similar capability, I'm not sure. This could remove the need for a plugin, and for dealing with push/pull.

I'm assuming we should compile a table of what can be supported in each runner in this area and then decide a way to move forward?
Re: Metrics for Beam IOs.
Good point.

In Decanter, it's what I named a "scheduled collector". So, yes, the adapter will periodically harvest metrics to push.

Regards
JB

On 02/17/2017 10:38 PM, Amit Sela wrote:
@JB I think what you're suggesting is that Beam should provide a "Metrics Reporting" API as well, and I used to think like you, but the more I thought of it, the more I tend to disagree now.

The SDK is for users to author pipelines, so Metrics are for user-defined metrics (in contrast to runner metrics).

The Runner API is supposed to help different backends integrate with Beam to allow users to execute those pipelines on their favourite backend. I believe the Runner API has to provide restrictions/demands that are just enough so a runner could execute a Beam pipeline as best it can, and I'm afraid that this would demand runner authors to do work that is unnecessary. This is also sort of "crossing the line" into the runner's domain and "telling it how to do" instead of what, and I don't think we want that.

I do believe however that runners should integrate the Metrics into their own metrics reporting system - but that's for the runner author to decide. Stas did this for the Spark runner because Spark doesn't report back user-defined Accumulators (Spark's Aggregators) to its Metrics system.

On a curious note though, did you use an OSGi service per event-type? So you can upgrade specific event-handlers without taking down the entire reporter? But that's really unrelated to this thread :-) .

On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers wrote:
I don't think it is possible for there to be a general mechanism for pushing metrics out during the execution of a pipeline. The Metrics API suggests that metrics should be reported as values across all attempts and values across only successful attempts. The latter requires runner involvement to ensure that a given metric value is
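The attempted/committed distinction Ben raises can be made concrete with a toy counter: every bundle attempt updates the "attempted" value, but a delta only moves into the "committed" total when the runner commits that bundle, so retried bundles are not double counted. The bundle bookkeeping here is invented for illustration; real runners track this internally.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a counter that reports both attempted and committed values.
public class AttemptedCommittedCounter {

    long attempted = 0;
    long committed = 0;
    private final Map<String, Long> inFlight = new HashMap<>(); // per-bundle deltas

    /** Called from user code while a bundle is processing. */
    void inc(String bundleId, long delta) {
        attempted += delta;
        inFlight.merge(bundleId, delta, Long::sum);
    }

    /** Runner commits a successful bundle: its delta becomes permanent. */
    void commitBundle(String bundleId) {
        committed += inFlight.getOrDefault(bundleId, 0L);
        inFlight.remove(bundleId);
    }

    /** Runner abandons a failed bundle before retrying it. */
    void failBundle(String bundleId) {
        inFlight.remove(bundleId);
    }

    public static void main(String[] args) {
        AttemptedCommittedCounter c = new AttemptedCommittedCounter();
        c.inc("bundle-1", 5); c.failBundle("bundle-1");   // first attempt fails
        c.inc("bundle-1", 5); c.commitBundle("bundle-1"); // retry succeeds
        System.out.println(c.attempted + " " + c.committed); // 10 5
    }
}
```

This is why a push mechanism needs runner involvement: only the runner knows when a bundle has been committed rather than merely attempted.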
Re: Metrics for Beam IOs.
First issue with "push" metrics plugin - what if the runner's underlying reporting mechanism is "pull" ? Codahale ScheduledReporter will sample the values every X and send to ... So any runner using a "pull-like" would use an adapter ? On Sat, Feb 18, 2017 at 6:27 PM Jean-Baptiste Onofréwrote: > Hi Ben, > > ok it's what I thought. Thanks for the clarification. > > +1 for the plugin-like "push" API (it's what I have in mind too ;)). > I will start a PoC for discussion next week. > > Regards > JB > > On 02/18/2017 05:17 PM, Ben Chambers wrote: > > The runner can already report metrics during pipeline execution so it is > > usable for monitoring. > > > > The pipeline result can be used to query metrics during pipeline > execution, > > so a first version of reporting to other systems is to periodically pulls > > metrics from the runner with that API. > > > > We may eventually want to provide a plugin-like API to get the runner to > > push metrics more directly to other metrics stores. This layer needs some > > thought since it has to handle the complexity of attempted/committed > > metrics to be consistent with the model. > > > > > > > > On Sat, Feb 18, 2017, 5:44 AM Jean-Baptiste Onofré > wrote: > > > > Hi Amit, > > > > before Beam, I didn't mind about portability ;) So I used the Spark > > approach. > > > > But, now, as a Beam user, I would expect a generic way to deal with > > metric whatever the runner would be. > > > > Today, you are right: I'm using the solution provided by the execution > > engine. That's the current approach and it works fine. And it's up to me > > to leverage (for intance Accumulators) it with my own system. > > > > My thought is more to provide a generic way. 
It's only a discussion for > > now ;) > > > > Regards > > JB > > > > On 02/18/2017 02:38 PM, Amit Sela wrote: > >> On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré > >> wrote: > >> > >>> Hi Amit, > >>> > >>> my point is: how do we provide metric today to end user and how can > they > >>> use it to monitor a running pipeline ? > >>> > >>> Clearly the runner is involved, but, it should behave the same way for > >>> all runners. Let me take an example. > >>> On my ecosystem, I'm using both Flink and Spark with Beam, some > >>> pipelines on each. I would like to get the metrics for all pipelines to > >>> my monitoring backend. If I can "poll" from the execution engine metric > >>> backend to my system that's acceptable, but it's an overhead of work. > >>> Having a generic metric reporting layer would allow us to have a more > >>> common way. If the user doesn't provide any reporting sink, then we use > >>> the execution backend metric layer. If provided, we use the reporting > > sink. > >>> > >> How did you do it before Beam ? I that for Spark you reported it's > native > >> metrics via Codahale Reporter and Accumulators were visible in the UI, > and > >> the Spark runner took it a step forward to make it all visible via > >> Codahale. Assuming Flink does something similar, it all belongs to > runner > >> setup/configuration. > >> > >>> > >>> About your question: you are right, it's possible to update a collector > >>> or appender without impacting anything else. > >>> > >>> Regards > >>> JB > >>> > >>> On 02/17/2017 10:38 PM, Amit Sela wrote: > @JB I think what you're suggesting is that Beam should provide a > > "Metrics > Reporting" API as well, and I used to think like you, but the more I > thought of that the more I tend to disagree now. > > The SDK is for users to author pipelines, so Metrics are for > > user-defined > metrics (in contrast to runner metrics). 
> > The Runner API is supposed to help different backends to integrate > with > Beam to allow users to execute those pipelines on their favourite > >>> backend. I > believe the Runner API has to provide restrictions/demands that are > just > enough so a runner could execute a Beam pipeline as best it can, and > I'm > afraid that this would demand runner authors to do work that is > >>> unnecessary. > This is also sort of "crossing the line" into the runner's domain and > "telling it how to do" instead of what, and I don't think we want > that. > > I do believe however that runners should integrate the Metrics into > >>> their > own metrics reporting system - but that's for the runner author to > >>> decide. > Stas did this for the Spark runner because Spark doesn't report back > user-defined Accumulators (Spark's Aggregators) to its Metrics > system. > > On a curious note though, did you use an OSGi service per event-type ? > > so > you can upgrade specific event-handlers without taking down the entire > reporter ? but that's really unrelated to this thread :-) . > > > > On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers > >>> > wrote: >
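Amit's pull-vs-push question above could be answered with a thin adapter: a scheduler samples the pull-style gauges on a period (much as Codahale's ScheduledReporter does) and forwards each snapshot to a push sink. The following is a minimal, library-free sketch; all names here (`PullToPushAdapter`, `register`, `snapshot`) are hypothetical, not Beam or Dropwizard API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Bridges pull-style gauges to a push-style sink by sampling on a schedule. */
public class PullToPushAdapter {
    private final Map<String, Supplier<Long>> gauges = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    /** Registers a gauge that the adapter will "pull" from. */
    public void register(String name, Supplier<Long> gauge) {
        gauges.put(name, gauge);
    }

    /** Takes one snapshot of every registered gauge (the pull side). */
    public Map<String, Long> snapshot() {
        Map<String, Long> values = new HashMap<>();
        gauges.forEach((name, gauge) -> values.put(name, gauge.get()));
        return values;
    }

    /** Pushes a snapshot to the sink every periodMillis (the push side). */
    public void start(Consumer<Map<String, Long>> sink, long periodMillis) {
        scheduler.scheduleAtFixedRate(
            () -> sink.accept(snapshot()), 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        scheduler.shutdown();
    }
}
```

The sink could then be anything the user configures (a Kafka producer, an HTTP client, ...), which keeps the adapter itself runner-agnostic.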
Re: Metrics for Beam IOs.
Hi Ben, ok it's what I thought. Thanks for the clarification. +1 for the plugin-like "push" API (it's what I have in mind too ;)). I will start a PoC for discussion next week. Regards JB On 02/18/2017 05:17 PM, Ben Chambers wrote: The runner can already report metrics during pipeline execution so it is usable for monitoring. The pipeline result can be used to query metrics during pipeline execution, so a first version of reporting to other systems is to periodically pull metrics from the runner with that API. We may eventually want to provide a plugin-like API to get the runner to push metrics more directly to other metrics stores. This layer needs some thought since it has to handle the complexity of attempted/committed metrics to be consistent with the model. On Sat, Feb 18, 2017, 5:44 AM Jean-Baptiste Onofré wrote: Hi Amit, before Beam, I didn't mind about portability ;) So I used the Spark approach. But, now, as a Beam user, I would expect a generic way to deal with metrics whatever the runner would be. Today, you are right: I'm using the solution provided by the execution engine. That's the current approach and it works fine. And it's up to me to leverage it (for instance Accumulators) with my own system. My thought is more to provide a generic way. It's only a discussion for now ;) Regards JB On 02/18/2017 02:38 PM, Amit Sela wrote: On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré wrote: Hi Amit, my point is: how do we provide metrics today to end users and how can they use them to monitor a running pipeline ? Clearly the runner is involved, but, it should behave the same way for all runners. Let me take an example. On my ecosystem, I'm using both Flink and Spark with Beam, some pipelines on each. I would like to get the metrics for all pipelines to my monitoring backend. If I can "poll" from the execution engine metric backend to my system that's acceptable, but it's an overhead of work.
Having a generic metric reporting layer would allow us to have a more common way. If the user doesn't provide any reporting sink, then we use the execution backend metric layer. If provided, we use the reporting sink. How did you do it before Beam ? I gather that for Spark you reported its native metrics via Codahale Reporter and Accumulators were visible in the UI, and the Spark runner took it a step forward to make it all visible via Codahale. Assuming Flink does something similar, it all belongs to runner setup/configuration. About your question: you are right, it's possible to update a collector or appender without impacting anything else. Regards JB On 02/17/2017 10:38 PM, Amit Sela wrote: @JB I think what you're suggesting is that Beam should provide a "Metrics Reporting" API as well, and I used to think like you, but the more I thought of that the more I tend to disagree now. The SDK is for users to author pipelines, so Metrics are for user-defined metrics (in contrast to runner metrics). The Runner API is supposed to help different backends to integrate with Beam to allow users to execute those pipelines on their favourite backend. I believe the Runner API has to provide restrictions/demands that are just enough so a runner could execute a Beam pipeline as best it can, and I'm afraid that this would demand runner authors to do work that is unnecessary. This is also sort of "crossing the line" into the runner's domain and "telling it how to do" instead of what, and I don't think we want that. I do believe however that runners should integrate the Metrics into their own metrics reporting system - but that's for the runner author to decide. Stas did this for the Spark runner because Spark doesn't report back user-defined Accumulators (Spark's Aggregators) to its Metrics system. On a curious note though, did you use an OSGi service per event-type ? so you can upgrade specific event-handlers without taking down the entire reporter ?
but that's really unrelated to this thread :-) . On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers wrote: I don't think it is possible for there to be a general mechanism for pushing metrics out during the execution of a pipeline. The Metrics API suggests that metrics should be reported as values across all attempts and values across only successful attempts. The latter requires runner involvement to ensure that a given metric value is atomically incremented (or checkpointed) when the bundle it was reported in is committed. Aviem has already implemented Metrics support for the Spark runner. I am working on support for the Dataflow runner. On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré wrote: Hi guys, As I'm back from vacation, I'm back on this topic ;) It's a great discussion, and I think about the Metric IO coverage, it's good. However, there's a point that we discussed very fast in the thread and I think it's an important
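Ben's "first version" idea above - periodically polling the runner's query API - can be sketched generically. Here `query` stands in for a call like `result.metrics().queryMetrics(...)` on a Beam `PipelineResult`; the class name, signature, and polling policy are all illustrative, not Beam API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Polls a metrics query function until the pipeline finishes, forwarding each snapshot. */
public class MetricsPoller {
    /**
     * Repeatedly queries metrics while the pipeline runs, reports each snapshot,
     * and returns the final snapshot taken after the pipeline completes.
     */
    public static Map<String, Long> pollUntilDone(Supplier<Map<String, Long>> query,
                                                  BooleanSupplier pipelineDone,
                                                  Consumer<Map<String, Long>> report,
                                                  long intervalMillis) {
        Map<String, Long> last = new HashMap<>();
        while (!pipelineDone.getAsBoolean()) {
            last = query.get();
            report.accept(last);
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop polling if interrupted
                return last;
            }
        }
        last = query.get(); // one final read once the pipeline has finished
        report.accept(last);
        return last;
    }
}
```

The `report` consumer is where a user-supplied sink (Kafka, Elasticsearch, ...) would be plugged in; the attempted/committed distinction Ben mentions would show up as two separate values per metric in a real implementation.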
Re: Metrics for Beam IOs.
The runner can already report metrics during pipeline execution so it is usable for monitoring. The pipeline result can be used to query metrics during pipeline execution, so a first version of reporting to other systems is to periodically pull metrics from the runner with that API. We may eventually want to provide a plugin-like API to get the runner to push metrics more directly to other metrics stores. This layer needs some thought since it has to handle the complexity of attempted/committed metrics to be consistent with the model. On Sat, Feb 18, 2017, 5:44 AM Jean-Baptiste Onofré wrote: Hi Amit, before Beam, I didn't mind about portability ;) So I used the Spark approach. But, now, as a Beam user, I would expect a generic way to deal with metrics whatever the runner would be. Today, you are right: I'm using the solution provided by the execution engine. That's the current approach and it works fine. And it's up to me to leverage it (for instance Accumulators) with my own system. My thought is more to provide a generic way. It's only a discussion for now ;) Regards JB On 02/18/2017 02:38 PM, Amit Sela wrote: > On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré > wrote: > >> Hi Amit, >> >> my point is: how do we provide metrics today to end users and how can they >> use them to monitor a running pipeline ? >> >> Clearly the runner is involved, but, it should behave the same way for >> all runners. Let me take an example. >> On my ecosystem, I'm using both Flink and Spark with Beam, some >> pipelines on each. I would like to get the metrics for all pipelines to >> my monitoring backend. If I can "poll" from the execution engine metric >> backend to my system that's acceptable, but it's an overhead of work. >> Having a generic metric reporting layer would allow us to have a more >> common way. If the user doesn't provide any reporting sink, then we use >> the execution backend metric layer. If provided, we use the reporting sink. >> > How did you do it before Beam ?
I gather that for Spark you reported its native > metrics via Codahale Reporter and Accumulators were visible in the UI, and > the Spark runner took it a step forward to make it all visible via > Codahale. Assuming Flink does something similar, it all belongs to runner > setup/configuration. > >> >> About your question: you are right, it's possible to update a collector >> or appender without impacting anything else. >> >> Regards >> JB >> >> On 02/17/2017 10:38 PM, Amit Sela wrote: >>> @JB I think what you're suggesting is that Beam should provide a "Metrics >>> Reporting" API as well, and I used to think like you, but the more I >>> thought of that the more I tend to disagree now. >>> >>> The SDK is for users to author pipelines, so Metrics are for user-defined >>> metrics (in contrast to runner metrics). >>> >>> The Runner API is supposed to help different backends to integrate with >>> Beam to allow users to execute those pipelines on their favourite >> backend. I >>> believe the Runner API has to provide restrictions/demands that are just >>> enough so a runner could execute a Beam pipeline as best it can, and I'm >>> afraid that this would demand runner authors to do work that is >> unnecessary. >>> This is also sort of "crossing the line" into the runner's domain and >>> "telling it how to do" instead of what, and I don't think we want that. >>> >>> I do believe however that runners should integrate the Metrics into >> their >>> own metrics reporting system - but that's for the runner author to >> decide. >>> Stas did this for the Spark runner because Spark doesn't report back >>> user-defined Accumulators (Spark's Aggregators) to its Metrics system. >>> >>> On a curious note though, did you use an OSGi service per event-type ? so >>> you can upgrade specific event-handlers without taking down the entire >>> reporter ? but that's really unrelated to this thread :-) .
>>> >>> >>> >>> On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers >> >>> wrote: >>> I don't think it is possible for there to be a general mechanism for pushing metrics out during the execution of a pipeline. The Metrics API suggests that metrics should be reported as values across all attempts >> and values across only successful attempts. The latter requires runner involvement to ensure that a given metric value is atomically >> incremented (or checkpointed) when the bundle it was reported in is committed. Aviem has already implemented Metrics support for the Spark runner. I am working on support for the Dataflow runner. On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré wrote: Hi guys, As I'm back from vacation, I'm back on this topic ;) It's a great discussion, and I think about the Metric IO coverage, it's good. However, there's a point that we discussed very fast in the
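The attempted/committed distinction Ben describes can be illustrated with a toy counter: every update bumps the attempted value immediately (retries included), while the committed value only advances when the runner commits the bundle the updates were reported in. This is a sketch of the semantics only, not any runner's actual implementation, and the class and method names are made up.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy counter distinguishing attempted from committed metric values. */
public class AttemptedCommittedCounter {
    private final AtomicLong attempted = new AtomicLong(); // every update, retries included
    private final AtomicLong committed = new AtomicLong(); // only successfully committed bundles
    private long pendingBundle;                            // updates from the in-flight bundle

    /** Called from user code during bundle processing. */
    public void inc(long n) {
        attempted.addAndGet(n);
        pendingBundle += n;
    }

    /** Runner calls this atomically when the bundle's output is committed. */
    public void commitBundle() {
        committed.addAndGet(pendingBundle);
        pendingBundle = 0;
    }

    /** Runner calls this when the bundle fails and will be retried. */
    public void failBundle() {
        pendingBundle = 0;
    }

    public long attempted() { return attempted.get(); }
    public long committed() { return committed.get(); }
}
```

With one failed attempt and one successful retry of a bundle that increments by 5, attempted ends up at 10 while committed ends up at 5 - which is exactly why a generic push mechanism needs runner cooperation.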
Re: Metrics for Beam IOs.
Hi Amit, before Beam, I didn't mind about portability ;) So I used the Spark approach. But, now, as a Beam user, I would expect a generic way to deal with metrics whatever the runner would be. Today, you are right: I'm using the solution provided by the execution engine. That's the current approach and it works fine. And it's up to me to leverage it (for instance Accumulators) with my own system. My thought is more to provide a generic way. It's only a discussion for now ;) Regards JB On 02/18/2017 02:38 PM, Amit Sela wrote: On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré wrote: Hi Amit, my point is: how do we provide metrics today to end users and how can they use them to monitor a running pipeline ? Clearly the runner is involved, but, it should behave the same way for all runners. Let me take an example. On my ecosystem, I'm using both Flink and Spark with Beam, some pipelines on each. I would like to get the metrics for all pipelines to my monitoring backend. If I can "poll" from the execution engine metric backend to my system that's acceptable, but it's an overhead of work. Having a generic metric reporting layer would allow us to have a more common way. If the user doesn't provide any reporting sink, then we use the execution backend metric layer. If provided, we use the reporting sink. How did you do it before Beam ? I gather that for Spark you reported its native metrics via Codahale Reporter and Accumulators were visible in the UI, and the Spark runner took it a step forward to make it all visible via Codahale. Assuming Flink does something similar, it all belongs to runner setup/configuration. About your question: you are right, it's possible to update a collector or appender without impacting anything else. Regards JB On 02/17/2017 10:38 PM, Amit Sela wrote: @JB I think what you're suggesting is that Beam should provide a "Metrics Reporting" API as well, and I used to think like you, but the more I thought of that the more I tend to disagree now.
The SDK is for users to author pipelines, so Metrics are for user-defined metrics (in contrast to runner metrics). The Runner API is supposed to help different backends to integrate with Beam to allow users to execute those pipelines on their favourite backend. I believe the Runner API has to provide restrictions/demands that are just enough so a runner could execute a Beam pipeline as best it can, and I'm afraid that this would demand runner authors to do work that is unnecessary. This is also sort of "crossing the line" into the runner's domain and "telling it how to do" instead of what, and I don't think we want that. I do believe however that runners should integrate the Metrics into their own metrics reporting system - but that's for the runner author to decide. Stas did this for the Spark runner because Spark doesn't report back user-defined Accumulators (Spark's Aggregators) to its Metrics system. On a curious note though, did you use an OSGi service per event-type ? so you can upgrade specific event-handlers without taking down the entire reporter ? but that's really unrelated to this thread :-) . On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers wrote: I don't think it is possible for there to be a general mechanism for pushing metrics out during the execution of a pipeline. The Metrics API suggests that metrics should be reported as values across all attempts and values across only successful attempts. The latter requires runner involvement to ensure that a given metric value is atomically incremented (or checkpointed) when the bundle it was reported in is committed. Aviem has already implemented Metrics support for the Spark runner. I am working on support for the Dataflow runner. On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré wrote: Hi guys, As I'm back from vacation, I'm back on this topic ;) It's a great discussion, and I think about the Metric IO coverage, it's good.
However, there's a point that we discussed very fast in the thread and I think it's an important one (maybe more important than the provided metrics actually in terms of roadmap ;)). Assuming we have pipelines, PTransforms, IOs, ... using the Metric API, how do we expose the metrics for the end-users ? A first approach would be to bind a JMX MBean server by the pipeline and expose the metrics via MBeans. I don't think it's a good idea for the following reasons: 1. It's not easy to know where the pipeline is actually executed, and so, not easy to find the MBean server URI. 2. For the same reason, we can have port binding errors. 3. If it could work for unbounded/streaming pipelines (as they are always "running"), it's not really applicable for bounded/batch pipelines as their lifetime is "limited" ;) So, I think the "push" approach is better: during the execution, a pipeline "internally" collects and pushes the metrics to a backend. The "push" could be a
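The push approach JB sketches sends metric "records" to a backend, which means agreeing on a wire format for a record. One concrete shape is a JSON object per metric value; the field names below are made up for illustration and are not any agreed Beam format.

```java
/**
 * Formats a metric record as a JSON line that a push appender could send
 * to a backend such as a Kafka topic or Elasticsearch.
 */
public class MetricRecordJson {
    public static String toJson(String pipeline, String step, String name,
                                long value, long timestampMillis) {
        // Assumes identifiers contain no characters that need JSON escaping;
        // a real implementation would use a proper JSON library.
        return String.format(
            "{\"pipeline\":\"%s\",\"step\":\"%s\",\"name\":\"%s\",\"value\":%d,\"timestamp\":%d}",
            pipeline, step, name, value, timestampMillis);
    }
}
```

A schema like this keeps the backend fully decoupled from the runner: the monitoring layer only ever sees records, never runner internals.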
Re: Metrics for Beam IOs.
JB, I think you raise a good point. As a user, if I'm sold on pipeline portability, I would expect the deal to include metrics as well. Otherwise, being able to port a pipeline to a different execution engine at the cost of "flying blind" might not be as tempting. If we wish to give users a consistent set of metrics (which is totally +1 IMHO), even before we decide on what such a metrics set should include, there seem to be at least 2 ways to achieve this: 1. Hard spec, via API: have the SDK define metric specs for runners/IOs to implement 2. Soft spec, via documentation: specify the expected metrics in docs, some of the metrics could be "must have" while others could be "nice to have", and let runners implement (interpret?) what they can, and how they can. Naturally, each of these has its pros and cons. Hard spec is great for consistency, but may make things harder on the implementors due to the efforts required to align the internals of each execution engine to the SDK (e.g., attempted vs. committed metrics, as discussed in a separate thread here on the dev list). Soft spec is enforced by doc rather than by code and leaves consistency up to discipline (meh...), but may make it easier on implementors to get things done as they have more freedom. -Stas On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré wrote: > Hi Amit, > > my point is: how do we provide metrics today to end users and how can they > use them to monitor a running pipeline ? > > Clearly the runner is involved, but, it should behave the same way for > all runners. Let me take an example. > On my ecosystem, I'm using both Flink and Spark with Beam, some > pipelines on each. I would like to get the metrics for all pipelines to > my monitoring backend. If I can "poll" from the execution engine metric > backend to my system that's acceptable, but it's an overhead of work. > Having a generic metric reporting layer would allow us to have a more > common way.
If the user doesn't provide any reporting sink, then we use > the execution backend metric layer. If provided, we use the reporting sink. > > About your question: you are right, it's possible to update a collector > or appender without impacting anything else. > > Regards > JB > > On 02/17/2017 10:38 PM, Amit Sela wrote: > > @JB I think what you're suggesting is that Beam should provide a "Metrics > > Reporting" API as well, and I used to think like you, but the more I > > thought of that the more I tend to disagree now. > > > > The SDK is for users to author pipelines, so Metrics are for user-defined > > metrics (in contrast to runner metrics). > > > > The Runner API is supposed to help different backends to integrate with > > Beam to allow users to execute those pipelines on their favourite > backend. I > > believe the Runner API has to provide restrictions/demands that are just > > enough so a runner could execute a Beam pipeline as best it can, and I'm > > afraid that this would demand runner authors to do work that is > unnecessary. > > This is also sort of "crossing the line" into the runner's domain and > > "telling it how to do" instead of what, and I don't think we want that. > > > > I do believe however that runners should integrate the Metrics into > their > > own metrics reporting system - but that's for the runner author to > decide. > > Stas did this for the Spark runner because Spark doesn't report back > > user-defined Accumulators (Spark's Aggregators) to its Metrics system. > > > > On a curious note though, did you use an OSGi service per event-type ? so > > you can upgrade specific event-handlers without taking down the entire > > reporter ? but that's really unrelated to this thread :-) . > > > > > > > > On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers > > > wrote: > > > >> I don't think it is possible for there to be a general mechanism for > >> pushing metrics out during the execution of a pipeline.
The Metrics API > >> suggests that metrics should be reported as values across all attempts > and > >> values across only successful attempts. The latter requires runner > >> involvement to ensure that a given metric value is atomically > incremented > >> (or checkpointed) when the bundle it was reported in is committed. > >> > >> Aviem has already implemented Metrics support for the Spark runner. I am > >> working on support for the Dataflow runner. > >> > >> On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré > >> wrote: > >> > >> Hi guys, > >> > >> As I'm back from vacation, I'm back on this topic ;) > >> > >> It's a great discussion, and I think about the Metric IO coverage, it's > >> good. > >> > >> However, there's a point that we discussed very fast in the thread and I > >> think it's an important one (maybe more important than the provided > >> metrics actually in terms of roadmap ;)). > >> > >> Assuming we have pipelines, PTransforms, IOs, ... using the Metric API, > >> how
Re: Metrics for Beam IOs.
Hi Amit, my point is: how do we provide metrics today to end users and how can they use them to monitor a running pipeline ? Clearly the runner is involved, but, it should behave the same way for all runners. Let me take an example. On my ecosystem, I'm using both Flink and Spark with Beam, some pipelines on each. I would like to get the metrics for all pipelines to my monitoring backend. If I can "poll" from the execution engine metric backend to my system that's acceptable, but it's an overhead of work. Having a generic metric reporting layer would allow us to have a more common way. If the user doesn't provide any reporting sink, then we use the execution backend metric layer. If provided, we use the reporting sink. About your question: you are right, it's possible to update a collector or appender without impacting anything else. Regards JB On 02/17/2017 10:38 PM, Amit Sela wrote: @JB I think what you're suggesting is that Beam should provide a "Metrics Reporting" API as well, and I used to think like you, but the more I thought of that the more I tend to disagree now. The SDK is for users to author pipelines, so Metrics are for user-defined metrics (in contrast to runner metrics). The Runner API is supposed to help different backends to integrate with Beam to allow users to execute those pipelines on their favourite backend. I believe the Runner API has to provide restrictions/demands that are just enough so a runner could execute a Beam pipeline as best it can, and I'm afraid that this would demand runner authors to do work that is unnecessary. This is also sort of "crossing the line" into the runner's domain and "telling it how to do" instead of what, and I don't think we want that. I do believe however that runners should integrate the Metrics into their own metrics reporting system - but that's for the runner author to decide.
Stas did this for the Spark runner because Spark doesn't report back user-defined Accumulators (Spark's Aggregators) to its Metrics system. On a curious note though, did you use an OSGi service per event-type ? so you can upgrade specific event-handlers without taking down the entire reporter ? but that's really unrelated to this thread :-) . On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers wrote: I don't think it is possible for there to be a general mechanism for pushing metrics out during the execution of a pipeline. The Metrics API suggests that metrics should be reported as values across all attempts and values across only successful attempts. The latter requires runner involvement to ensure that a given metric value is atomically incremented (or checkpointed) when the bundle it was reported in is committed. Aviem has already implemented Metrics support for the Spark runner. I am working on support for the Dataflow runner. On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré wrote: Hi guys, As I'm back from vacation, I'm back on this topic ;) It's a great discussion, and I think about the Metric IO coverage, it's good. However, there's a point that we discussed very fast in the thread and I think it's an important one (maybe more important than the provided metrics actually in terms of roadmap ;)). Assuming we have pipelines, PTransforms, IOs, ... using the Metric API, how do we expose the metrics for the end-users ? A first approach would be to bind a JMX MBean server by the pipeline and expose the metrics via MBeans. I don't think it's a good idea for the following reasons: 1. It's not easy to know where the pipeline is actually executed, and so, not easy to find the MBean server URI. 2. For the same reason, we can have port binding errors. 3.
If it could work for unbounded/streaming pipelines (as they are always "running"), it's not really applicable for bounded/batch pipelines as their lifetime is "limited" ;) So, I think the "push" approach is better: during the execution, a pipeline "internally" collects and pushes the metrics to a backend. The "push" could be a kind of sink. For instance, the metric "records" can be sent to a Kafka topic, or directly to Elasticsearch or whatever. The metric backend can deal with alerting, reporting, etc. Basically, we have to define two things: 1. The "appender" where the metrics have to be sent (and the corresponding configuration to connect, like the Kafka or Elasticsearch location) 2. The format of the metric data (for instance, JSON format). In Apache Karaf, I created something similar named Decanter: http://blog.nanthrax.net/2015/07/monitoring-and-alerting-with-apache-karaf-decanter/ http://karaf.apache.org/manual/decanter/latest-1/ Decanter provides collectors that harvest the metrics (like JMX MBean attributes, log messages, ...). Basically, for Beam, it would be directly the Metric API used by pipeline parts. Then, the metric records are sent to a dispatcher which sends the metric records to an appender. The appenders store or send the metric
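The Decanter-style flow JB describes (collectors hand records to a dispatcher, which fans them out to appenders) can be sketched in a few lines. Here records are plain strings and appenders are simple consumers, purely for illustration; Decanter's real API is OSGi-based and looks different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Decanter-style dispatcher: collectors hand records in, every registered appender gets each record. */
public class MetricDispatcher {
    private final List<Consumer<String>> appenders = new ArrayList<>();

    /** Registers an appender (e.g. one writing to Kafka, another to Elasticsearch). */
    public void addAppender(Consumer<String> appender) {
        appenders.add(appender);
    }

    /** Called by collectors; fans the record out to every appender. */
    public void dispatch(String record) {
        for (Consumer<String> appender : appenders) {
            appender.accept(record);
        }
    }
}
```

Because appenders are registered independently, one can be added, replaced, or removed without touching the collectors - which is the upgrade property JB and Amit discuss above.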
Re: Metrics for Beam IOs.
@JB I think what you're suggesting is that Beam should provide a "Metrics Reporting" API as well, and I used to think like you, but the more I thought of that the more I tend to disagree now. The SDK is for users to author pipelines, so Metrics are for user-defined metrics (in contrast to runner metrics). The Runner API is supposed to help different backends to integrate with Beam to allow users to execute those pipelines on their favourite backend. I believe the Runner API has to provide restrictions/demands that are just enough so a runner could execute a Beam pipeline as best it can, and I'm afraid that this would demand runner authors to do work that is unnecessary. This is also sort of "crossing the line" into the runner's domain and "telling it how to do" instead of what, and I don't think we want that. I do believe however that runners should integrate the Metrics into their own metrics reporting system - but that's for the runner author to decide. Stas did this for the Spark runner because Spark doesn't report back user-defined Accumulators (Spark's Aggregators) to its Metrics system. On a curious note though, did you use an OSGi service per event-type ? so you can upgrade specific event-handlers without taking down the entire reporter ? but that's really unrelated to this thread :-) . On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers wrote: > I don't think it is possible for there to be a general mechanism for > pushing metrics out during the execution of a pipeline. The Metrics API > suggests that metrics should be reported as values across all attempts and > values across only successful attempts. The latter requires runner > involvement to ensure that a given metric value is atomically incremented > (or checkpointed) when the bundle it was reported in is committed. > > Aviem has already implemented Metrics support for the Spark runner. I am > working on support for the Dataflow runner.
> > On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré > wrote: > > Hi guys, > > As I'm back from vacation, I'm back on this topic ;) > > It's a great discussion, and I think about the Metric IO coverage, it's > good. > > However, there's a point that we discussed very fast in the thread and I > think it's an important one (maybe more important than the provided > metrics actually in terms of roadmap ;)). > > Assuming we have pipelines, PTransforms, IOs, ... using the Metric API, > how do we expose the metrics for the end-users ? > > A first approach would be to bind a JMX MBean server by the pipeline and > expose the metrics via MBeans. I don't think it's a good idea for the > following reasons: > 1. It's not easy to know where the pipeline is actually executed, and > so, not easy to find the MBean server URI. > 2. For the same reason, we can have port binding errors. > 3. If it could work for unbounded/streaming pipelines (as they are > always "running"), it's not really applicable for bounded/batch > pipelines as their lifetime is "limited" ;) > > So, I think the "push" approach is better: during the execution, a > pipeline "internally" collects and pushes the metrics to a backend. > The "push" could be a kind of sink. For instance, the metric "records" can > be sent to a Kafka topic, or directly to Elasticsearch or whatever. > The metric backend can deal with alerting, reporting, etc. > > Basically, we have to define two things: > 1. The "appender" where the metrics have to be sent (and the > corresponding configuration to connect, like Kafka or Elasticsearch > location) > 2. The format of the metric data (for instance, JSON format). > > In Apache Karaf, I created something similar named Decanter: > > > http://blog.nanthrax.net/2015/07/monitoring-and-alerting-with-apache-karaf-decanter/ > > http://karaf.apache.org/manual/decanter/latest-1/ > > Decanter provides collectors that harvest the metrics (like JMX MBean > attributes, log messages, ...).
Basically, for Beam, it would be > directly the Metric API used by pipeline parts. > Then, the metric records are sent to a dispatcher which sends the metric > records to an appender. The appenders store or send the metric records > to a backend (Elasticsearch, Cassandra, Kafka, JMS, Redis, ...). > > I think it would make sense to provide the configuration and Metric > "appender" via the pipeline options. > As it's not really runner specific, it could be part of the metric API > (or SPI in that case). > > WDYT ? > > Regards > JB > > On 02/15/2017 09:22 AM, Stas Levin wrote: > > +1 to making the IO metrics (e.g. producers, consumers) available as part > > of the Beam pipeline metrics tree for debugging and visibility. > > > > As it has already been mentioned, many IO clients have a metrics > mechanism > > in place, so in these cases I think it could be beneficial to mirror > their > > metrics under the relevant subtree of the Beam metrics tree. > > > > On Wed, Feb 15, 2017 at 12:04 AM Amit Sela
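Wiring the appender through pipeline options, as JB suggests, might look like a flag such as `--metricsAppender=kafka` plus an appender-specific config value. Both the flag names and this tiny parser are hypothetical - Beam's real `PipelineOptions` mechanism uses typed interfaces rather than a raw map.

```java
import java.util.HashMap;
import java.util.Map;

/** Parses hypothetical metric-appender flags of the form --name=value. */
public class MetricOptionsParser {
    public static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("--") && arg.contains("=")) {
                int eq = arg.indexOf('=');
                // Strip the leading "--" and split on the first '='.
                options.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return options;
    }
}
```

The appender name would then be looked up via an SPI (as JB proposes), so adding a new backend only requires registering a new appender implementation, not changing the runner.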
Re: Metrics for Beam IOs.
Hi guys, As I'm back from vacation, I'm back on this topic ;) It's a great discussion, and I think about the Metric IO coverage, it's good. However, there's a point that we discussed very fast in the thread and I think it's an important one (maybe more important than the provided metrics actually in terms of roadmap ;)). Assuming we have pipelines, PTransforms, IOs, ... using the Metric API, how do we expose the metrics for the end-users ? A first approach would be to bind a JMX MBean server by the pipeline and expose the metrics via MBeans. I don't think it's a good idea for the following reasons: 1. It's not easy to know where the pipeline is actually executed, and so, not easy to find the MBean server URI. 2. For the same reason, we can have port binding errors. 3. If it could work for unbounded/streaming pipelines (as they are always "running"), it's not really applicable for bounded/batch pipelines as their lifetime is "limited" ;) So, I think the "push" approach is better: during the execution, a pipeline "internally" collects and pushes the metrics to a backend. The "push" could be a kind of sink. For instance, the metric "records" can be sent to a Kafka topic, or directly to Elasticsearch or whatever. The metric backend can deal with alerting, reporting, etc. Basically, we have to define two things: 1. The "appender" where the metrics have to be sent (and the corresponding configuration to connect, like the Kafka or Elasticsearch location) 2. The format of the metric data (for instance, JSON format). In Apache Karaf, I created something similar named Decanter: http://blog.nanthrax.net/2015/07/monitoring-and-alerting-with-apache-karaf-decanter/ http://karaf.apache.org/manual/decanter/latest-1/ Decanter provides collectors that harvest the metrics (like JMX MBean attributes, log messages, ...). Basically, for Beam, it would be directly the Metric API used by pipeline parts. Then, the metric records are sent to a dispatcher which sends the metric records to an appender.
The appenders store or send the metric records to a backend (Elasticsearch, Cassandra, Kafka, JMS, Redis, ...). I think it would make sense to provide the configuration and Metric "appender" via the pipeline options. As it's not really runner specific, it could be part of the metric API (or SPI in that case). WDYT? Regards JB On 02/15/2017 09:22 AM, Stas Levin wrote: +1 to making the IO metrics (e.g. producers, consumers) available as part of the Beam pipeline metrics tree for debugging and visibility. As it has already been mentioned, many IO clients have a metrics mechanism in place, so in these cases I think it could be beneficial to mirror their metrics under the relevant subtree of the Beam metrics tree. On Wed, Feb 15, 2017 at 12:04 AM Amit Sela wrote: I think this is a great discussion and I'd like to relate to some of the points raised here, and raise some of my own. First of all, I think we should be careful here not to cross boundaries. IOs naturally have many metrics, and Beam should avoid "taking over" those. IO metrics should focus on what's relevant to the Pipeline: input/output rate, backlog (for UnboundedSources, which exists in bytes, but for monitoring purposes we might want to consider #messages). I don't agree that we should not invest in doing this in Sources/Sinks and go directly to SplittableDoFn, because the IO API is familiar and known, and as long as we keep it, it should be treated as a first class citizen. As for enable/disable - if IOs focus on pipeline-related metrics I think we should be fine, though this could also change between runners. Finally, considering "split-metrics" is interesting, because on one hand it affects the pipeline directly (unbalanced partitions in Kafka may cause backlog), but this is that fine line of responsibilities (Kafka monitoring would probably be able to tell you that partitions are not balanced). My 2 cents, cheers!
Re: Metrics for Beam IOs.
On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers wrote: > > > * I also think there are data source specific metrics that a given IO > will > > want to expose (ie, things like kafka backlog for a topic.) UnboundedSource has an API for backlog. It is better for Beam/runners to handle backlog as well. Of course there will be some source specific metrics too (errors, i/o ops, etc.).
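For context, the backlog API Raghu refers to is UnboundedSource.UnboundedReader's getSplitBacklogBytes()/getTotalBacklogBytes() in the Java SDK, where a reader that cannot estimate returns -1. A toy model of how a runner might aggregate per-split backlog (illustrative Python with invented names, not Beam code):

```python
class SplitReader:
    """Stand-in for an unbounded reader that reports its split's backlog,
    loosely analogous to UnboundedReader#getSplitBacklogBytes()."""
    def __init__(self, backlog_bytes):
        self._backlog = backlog_bytes

    def get_split_backlog_bytes(self):
        return self._backlog

def total_backlog(readers):
    """Runner-side aggregation across splits; any unknown split (-1)
    makes the total unknown as well."""
    total = 0
    for r in readers:
        b = r.get_split_backlog_bytes()
        if b < 0:  # reader cannot estimate its backlog
            return -1
        total += b
    return total

readers = [SplitReader(1024), SplitReader(2048), SplitReader(512)]
```

Because the runner does the aggregation, the IO author only has to answer "how far behind is this split", which supports Raghu's point that backlog is best handled by Beam/runners.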
Re: Metrics for Beam IOs.
>Many of the metrics that should be exposed for all transforms are likely best exposed by the runner or some other common layer, rather than being added to each transform. +1 I wasn't trying to advocate for each user trying to implement these on every transform - it should be provided by Beam or the runner automatically. > But, the existing Metrics API should work within a source or sink -- anything that is called within a step should work. Great! I wasn't aware we'd changed that with the new Metrics API - I was only aware of the limitations of the old system. > if the source can detect that it is having trouble splitting and raise a message like "you're using compressed text files which can't be parallelized beyond the number of files" that is much more actionable. I think there's an important difference between what a particular runner chooses to show its users in their monitoring interface vs. what Beam should be reporting. I think it's important that Beam or the runner layer should be reporting this data (which would be necessary for doing high level analysis like what you propose), but then the monitoring interface should choose how to expose that information. So then the question becomes - does it make sense for these common transform metrics to be exposed by runner implementations or within common Beam code? S On Tue, Feb 14, 2017 at 9:21 AM Ben Chambers wrote: > On Tue, Feb 14, 2017 at 9:07 AM Stephen Sisk > wrote: > > > hi! > > > > (ben just sent his mail and he covered some similar topics to me, but > I'll > > keep my comments intact since they are slightly different) > > > > * I think there are a lot of metrics that should be exposed for all > > transforms - everything from JB's list (like number of splits, throughput, > > reading/writing rate, etc.) also apply to > > splittableDoFns.
> > > > Many of the metrics that should be exposed for all transforms are likely > best exposed by the runner or some other common layer, rather than being > added to each transform. But things like number of elements, estimated size > of elements, etc. all make sense for every transform. > > > > * I also think there are data source specific metrics that a given IO > will > > want to expose (ie, things like kafka backlog for a topic.) No one on > this > > thread has specifically addressed this, but Beam Sources & Sinks do not > > presently have the ability to report metrics even if a given IO writer > > wanted to - depending on the timeline for SplittableDoFn and the move to > > that infrastructure, I don't think we need that support in Sources/Sinks, > > but I do think we should make sure SplittableDoFn has the necessary > > support. > > > > Two parts -- we may want to introduce something like a Gauge here that lets > the metric system ask the source/sink for the latest metrics. This allows > the runner to gather metrics at a rate that makes sense without impacting > performance. > > But, the existing Metrics API should work within a source or sink -- > anything that is called within a step should work. > > > > * I think there are ways to do many metrics such that they are not too > > expensive to calculate all the time. (ie, reporting per bundle rather > than > > per item) I think we should ask whether we want/need metrics that are > > expensive to calculate before going to the effort of adding > enable/disable. > > > > +1 -- hence why I'd like to look at reporting the metrics with no > configuration. > > > > * I disagree with Ben about showing the amount of splitting - I think > > especially with IOs it's useful to understand/diagnose reading problems > > since that's one potential source of problems, especially given that the > > user can write transforms that split now in SplittableDoFn.
But I look > > forward to discussing that further > > > > I think many of the splitting metrics fall into things the runner should > report. I think if we pick the right ones so they're useful, it likely doesn't > hurt to gather them, but here again it may be useful to talk about specific > problems. > > I still think these likely won't make sense for all users -- if I'm a new > user just trying to get a source/sink working, I'm not sure what "splitting > metrics" would be useful to me. But if the source can detect that it is > having trouble splitting and raise a message like "you're using compressed > text files which can't be parallelized beyond the number of files" that is > much more actionable. > > > > +1 on talking about specific examples > > > > S
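The pull-based "Gauge" idea from Ben's message above -- the metric system asks the source for its latest value at collection time, instead of the source pushing per element -- might be sketched like this (a hypothetical interface, not the actual Metrics API):

```python
class Gauge:
    """Pull-based metric: stores a callback that the metric system invokes
    only at collection time, so reporting cost is decoupled from the
    element-processing hot path."""
    def __init__(self, name, read_fn):
        self.name = name
        self._read_fn = read_fn

    def collect(self):
        return self.name, self._read_fn()

class FakeSource:
    """Stand-in for a source; it just keeps its own running state."""
    def __init__(self):
        self.records_read = 0

    def read(self, n):
        self.records_read += n

source = FakeSource()
gauge = Gauge("records_read", lambda: source.records_read)
source.read(10)
source.read(5)
# The metric system calls gauge.collect() at whatever rate it chooses.
```

This is what lets the runner "gather metrics at a rate that makes sense without impacting performance": the source pays nothing per element, only a cheap read when polled.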
Re: Metrics for Beam IOs.
Thanks for starting this conversation Ismael! I too have been thinking we'll need some general approach to metrics for IO in the near future. Two general thoughts: 1. Before making the metrics configurable, I think it would be worthwhile to see if we can find the right set of metrics that provide useful information about IO without affecting performance, and have these always on. Monitoring information like this is often useful when a pipeline is behaving unexpectedly, and predicting when that will happen and turning on the metrics is problematic. 2. I think focusing on metrics about source splitting and such is the wrong level from a user perspective. A user shouldn't need to understand how sources split and what that means. Instead, we should report higher-level metrics such as how many bytes of input have been processed, how many bytes remain (if that is known), etc. Ideally, metrics about splitting can be reported by the runner in a general manner. If they're useful for developing the source, maybe that could be the configuration (indicating that you're developing a source and want these more detailed metrics). Maybe it would help to pick one or two IOs that you're looking at and talk about proposed metrics? That might focus the discussion on what metrics make sense to users and how expensive they might be to report?
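One common trick for keeping such always-on metrics cheap enough (also raised by Stephen: report per bundle rather than per element) is to accumulate locally and flush once per bundle. A sketch with made-up names, not Beam internals:

```python
class BundleCounter:
    """Accumulates locally during a bundle; a single reporter call per
    bundle, regardless of element count, keeps the hot path cheap."""
    def __init__(self, report_fn):
        self._report_fn = report_fn
        self._count = 0

    def inc(self, n=1):
        self._count += n  # cheap local increment per element

    def finish_bundle(self):
        self._report_fn(self._count)  # one report per bundle
        self._count = 0

reports = []
counter = BundleCounter(reports.append)
for bundle in [range(100), range(250)]:
    for _ in bundle:
        counter.inc()
    counter.finish_bundle()
```

Two bundles of 100 and 250 elements produce exactly two reporter calls, which is why per-bundle reporting can plausibly stay on by default.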
Re: Metrics for Beam IOs.
Hi Aviem Agree with your comments, it's pretty close to my previous ones. Regards JB
Re: Metrics for Beam IOs.
Hi Ismaël, You've raised some great points. Please see my comments inline. On Tue, Feb 14, 2017 at 3:37 PM Ismaël Mejía wrote: > Hello, > > The new metrics API allows us to integrate some basic metrics into the Beam > IOs. I have been following some discussions about this on JIRAs/PRs, and I > think it is important to discuss the subject here so we can have more > awareness and obtain ideas from the community. > > First I want to thank Ben for his work on the metrics API, and Aviem for > his ongoing work on metrics for IOs (e.g. KafkaIO) that made me aware of > this subject. > > There are some basic ideas to discuss e.g. > > - What are the responsibilities of Beam IOs in terms of Metrics > (considering the fact that the actual IOs, server + client, usually provide > their own)? > While it is true that many IOs provide their own metrics, I think that Beam should expose IO metrics because: 1. Metrics which help in understanding the performance of a pipeline which uses an IO may not be covered by the IO. 2. Users may not be able to set up integrations with the IO's metrics to view them effectively (and correlate them to a specific Beam pipeline), but still want to investigate their pipeline's performance. > - What metrics are relevant to the pipeline (or some particular IOs)? Kafka > backlog for one could indicate that a pipeline is behind ingestion rate. I think it depends on the IO, but there is probably overlap in some of the metrics so a guideline might be written for this. I listed what I thought should be reported for KafkaIO in the following JIRA: https://issues.apache.org/jira/browse/BEAM-1398 Feel free to add more metrics you think are important to report. > > - Should metrics be calculated on IOs by default or not? > - If metrics are defined by default does it make sense to allow users to > disable them? > IIUC, your concern is that metrics will add overhead to the pipeline, and pipelines which are highly sensitive to this will be hampered?
In any case I think that yes, metrics calculation should be configurable (enable/disable). In the Spark runner, for example, the metrics sink feature (not the metrics calculation itself, but the sinks to send them to) is configurable in the pipeline options. > Well these are just some questions around the subject so we can create a > common set of practices to include metrics in the IOs and eventually > improve the transform guide with this. What do you think about this? Do you > have other questions/ideas? > > Thanks, > Ismaël >
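Configuring the sink through pipeline options, as in the Spark runner example, could look roughly like this (the option names here are invented for illustration, not actual Beam or Spark runner options):

```python
import argparse

def parse_metric_options(argv):
    """Hypothetical pipeline options controlling where metrics are pushed.
    Leaving --metrics-sink unset disables pushing without disabling the
    metrics calculation itself."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--metrics-sink", default=None,
                        help="e.g. 'graphite' or 'csv'; unset = no push")
    parser.add_argument("--metrics-sink-location", default=None,
                        help="backend address, e.g. 'host:2003'")
    return parser.parse_args(argv)

opts = parse_metric_options(
    ["--metrics-sink", "graphite", "--metrics-sink-location", "host:2003"])
```

The key design point is the one Aviem makes: the toggle governs the sink, not the calculation, so turning the sink off costs the pipeline nothing.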
Re: Metrics for Beam IOs.
Hi Ismael Good point to discuss the metric here. Imho, the IOs should use the metric API to provide specific IO metrics (like number of splits, throughput, reading/writing rate, etc.). In Camel, each processor (aka IO) provides such metric indicators. The purpose is to provide the metric specific to the IO process (not to the back end). On the other hand, equivalent metrics can be found at the pipeline level and at IO level (it's a question of scope). I would propose to extend the documentation and maybe the API about the IO. Regards JB On Feb 14, 2017, 09:37, at 09:37, "Ismaël Mejía" wrote: >Hello, > >The new metrics API allows us to integrate some basic metrics into the >Beam >IOs. I have been following some discussions about this on JIRAs/PRs, >and I >think it is important to discuss the subject here so we can have more >awareness and obtain ideas from the community. > >First I want to thank Ben for his work on the metrics API, and Aviem >for >his ongoing work on metrics for IOs (e.g. KafkaIO) that made me aware >of >this subject. > >There are some basic ideas to discuss e.g. > >- What are the responsibilities of Beam IOs in terms of Metrics >(considering the fact that the actual IOs, server + client, usually >provide >their own)? > >- What metrics are relevant to the pipeline (or some particular IOs)? >Kafka >backlog for one could indicate that a pipeline is behind ingestion rate. > >- Should metrics be calculated on IOs by default or not? > >- If metrics are defined by default does it make sense to allow users >to >disable them? > >Well these are just some questions around the subject so we can create >a >common set of practices to include metrics in the IOs and eventually >improve the transform guide with this. What do you think about this? Do >you >have other questions/ideas? > >Thanks, >Ismaël
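In the Java SDK, the new metrics API already supports this pattern for IO authors (e.g. Metrics.counter(SomeIO.class, "records_read").inc() inside reader code). A minimal toy sketch of a reader reporting its own read-rate counters (illustrative Python with invented names, not an actual Beam IO):

```python
class Counter:
    """Toy stand-in for a metrics counter."""
    def __init__(self):
        self.value = 0

    def inc(self, n=1):
        self.value += n

class InstrumentedReader:
    """Wraps a record iterator and reports records/bytes read, the kind of
    IO-level metrics (throughput, reading rate) discussed above."""
    def __init__(self, records):
        self._records = iter(records)
        self.records_read = Counter()
        self.bytes_read = Counter()

    def __iter__(self):
        return self

    def __next__(self):
        rec = next(self._records)  # StopIteration ends the read loop
        self.records_read.inc()
        self.bytes_read.inc(len(rec))
        return rec

reader = InstrumentedReader([b"alpha", b"beta", b"gamma"])
consumed = list(reader)
```

Dividing bytes_read by elapsed time would give the reading rate JB mentions; the counters themselves are scoped to the IO process, not the backend.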