Re: [DISCUSSION] Runner agnostic metrics extractor?

Etienne Chauchot Wed, 29 Nov 2017 01:31:15 -0800

Thanks Ben for your comments!

Indeed, there is an issue about failover regarding the polling thread.To that extent, pushing metrics to a sink would be better. To make thispush runner agnostic, doing the code in the runner-common part of beamwould be good. Maybe in the runner API like JB suggests.IMHO It would be good to change the POC JB and I have started to thatdirection.


Question is: what is the proper extension point to add the push?
Somewhere like org.apache.beam.runners.core.metrics.MetricsContainerImpl ?

Best
Etienne

Le 27/11/2017 à 18:26, Jean-Baptiste Onofré a écrit :

Yeah, I think that something in the runner makes sense.
The only drawback is that it would require some enforcement on therunners and change on all runners.
If it could be part of the Runner API, that would help I think.
The idea of the thread poller was more in PoC way. Only thepolling/sending should be part of the runner. Marshaller and Sinkshould stay as interface and implemented. It would give us moreflexibility.
Regards
JB

On 11/27/2017 06:17 PM, Ben Chambers wrote:
I think discussing a runner agnostic way of configuring how metrics are
extracted is a great idea -- thanks for bringing it up Etienne!

Using a thread that polls the pipeline result relies on the program that
created and submitted the pipeline continuing to run (eg., no machine
faults, network problems, etc.). For many applications, this isn't agood
model (a Streaming pipeline may run for weeks, a Batch pipeline may be
automatically run every hour, etc.).

Etienne's proposal of having something the runner pushes metrics too has
the benefit of running in the same cluster as the pipeline, thushaving the
same reliability benefits.
As noted, it would require runners to ensure that metrics were pushedintothe extractor but from there it would allow a general configurationof how
metrics are extracted from the pipeline and exposed to some external
services.
Providing something that the runners could push metrics into and havethem
automatically exported seems like it would have several benefits:
1. It would provide a single way to configure how metrics areactually
exported.
   2. It would allow the runners to ensure it was reliably executed.
3. It would allow the runner to report system metrics directly(eg., if a
runner wanted to report the watermark, it could push that in directly).

-- Ben

On Mon, Nov 27, 2017 at 9:06 AM Jean-Baptiste Onofré <[email protected]>
wrote:
Hi all,

Etienne forgot to mention that we started a PoC about that.
What I started is to wrap the Pipeline creation to include a threadthat
polls
periodically the metrics in the pipeline result (it's what Iproposed when
I
compared with Karaf Decanter some time ago).
Then, this thread marshalls the collected metrics and send to asink. At
the
end, it means that the harvested metrics data will be store in abackend
(for
instance elasticsearch).

The pro of this approach is that it doesn't require any change in the
core, it's
up to the user to use the PipelineWithMetric wrapper.
The cons is that the user needs to explicitly use thePipelineWithMetric
wrapper.

IMHO, it's good enough as user can decide to poll metrics for some
pipelines and
not for others.

Regards
JB

On 11/27/2017 04:56 PM, Etienne Chauchot wrote:
Hi all,

I came by this ticket https://issues.apache.org/jira/browse/BEAM-2456.
I know
that the metrics subject has already been discussed a lot, but I would
like to
revive the discussion.
The aim in this ticket is to avoid relying on the runner to providethe
metrics
because they don't have all the same capabilities towards metrics. The
idea in
the ticket is to still use beam metrics API (and not others like
codahale as it
has been discussed some time ago) and provide a way to extract the
metrics with
a polling thread that would be forked by a PipelineWithMetrics (so,
almost
invisible to the end user) and then to push to a sink (such as a Http
rest sink
for example or Graphite sink or anything else...). Nevertheless, a
polling
thread might not work for all the runners because some might notmake themetrics available before the end of the pipeline. Also, forking athread
would
be a bit unconventional, so it could be provided as a beam sdkextension.
Another way, to avoid polling, would be to push metrics values to asink
when
they are updated but I don't know if it is feasible in a runner
independent way.
WDYT about the ideas in this ticket?

Best,
Etienne
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [DISCUSSION] Runner agnostic metrics extractor?

Reply via email to