Thanks Robert for the details,
I did not know that the runner harness periodically asked the SDK harness for
updates. I thought it was only
communicating at the beginning of the bundle and at the end, something like the
simplified sequence diagram below.
But if the metrics are not a regular diff but rather the not-yet-committed dirty
value, that means that the runner sends
the metrics values to the SDK before the bundle starts processing. So the
sequence diagram becomes something more
like:
WDYT?
Etienne
On Tuesday, 11 September 2018 at 17:53 +0200, Robert Bradshaw wrote:
> On Mon, Sep 10, 2018 at 11:07 AM Etienne Chauchot <[email protected]>
> wrote:
> > Hi all,
> > @Luke, @Alex I have a general question related to metrics in the Fn API. The
> > communication between the runner harness and the SDK harness is done on a
> > bundle basis. When the runner harness sends data to the SDK harness to
> > execute a transform that contains metrics, does it:
> > 1. send metrics values (for the ones defined in the transform) alongside
> >    the data and receive an updated value of the metrics from the SDK harness
> >    when the bundle is finished processing? or
> > 2. send only the data, and the SDK harness responds with a diff value of
> >    the metrics so that the runner can update them on its side?
> > My bet is option 2. But can you confirm?
>
> The runner harness periodically asks for the status of a bundle, to which the
> SDK harness may respond with a current
> snapshot of metrics. These metrics are deltas in the sense that only "dirty"
> metrics need to be reported (i.e.
> unreported metrics can be assumed to have their previous values) but are
> *not* deltas with respect to values, i.e. the
> full value is reported each time. As an example, suppose one were counting
> red and blue marbles. The first update may
> be something like
> { red: 5, blue: 7}
> and if two more blue ones were found, a valid update would be
> { blue: 9 }
> On bundle completion, the full set of metrics is reported as part of the same
> message that declares the bundle
> complete.
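The marble example above can be sketched in a few lines of Python. This is only an illustration of the "dirty names, full values" idea as described in the message, not the actual Fn API or Beam SDK code: `SdkCounters`, `progress_snapshot`, `final_snapshot` and `apply_update` are hypothetical names made up for this sketch.

```python
from typing import Dict, Set


class SdkCounters:
    """Hypothetical SDK-side counter store that remembers which counters changed."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}
        self._dirty: Set[str] = set()

    def inc(self, name: str, n: int = 1) -> None:
        self._values[name] = self._values.get(name, 0) + n
        self._dirty.add(name)

    def progress_snapshot(self) -> Dict[str, int]:
        """Full current value of every counter changed since the last report."""
        snapshot = {name: self._values[name] for name in sorted(self._dirty)}
        self._dirty.clear()
        return snapshot

    def final_snapshot(self) -> Dict[str, int]:
        """Full set of counters, as sent with the bundle-complete message."""
        self._dirty.clear()
        return dict(self._values)


def apply_update(runner_view: Dict[str, int], update: Dict[str, int]) -> None:
    """Runner side: reported values overwrite, unreported ones keep their previous value."""
    runner_view.update(update)


sdk = SdkCounters()
runner_view: Dict[str, int] = {}

sdk.inc("red", 5)
sdk.inc("blue", 7)
first = sdk.progress_snapshot()   # both counters are dirty: red=5, blue=7
apply_update(runner_view, first)

sdk.inc("blue", 2)
second = sdk.progress_snapshot()  # only blue is dirty, reported as the full value 9
apply_update(runner_view, second)

final = sdk.final_snapshot()      # everything, regardless of dirtiness
apply_update(runner_view, final)
```

Because each update carries full values, the runner side never has to add anything up: it simply overwrites what was reported, so a lost or repeated intermediate update does not corrupt the totals.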
>
>
> On Tue, Sep 11, 2018 at 11:43 AM Etienne Chauchot <[email protected]>
> wrote:
> > On Monday, 10 September 2018 at 09:42 -0700, Lukasz Cwik wrote:
> > > Alex is out on vacation for the next 3 weeks.
> > > Alex had proposed the types of metrics[1] but not the exact protocol as
> > > to what the SDK and runner do. I could
> > > envision Alex proposing that the SDK harness only sends diffs or dirty
> > > metrics in intermediate updates and all
> > > metrics values in the final update.
> > > Robert is referring to an integration that happened to an older set of
> > > messages[2] that preceded Alex's proposal,
> > > and that integration with Dataflow, which is still incomplete, works as you
> > > described in #2.
> >
> > Thanks Luke and Robert for the confirmation.
> > > Robin had recently been considering adding an accessor to DoFns that
> > > would allow you to get access to the job
> > > information from within the pipeline (current state, poll for metrics,
> > > invoke actions like cancel / drain, ...).
> > > He wanted it so he could poll for attempted metrics to be able to test
> > > @RequiresStableInput.
> > Yes, I remember, I voted +1 to his proposal.
> >
> > > Integrating the MetricsPusher or something like that on the SDK side to
> > > be able to poll metrics over the job
> > > information accessor could be useful.
> >
> > Well, in the design discussion, we decided to host Metrics Pusher as close
> > as possible to the actual engine (in
> > the runner code rather than in the SDK code) to allow the runner to send
> > system metrics in the future.
>
> +1. The runner harness can then do whatever it wants (e.g. reporting back to
> its master, or pushing to another
> service, or simply dropping them), but the SDKs only have to follow the FnAPI
> contract.
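As a rough sketch of that separation (hypothetical names throughout, nothing here is Beam API): the SDK side only produces metric updates, and the runner harness decides what handler to plug in, whether that is dropping them or buffering them for a push to another service.

```python
from typing import Callable, Dict, List

# Hypothetical type: anything the runner wants to do with a metrics update.
MetricsHandler = Callable[[Dict[str, int]], None]


def drop_metrics(update: Dict[str, int]) -> None:
    """Runner policy: ignore the metrics entirely."""


class BufferingPusher:
    """Runner policy: buffer updates for a later push to some external service."""

    def __init__(self) -> None:
        self.pending: List[Dict[str, int]] = []

    def __call__(self, update: Dict[str, int]) -> None:
        # Copy so later mutation by the caller does not change buffered data.
        self.pending.append(dict(update))


def on_sdk_metrics(update: Dict[str, int], handler: MetricsHandler) -> None:
    """Called by the (hypothetical) runner harness whenever the SDK reports metrics."""
    handler(update)


pusher = BufferingPusher()
on_sdk_metrics({"red": 5, "blue": 7}, drop_metrics)  # silently discarded
on_sdk_metrics({"blue": 9}, pusher)                  # kept for a later push
```

The SDK side never sees which handler is in use, which is the point of keeping the pusher in the runner code.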
>
> > > 1: https://s.apache.org/beam-fn-api-metrics
> > > 2: https://github.com/apache/beam/blob/9b68f926628d727e917b6a33ccdafcfe693eef6a/model/fn-execution/src/main/proto/beam_fn_api.proto#L410
> >
> > Besides, in his PR Alex talks about deprecated metrics. As he is off, can
> > you tell me a little more about them?
> > What metrics will be deprecated when the portability framework is 100%
> > operational on all the runners?
>
> Currently, the SDKs return metrics to the FnAPI via the proto found at
> https://github.com/apache/beam/blob/release-2.6.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L410
> (and specifically user metrics at
> https://github.com/apache/beam/blob/release-2.6.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L483
> ). The new metrics are the nested one-ofs defined at
> https://github.com/apache/beam/blob/release-2.6.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L269