Re: [DISCUSS] FLIP-274 : Introduce metric group for OperatorCoordinator

Hang Ruan Fri, 09 Dec 2022 21:05:56 -0800

Hi, Chesnay,

Thanks for your reply.


Actually, we cannot reuse the Task/OperatorMG for the OperatorCoordinator,
for two main reasons.
First, the scopes of these metric groups are not suitable for the
OperatorCoordinator, whose scope should be
"<host>.jobmanager.<job_name>.<operator_name>.coordinator".
Second, some metrics cannot be computed from the subtasks at all.
For example, in the Flink CDC connectors we could report how many tables are
pending in the OperatorCoordinator, but this information is not available
in its subtasks.
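
To make both points concrete, here is a toy sketch in Python. It is not Flink
code: every class and variable name is hypothetical, and the scope string is
the one proposed above, not a released Flink default. It resolves the
coordinator scope template and registers a gauge for a value that only the
coordinator holds (the pending-table list), which no subtask could report.

```python
import re
from typing import Callable, Dict, List

# Coordinator scope proposed in this thread (an assumption, not a Flink default).
COORDINATOR_SCOPE = "<host>.jobmanager.<job_name>.<operator_name>.coordinator"


def resolve_scope(template: str, variables: Dict[str, str]) -> str:
    """Replace each <variable> with its value; unknown variables are kept verbatim."""
    return re.sub(r"<([a-z_]+)>",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  template)


class HypotheticalCoordinatorMetricGroup:
    """Toy stand-in for a coordinator-scoped metric group: gauges keyed by full name."""

    def __init__(self, scope: str) -> None:
        self.scope = scope
        self.gauges: Dict[str, Callable[[], int]] = {}

    def gauge(self, name: str, supplier: Callable[[], int]) -> None:
        self.gauges[f"{self.scope}.{name}"] = supplier

    def report(self) -> Dict[str, int]:
        # Evaluate every gauge at report time, as a metrics reporter would.
        return {name: supplier() for name, supplier in self.gauges.items()}


class HypotheticalCdcCoordinator:
    """Holds the pending-table list, which only the coordinator knows about."""

    def __init__(self, tables: List[str],
                 metric_group: HypotheticalCoordinatorMetricGroup) -> None:
        self.pending_tables = list(tables)
        metric_group.gauge("pendingTables", lambda: len(self.pending_tables))

    def assign_next_table(self) -> str:
        # Hand one table to a reader; that subtask only ever sees this one table.
        return self.pending_tables.pop(0)


scope = resolve_scope(COORDINATOR_SCOPE, {"host": "jm-host",
                                          "job_name": "cdc-job",
                                          "operator_name": "mysql-source"})
group = HypotheticalCoordinatorMetricGroup(scope)
coordinator = HypotheticalCdcCoordinator(["db.orders", "db.users", "db.items"], group)
coordinator.assign_next_table()
print(group.report())
# {'jm-host.jobmanager.cdc-job.mysql-source.coordinator.pendingTables': 2}
```

Summing per-subtask metrics in an external backend could reproduce something
like "splits assigned so far", but not "tables still pending", because that
number never leaves the coordinator.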

We are also trying to add some common metrics to the
OperatorCoordinatorMetricGroup for all OperatorCoordinator implementations,
although whether these common metrics are necessary is still open for
discussion.

Best,
Hang

Chesnay Schepler <[email protected]> wrote on Sat, Dec 10, 2022 at 01:33:

> As a whole I feel like this FLIP is overly complicated. A dedicated
> coordinator MG implementation is overkill; it could just re-use the
> existing Task/OperatorMGs to create the same structure we have on TMs,
> similar to what we did with the Job MG.
>
> However, I'm not convinced that this is required anyway, because all the
> example metrics you listed can be implemented on the TM side and
> aggregated in the external metrics backend.
>
> Since I'm on holidays soon, just so no one tries to pull a fast one on
> me, if this were to go to a vote as-is I'd be against it.
>
>
> On 09/12/2022 15:30, Dong Lin wrote:
> > Hi Hang,
> >
> > Thanks for the FLIP! The FLIP looks good and it is pretty informative.
> >
> > I have just two minor comments regarding names:
> > - Would it be useful to rename the config key to
> > *metrics.scope.jm.job.operator-coordinator* for consistency with
> > *metrics.scope.jm.job* (which is not named *jm-job*)?
> > - Maybe rename the variable to SCOPE_NAMING_OPERATOR_COORDINATOR for
> > simplicity and consistency with SCOPE_NAMING_OPERATOR (which is not named
> > SCOPE_NAMING_TM_JOB_OPERATOR)?
> >
> > Cheers,
> > Dong
> >
> >
> >
> > On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> MengYue and I created FLIP-274[1] to introduce a metric group for the
> >> OperatorCoordinator. The OperatorCoordinator is the coordinator for
> >> runtime operators and runs on the JobManager. Coordination works through
> >> operator events exchanged between the OperatorCoordinator and all of its
> >> operators, and it is used more and more in Flink: for example, many
> >> Sources and Sinks depend on this mechanism to assign splits and to
> >> coordinate commits to external systems. The OperatorCoordinator is widely
> >> used in the Flink Kafka connector, Flink Pulsar connector, Flink CDC
> >> connector, Flink Hudi connector, and so on.
> >>
> >> However, there is no suitable metric group scope for the
> >> OperatorCoordinator, and no implementation of the
> >> OperatorCoordinatorMetricGroup interface. Metrics in the
> >> OperatorCoordinator could include how many splits/partitions have been
> >> assigned to source readers or how many files have been written out by
> >> sink writers. These metrics not only help users track job progress but
> >> also make maintaining large jobs easier. We therefore propose FLIP-274 to
> >> introduce a new metric group scope for the OperatorCoordinator and to
> >> provide an internal implementation of OperatorCoordinatorMetricGroup.
> >>
> >> Could you help review this FLIP when you get time? Any feedback is
> >> appreciated!
> >>
> >> Best,
> >> Hang
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
> >>
>
>
