As a whole I feel like this FLIP is overly complicated. A dedicated coordinator MG implementation is overkill; it could just re-use the existing Task/OperatorMGs to create the same structure we have on TMs, similar to what we did with the Job MG.

However, I'm not convinced that this is required anyway, because all the example metrics you listed can be implemented on the TM side and then aggregated in the external metrics backend.
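
To illustrate the aggregation idea, here is a minimal sketch, not tied to any particular metrics backend. It assumes each subtask reports its own counter on the TM side and that the backend can sum the samples per operator. The sample layout, the metric names (assignedSplits, writtenFiles) and the function aggregate_per_operator are hypothetical and do not come from the FLIP or from Flink.

```python
from collections import defaultdict

# Hypothetical samples as an external metrics backend might receive them
# from the TMs: (job, operator, subtask_index, metric_name, value).
samples = [
    ("my-job", "kafka-source", 0, "assignedSplits", 3),
    ("my-job", "kafka-source", 1, "assignedSplits", 2),
    ("my-job", "kafka-source", 2, "assignedSplits", 4),
    ("my-job", "file-sink", 0, "writtenFiles", 10),
    ("my-job", "file-sink", 1, "writtenFiles", 7),
]


def aggregate_per_operator(samples):
    """Sum per-subtask values into one total per (job, operator, metric)."""
    totals = defaultdict(int)
    for job, operator, _subtask, metric, value in samples:
        totals[(job, operator, metric)] += value
    return dict(totals)


totals = aggregate_per_operator(samples)
for (job, operator, metric), value in sorted(totals.items()):
    print(f"{job}.{operator}.{metric} = {value}")
```

Whether a plain sum is the right aggregation depends on the metric; a gauge, for example, may need a max or the latest value instead.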

Since I'm on holidays soon, just so no one tries to pull a fast one on me, if this were to go to a vote as-is I'd be against it.


On 09/12/2022 15:30, Dong Lin wrote:
Hi Hang,

Thanks for the FLIP! The FLIP looks good and it is pretty informative.

I have just two minor comments regarding names:
- Would it be useful to rename the config key as
*metrics.scope.jm.job.operator-coordinator* for consistency with
*metrics.scope.jm.job* (which is not named as *jm-job*)?
- Maybe rename the variable as SCOPE_NAMING_OPERATOR_COORDINATOR for
simplicity and consistency with SCOPE_NAMING_OPERATOR (which is not named
as SCOPE_NAMING_TM_JOB_OPERATOR)?

Cheers,
Dong



On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <ruanhang1...@gmail.com> wrote:

Hi all,

MengYue and I created FLIP-274 [1], "Introduce metric group for
OperatorCoordinator". An OperatorCoordinator is the coordinator for a
runtime operator and runs on the Job Manager. Coordination works through
operator events exchanged between the OperatorCoordinator and all of its
operators. This mechanism is used more and more in Flink; for example, many
Sources and Sinks depend on it to assign splits and coordinate commits to
external systems. The OperatorCoordinator is widely used in the Flink Kafka
connector, the Flink Pulsar connector, the Flink CDC connector, the Flink
Hudi connector, and so on.

However, there is currently no suitable metric group scope for the
OperatorCoordinator and no implementation of the
OperatorCoordinatorMetricGroup interface. Metrics in an OperatorCoordinator
could include how many splits/partitions have been assigned to source
readers or how many files have been written out by sink writers. Such
metrics not only help users track job progress but also make large jobs
easier to maintain. We therefore propose FLIP-274 to introduce a new metric
group scope for OperatorCoordinator and provide an internal implementation
of OperatorCoordinatorMetricGroup.

Could you help review this FLIP when you get time? Any feedback is
appreciated!

Best,
Hang

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator

