Re: [DISCUSS] FLIP-274 : Introduce metric group for OperatorCoordinator

Hang Ruan Tue, 31 Jan 2023 22:12:18 -0800

Hi all,

> IIUC, there is only an OperatorCoordinatorMetricGroup interface without
implementations (it never worked). There were no coordinator metrics
implemented and registered. @Hang Ruan please correct me if I'm wrong.
Yes,  we shouldn't have to worry about backwards-compatibility as Chesnay
said.


After the discussion, I think the FLIP should be modified as follow.
1.  Add new metric group JobManagerOperatorMetricGroup, whose scope format
is JobManagerOperatorScopeFormat. The config key of
JobManagerOperatorScopeFormat
is metrics.scope.jm-operator, and the default value is
<host>.jobmanager.<job_name>.<task_name>.<operator_name>.
2. JobManagerOperatorMetricGroup will be sub components of
JobManagerJobMetricGroup.
3. InternalOperatorCoordinatorMetricGroup will extend ProxyMetricGroup,
which registered under JobManagerOperatorMetricGroup by
addGroup("coordinator").

I have updated the FLIP-274. @Chesnay Schepler <[email protected]> @Jark Wu
<[email protected]> , please help to take a look at the modified FLIP and
make sure that I do not miss  something.

If there are no more problems, we plan to start the voting thread later.

Best,
Hang

Chesnay Schepler <[email protected]> 于2023年1月31日周二 20:09写道：

> > IIUC, there is only an OperatorCoordinatorMetricGroup interface without
> implementations (it never worked). There were no coordinator metrics
> implemented and registered. @Hang Ruan <[email protected]> please
> correct me if I'm wrong.
>
> Indeed. There are interfaces, and theoretically a metric group is exposed
> to user-code via the SplitEnumeratorContext, but the metric group is always
> null in the SourceCoordinatorContext. So we shouldn't have to worry about
> backwards-compatibility.
>
> > However, the config key "metrics.scope.jm.job" is deprecated and is
> replaced by "metrics.scope.jm-job". Similarly, "metrics.scope.tm.job" is
> replaced with "metrics.scope.tm-job". That's why I though
> "metrics.scope.jm-operator" might be more consistent.
>
> HA! Good catch. I only looked at the 1.16 docs where they weren't
> deprecated yet.
>
> Then it should be "jm-operator" indeed.
>
> On 31/01/2023 10:51, Jark Wu wrote:
>
> Hi Chesnay,
>
> > The new option must default to the configured value of
> metrics.scope.jm.job because otherwise it isn't backwards-compatible with
> existing coordinator metrics.
> IIUC, there is only an OperatorCoordinatorMetricGroup interface without
> implementations (it never worked). There were no coordinator metrics
> implemented and registered. @Hang Ruan <[email protected]> please
> correct me if I'm wrong.
>
> > Intuitively I think the config key should be metrics.scope.jm.operator.,
> similar to metrics.scope.jm.job.
> However, the config key "metrics.scope.jm.job" is deprecated and is
> replaced by "metrics.scope.jm-job". Similarly, "metrics.scope.tm.job" is
> replaced with "metrics.scope.tm-job". That's why I though
> "metrics.scope.jm-operator" might be more consistent.
>
> > We would only support operator_name, operator_id, task_name and task_id
> (+ everything else from the jm/job groups of course).
> +1
>
> Best,
> Jark
>
>
>
> On Mon, 30 Jan 2023 at 20:09, Chesnay Schepler <[email protected]> wrote:
>
>> The new option must default to the configured value of
>> metrics.scope.jm.job because otherwise it isn't backwards-compatible with
>> existing coordinator metrics.
>> According to some earlier response in this thread there are already
>> coordinator metrics.
>>
>> If that weren't the case you'd be correct, and we'd keep the default
>> consistent with the default for tm operator metrics.
>>
>> Intuitively I think the config key should be metrics.scope.jm.operator.,
>> similar to metrics.scope.jm.job.
>> For symmetry we should also add new metrics.scope.tm.operator/task and
>> eventually phase out metrics.scope.operator.
>>
>> > Regarding "coordinator metrics don't *belong to* any tasks/vertexes",
>> I mean the coordinator doesn't run on specific tasks, so it doesn't have
>> runtime information of tasks.
>>
>> That is true. I intentionally only referred to task id/name, as things
>> like subtask index or attempt ids don't make sense for coordinators as you
>> said.
>> We would only support operator_name, operator_id, task_name and task_id
>> (+ everything else from the jm/job groups of course).
>> (Maybe in the future attempt_num could be interesting though)
>>
>> On 30/01/2023 10:39, Jark Wu wrote:
>>
>> Hi Chesnay,
>>
>> IIUC, our only disagreement is whether to support task info in the metric
>> format. But we all agree with the default metric format
>> "metrics.scope.jm-operator=<host>.jobmanager.<job_name>.<operator_name>"
>> which doesn't include task info, right?
>>
>> Regarding "coordinator metrics don't *belong to* any tasks/vertexes", I
>> mean the coordinator doesn't run on specific tasks, so it doesn't have
>> runtime information of tasks. My concern about adding task info to the
>> coordinator metric is that
>> 1) if we register coordinator metrics for every subtask, the number of
>> coordinator metrics will be exploded when task parallelism is large. This
>> is an overhead to JM & metric systems, and the duplicated metrics are
>> useless for users.
>> 2) Correct me if I'm wrong, MetricGroup#counter only supports registering
>> one metric. It seems it can't register a metric reporting for multiple
>> metric names (e.g. multiple subtasks).
>>
>> Best,
>> Jark
>>
>>
>> On Mon, 30 Jan 2023 at 16:11, Chesnay Schepler <[email protected]>
>> wrote:
>>
>>> I hadn't looked in detail in to how the task info can be introduced, but
>>> given that the coordinates are created by the execution graph, where we
>>> only work on tasks, it should be possible and rather simple.
>>>
>>> The OperatorCoordinatorHolder gets access to the ExecutionJobVertex from
>>> which it can extract everything and create the task/operator groups.
>>> Exposure to the coordinator could happen via the coordinator context as
>>> originally planned.
>>>
>>>  > As we know, coordinator metrics are just like JMJobMetricGroup, which
>>> don't belong to any tasks/vertexes.
>>>
>>> I don't think this is true, every coordinator is scoped to a specific
>>> operator, and every operator is associated with a particular vertex.
>>>
>>>  From the javadocs:
>>>
>>> " A coordinator for runtime operators. The OperatorCoordinator runs on
>>> the master, associated with
>>>   the job vertex of the operator. It communicates with operators via
>>> sending operator events."
>>>
>>> On 28/01/2023 09:15, Jark Wu wrote:
>>> > Hi all,
>>> >
>>> > IIUC, Chesnay means we should have a more general metric group for
>>> > **operator**
>>> > not for **coordinator** in JM, this would be useful to extend
>>> > other operator-specific
>>> > metrics in the future. That means the new scope format should be
>>> designed
>>> > for
>>> > the operator,
>>> > e.g.,
>>> metrics.scope.jm-operator=<host>.jobmanager.<job_name>.<operator_name>
>>> > The coordinator metric is a subgroup (a constant "coordinator" suffix)
>>> of
>>> > the JMOperatorMG.
>>> >
>>> > I think this is a nice design. However, I have a question about adding
>>> task
>>> > id/name to this list.
>>> > How to get the task id/name when reporting a coordinator metric? As we
>>> > know, coordinator metrics
>>> > are just like JMJobMetricGroup, which don't belong to any
>>> tasks/vertexes.
>>> > Do you mean reporting
>>> > coordinator metrics for every task id under the operator?
>>> >
>>> > Best,
>>> > Jark
>>> >
>>> >
>>> > On Thu, 19 Jan 2023 at 17:33, Chesnay Schepler <[email protected]>
>>> wrote:
>>> >
>>> >>   > First, I do not understand why users have to configure the new
>>> scope
>>> >> format, which has a default value.
>>> >>
>>> >> If you don't use scope formats, sure. If you do use scope formats,
>>> e.g.
>>> >> to add a common prefix (which is the case for datadog users for
>>> >> example), then the current default in the FLIP is insufficient and
>>> >> requires the user to update the configuration.
>>> >>
>>> >>   > I usually do not need to change these configuration of scope
>>> formats
>>> >> when submitting the flink job.
>>> >>
>>> >> Scope formats as a whole are quite a power-user feature, but that
>>> >> doesn't mean we should ignore it.
>>> >>
>>> >>   > I try to let it extend the ComponentMetricGroup
>>> >>
>>> >> This isn't inherently required just because it is a "component".
>>> >> Component metric groups should _only_ be used for cases where such a
>>> >> component provides several bits of metadata.
>>> >> For example, tasks provide vertex ids, task names, attempted IDs etc.,
>>> >> and we'd like users to have the option on how this metadata ends up
>>> >> being used (via scope formats). This currently can't be built with the
>>> >> addGroup() methods, hence why the component groups exist.
>>> >> However, the operator coordinator _itself_does not provide several
>>> bits
>>> >> of metadata. Logically, the _operator_ does, not the coordinator.
>>> >>
>>> >>   > This make me decide to add a new scope format.
>>> >>
>>> >> And this is fine; just don't add a new scope format for the
>>> coordinator
>>> >> but logically for operators on the JM side, such that we can extend
>>> the
>>> >> set of operator-specific metrics in the future without having to add
>>> yet
>>> >> another scope format.
>>> >>
>>> >>   > The relationship between the Operator and OperatorCoordinator is
>>> >> maintained in the OperatorCoordinatorHolder. In the POC, the operator
>>> >> name/id could be found and the OperatorCoordinatorMetricGroup will be
>>> >> created here.
>>> >>
>>> >> That's all irrelevant.
>>> >>
>>> >>   > The registered metrics of OperatorCoordinator always are those
>>> which
>>> >> can not be aggregated from the tasks, like the number of unassigned
>>> >> splits in the SourceCoordinator. Actually we have not encountered the
>>> >> scenario you mentioned. I think the OperatorCoordinatorMetricGroup
>>> >> should only contain the metrics for itself instead of its subtasks.
>>> >>
>>> >> You are misunderstanding the problem.
>>> >>
>>> >> This isn't about aggregating metrics on our side or exposing metrics
>>> >> from TMs on the JM side or anything like that.
>>> >>
>>> >> It's purely about the available metadata for operator metrics on the
>>> JM
>>> >> side. Currently you suggest operator name / id; what I'm proposing is
>>> to
>>> >> add task(==vertex!) id / name to this list.
>>> >>
>>> >> The benefit here is simple. I can look up all metrics where
>>> task_id==XXX
>>> >> and get _everything_ related to that vertex, including all metrics
>>> >> associated with operators that are part of that vertex, including the
>>> >> coordinator.
>>> >>
>>> >>   > Using metrics.scope.jm.job is not enough to distinguish the
>>> different
>>> >> coordinators.
>>> >>
>>> >> I did not suggest just using metrics.scope.jm.job. I suggested that as
>>> >> the default for the jm operator scope format, that is used for the
>>> >> proposed JobManagerOperatorMetricGroup, which will have some plain
>>> >> metric groups as children for coordinators.
>>> >> Aka, you have the JMOperatorMG, then you call addGroup("coordinators")
>>> >> and then addGroup(coordinator_name) for each coordinator.
>>> >>
>>> >>   > So there are two choice for the
>>> >> InternalOperatorCoordinatorMetricGroup: 1. add and improve the new
>>> scope
>>> >> format; 2. use the metrics.scope.jm.job and ProxyMetricGroup. Which
>>> one
>>> >> is better?
>>> >>
>>> >> There are more options than that.
>>> >> I feel like there are a lot of misunderstandings in this discussion.
>>> >> Please let me know if I could clear things up. If not I can also
>>> provide
>>> >> a PoC based on your PoC if that can speed things up. It's not that
>>> >> different anyway.
>>> >>
>>> >> On 19/01/2023 07:39, Hang Ruan wrote:
>>> >>> Hi, chesnay,
>>> >>>
>>> >>> Thanks for your reply. I still have some doubts about the questions
>>> >>> you raised.
>>> >>>
>>> >>>      > Extending the set of ScopeFormats is problematic because it in
>>> >>>      practice
>>> >>>      it breaks the config if users actively rely on it, since
>>> there's now
>>> >>>      another key that they _must_ set for it to be
>>> >>>      consistent/compatible with
>>> >>>      their existing setup.
>>> >>>      Unfortunately due to how powerful scope formats are we can't
>>> derive a
>>> >>>      default value that matches their existing setup.
>>> >>>      Hence we should try to do this as rarely as possible.
>>> >>>
>>> >>>
>>> >>>      > This FLIP does not adhere to that since it proposes a
>>> dedicated
>>> >>>      format
>>> >>>      for coordinators; next time we want to expose operator-specific
>>> >>>      metrics
>>> >>>      (e.g., in the scheduler) we'd have to add another one to
>>> support it.
>>> >>>
>>> >>>
>>> >>> First, I do not understand why users have to configure the new scope
>>> >>> format, which has a default value. I usually do not need to change
>>> >>> these configuration of scope formats when submitting the flink job.
>>> >>> IMO, the OperatorCoordinator is a component running at the JobMaster.
>>> >>> I try to let it extend the ComponentMetricGroup, but I cannot find
>>> >>> a suitable scope. This make me decide to add a new scope format.
>>> >>>
>>> >>>      > Additionally, the configurable variables (operator name/id)
>>> are
>>> >>>      logically not attached to the coordinator, but operators, so to
>>> me it
>>> >>>      just doesn't make sense to structure it like this.
>>> >>>
>>> >>>
>>> >>> The relationship between the Operator and OperatorCoordinator is
>>> >>> maintained in the OperatorCoordinatorHolder. In the POC, the operator
>>> >>> name/id could be found and the OperatorCoordinatorMetricGroup will be
>>> >>> created here.
>>> >>>
>>> >>>      > Another thing I'm concerned about is that, because we don't
>>> include
>>> >>>      tasks in the hierarchy, users wishing to collect all metrics
>>> for a
>>> >>>      particular task (in this case ==vertex) now have to go
>>> >>>      significantly out
>>> >>>      of their way to get them, since they can no longer just filter
>>> by the
>>> >>>      task ID but have to be filter for _all_ operators that are part
>>> of
>>> >>>      the task.
>>> >>>
>>> >>>
>>> >>> The registered metrics of OperatorCoordinator always are those which
>>> >>> can not be aggregated from the tasks, like the number of unassigned
>>> >>> splits in the SourceCoordinator. Actually we have not encountered the
>>> >>> scenario you mentioned. I think the OperatorCoordinatorMetricGroup
>>> >>> should only contain the metrics for itself instead of its subtasks.
>>> >>>
>>> >>> Using metrics.scope.jm.job is not enough to distinguish the different
>>> >>> coordinators. The operator id/name is necessary. Then
>>> >>> the implementation should be like this. But users can not change the
>>> >>> scope in this way. This is also acceptable.
>>> >>> @Internal public class InternalOperatorCoordinatorMetricGroup
>>> >>>           extends ProxyMetricGroup<MetricGroup>
>>> >>>           implements OperatorCoordinatorMetricGroup {
>>> >>>       public InternalOperatorCoordinatorMetricGroup(
>>> >>>               JobManagerJobMetricGroup parent, OperatorID operatorID,
>>> >> String operatorName) {
>>> >>>           super(parent.addGroup(operatorID +
>>> >> operatorName).addGroup("coordinator")); }
>>> >>> }
>>> >>> So there are two choice for the
>>> >>> InternalOperatorCoordinatorMetricGroup: 1. add and improve the new
>>> >>> scope format; 2. use the metrics.scope.jm.job and ProxyMetricGroup.
>>> >>> Which one is better?
>>> >>>
>>> >>> Best,
>>> >>> Hang
>>> >>>
>>> >>>
>>> >>> Chesnay Schepler <[email protected]> 于2023年1月18日周三 17:03写道：
>>> >>>
>>> >>>      You're misunderstanding the problem.
>>> >>>
>>> >>>      Metric groups form a tree with each group providing certain
>>> metadata.
>>> >>>
>>> >>>      E.g., on the taskmanager we have a TM metric group that
>>> provides info
>>> >>>      about the TM, that has child task metric groups, that have child
>>> >>>      operator metric groups etc. The operator metric group again has
>>> >>>      children, where we are now mostly in the user/connector land,
>>> like a
>>> >>>      kafka metric group providing partition info or something.
>>> >>>
>>> >>>      What is being proposed is to create a coordinator metric group
>>> on
>>> >>>      the JM
>>> >>>      side that provides operator metadata.
>>> >>>
>>> >>>      A more appropriate structure is to create an operator group on
>>> the JM
>>> >>>      side that provides this info, with the coordinator metric group
>>> >>>      being a
>>> >>>      child.
>>> >>>
>>> >>>      On 17/01/2023 19:56, Steven Wu wrote:
>>> >>>      >> Additionally, the configurable variables (operator name/id)
>>> are
>>> >>>      > logically not attached to the coordinator, but operators, so
>>> to
>>> >>>      me it
>>> >>>      > just doesn't make sense to structure it like this.
>>> >>>      >
>>> >>>      > Chesnay, maybe we should clarify the terminology. To me,
>>> >>>      pperators (like
>>> >>>      > FLIP-27 source) can have two parts (coordinator and
>>> >>>      reader/subtask). I
>>> >>>      > think it is fine to include operator name/id for coordinator
>>> >>>      metrics.
>>> >>>      >
>>> >>>      > On Mon, Jan 16, 2023 at 2:13 AM Chesnay Schepler
>>> >>>      <[email protected]> wrote:
>>> >>>      >
>>> >>>      >> Slight correction: Using metrics.scope.jm.job as the default
>>> >>>      should be
>>> >>>      >> safe.
>>> >>>      >>
>>> >>>      >> On 16/01/2023 10:18, Chesnay Schepler wrote:
>>> >>>      >>> The proposed ScopeFormat is still problematic for a few
>>> reasons.
>>> >>>      >>>
>>> >>>      >>> Extending the set of ScopeFormats is problematic because it
>>> in
>>> >>>      >>> practice it breaks the config if users actively rely on it,
>>> since
>>> >>>      >>> there's now another key that they _must_ set for it to be
>>> >>>      >>> consistent/compatible with their existing setup.
>>> >>>      >>> Unfortunately due to how powerful scope formats are we can't
>>> >>>      derive a
>>> >>>      >>> default value that matches their existing setup.
>>> >>>      >>> Hence we should try to do this as rarely as possible.
>>> >>>      >>>
>>> >>>      >>> This FLIP does not adhere to that since it proposes a
>>> >>>      dedicated format
>>> >>>      >>> for coordinators; next time we want to expose
>>> operator-specific
>>> >>>      >>> metrics (e.g., in the scheduler) we'd have to add another
>>> one to
>>> >>>      >>> support it.
>>> >>>      >>>
>>> >>>      >>> Additionally, the configurable variables (operator name/id)
>>> are
>>> >>>      >>> logically not attached to the coordinator, but operators, so
>>> >>>      to me it
>>> >>>      >>> just doesn't make sense to structure it like this.
>>> >>>      >>>
>>> >>>      >>> Another thing I'm concerned about is that, because we don't
>>> >>>      include
>>> >>>      >>> tasks in the hierarchy, users wishing to collect all
>>> metrics for
>>> >> a
>>> >>>      >>> particular task (in this case ==vertex) now have to go
>>> >>>      significantly
>>> >>>      >>> out of their way to get them, since they can no longer just
>>> >>>      filter by
>>> >>>      >>> the task ID but have to be filter for _all_ operators that
>>> are
>>> >>>      part of
>>> >>>      >>> the task.
>>> >>>      >>>
>>> >>>      >>> On 16/01/2023 03:09, Hang Ruan wrote:
>>> >>>      >>>> Hi, @[email protected] <[email protected]> ,
>>> >>>      >>>>
>>> >>>      >>>> Do you have time to help to review this FLIP again after
>>> the
>>> >>>      >>>> modification?
>>> >>>      >>>> Looking forward to your reply.
>>> >>>      >>>> This FLIP will add a new configuration for the
>>> >>>      >>>> OperatorCoordinatorMetricGroup scope format. It provides an
>>> >>>      internal
>>> >>>      >>>> implementation and is added as a component to the
>>> >>>      >>>> JobManagerJobMetricGroup.
>>> >>>      >>>> If something doesn't make sense, could you provide some
>>> >>>      advice? It
>>> >>>      >>>> will be
>>> >>>      >>>> very helpful. Thanks a lot for your help.
>>> >>>      >>>>
>>> >>>      >>>> Best,
>>> >>>      >>>> Hang
>>> >>>      >>>>
>>> >>>      >>>> Martijn Visser <[email protected]> 于2023年1月11日周三
>>> >>>      16:34写道：
>>> >>>      >>>>
>>> >>>      >>>>
>>> >>>      >>>>> Hi Hang,
>>> >>>      >>>>>
>>> >>>      >>>>> I'm a bit surprised that this has gone to a vote, given
>>> that
>>> >>>      Chesnay
>>> >>>      >>>>> deliberately mentioned that he would vote against it
>>> as-is.
>>> >>>      I would
>>> >>>      >>>>> expect
>>> >>>      >>>>> that before going to a vote, he has had the opportunity to
>>> >>>      >>>>> participate in
>>> >>>      >>>>> this discussion.
>>> >>>      >>>>>
>>> >>>      >>>>> Best regards,
>>> >>>      >>>>>
>>> >>>      >>>>> Martijn
>>> >>>      >>>>>
>>> >>>      >>>>> On Tue, Jan 3, 2023 at 12:53 PM Jark Wu <[email protected]
>>> >
>>> >>>      wrote:
>>> >>>      >>>>>
>>> >>>      >>>>>> Hi Dong,
>>> >>>      >>>>>>
>>> >>>      >>>>>> Regarding “SplitEnumeratorContext#metricGroup”, my only
>>> >>>      concern is
>>> >>>      >>>>>> that
>>> >>>      >>>>>> this is a core interface for the FLIP. It’s hard to tell
>>> how
>>> >>>      >>>>>> sources use
>>> >>>      >>>>>> metric group without mentioning this interface. Even if
>>> >>>      this is an
>>> >>>      >>>>> existing
>>> >>>      >>>>>> API, I think it’s worth introducing the interface again
>>> and
>>> >>>      declaring
>>> >>>      >>>>> that
>>> >>>      >>>>>> we will implement the interface instead of a no-op method
>>> >>>      in this
>>> >>>      >>>>>> FLIP.
>>> >>>      >>>>>>
>>> >>>      >>>>>> Anyway, this is a minor problem and shouldn’t block this
>>> >>>      FLIP. I’m
>>> >>>      >>>>>> +1 to
>>> >>>      >>>>>> start a vote.
>>> >>>      >>>>>>
>>> >>>      >>>>>> Best,
>>> >>>      >>>>>> Jark
>>> >>>      >>>>>>
>>> >>>      >>>>>>
>>> >>>      >>>>>>> 2023年1月3日 10:03，Hang Ruan <[email protected]> 写道：
>>> >>>      >>>>>>>
>>> >>>      >>>>>>> Hi, Jark and Dong,
>>> >>>      >>>>>>>
>>> >>>      >>>>>>> Thanks for your comments. Sorry for my late reply.
>>> >>>      >>>>>>>
>>> >>>      >>>>>>> For suggestion 1, I plan to implement the
>>> >>>      >>>>>>> SplitEnumeratorMetricGroup in
>>> >>>      >>>>>>> another issue, and it is not contained in this FLIP. I
>>> >>>      will add some
>>> >>>      >>>>>>> description about this part.
>>> >>>      >>>>>>> For suggestion 2, changes about
>>> >>>      OperatorCoordinator#metricGroup has
>>> >>>      >>>>>> already
>>> >>>      >>>>>>> been documented in the proposed change section.
>>> >>>      >>>>>>>
>>> >>>      >>>>>>> Best,
>>> >>>      >>>>>>> Hang
>>> >>>      >>>>>>>
>>> >>>      >>>>>>> Dong Lin <[email protected]> 于2023年1月1日周日 09:45写道：
>>> >>>      >>>>>>>
>>> >>>      >>>>>>>> Let me chime-in and add comments regarding the public
>>> >>>      interface
>>> >>>      >>>>> section.
>>> >>>      >>>>>>>> Please see my comments inline.
>>> >>>      >>>>>>>>
>>> >>>      >>>>>>>> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu
>>> >>>      <[email protected]> wrote:
>>> >>>      >>>>>>>>
>>> >>>      >>>>>>>>> Hi Hang,
>>> >>>      >>>>>>>>>
>>> >>>      >>>>>>>>> Thanks for driving this discussion. I think this is a
>>> >>>      very useful
>>> >>>      >>>>>> feature
>>> >>>      >>>>>>>>> for connectors.
>>> >>>      >>>>>>>>>
>>> >>>      >>>>>>>>> The FLIP looks quite good to me, and I just have two
>>> >>>      suggestions.
>>> >>>      >>>>>>>>>
>>> >>>      >>>>>>>>> 1. In the "Public Interface" section, mention that the
>>> >>>      >>>>>>>>> implementation
>>> >>>      >>>>>>>>> behavior of "SplitEnumeratorContext#metricGroup" is
>>> >>>      changed from
>>> >>>      >>>>>>>> returning
>>> >>>      >>>>>>>>> null to returning a concrete
>>> SplitEnumeratorMetricGroup
>>> >>>      instance.
>>> >>>      >>>>> Even
>>> >>>      >>>>>>>>> though the API is already there, the behavior change
>>> can
>>> >>>      also be
>>> >>>      >>>>>>>> considered
>>> >>>      >>>>>>>>> a public change.
>>> >>>      >>>>>>>>>
>>> >>>      >>>>>>>> SplitEnumeratorContext#metricGroup is an interface and
>>> >>>      this FLIP
>>> >>>      >>>>>>>> does
>>> >>>      >>>>>> not
>>> >>>      >>>>>>>> seem to change its semantics/behavior. The FLIP does
>>> >>>      change the
>>> >>>      >>>>>>>> implementation/behavior of
>>> >>>      SourceCoordinatorContext#metricGroup,
>>> >>>      >>>>>>>> which
>>> >>>      >>>>>> is
>>> >>>      >>>>>>>> marked @Internal.
>>> >>>      >>>>>>>>
>>> >>>      >>>>>>>> Thus it might seem a bit weird to add in the public
>>> >> interface
>>> >>>      >>>>>>>> section
>>> >>>      >>>>>>>> saying that we change the interface
>>> >>>      >>>>>>>> SplitEnumeratorContext#metricGroup
>>> >>>      >>>>>> from
>>> >>>      >>>>>>>> returning null to non-null object.
>>> >>>      >>>>>>>>
>>> >>>      >>>>>>>> 2. Mention the newly added interface of
>>> >>>      >>>>>> "OperatorCoordinator#metricGroup"
>>> >>>      >>>>>>>>> in the "Proposed Changes" section or "Public
>>> Interface"
>>> >>>      section. As
>>> >>>      >>>>> the
>>> >>>      >>>>>>>>> FLIP said, OperatorCoordinator is widely used in many
>>> >>>      connectors.
>>> >>>      >>>>>> Though
>>> >>>      >>>>>>>> it
>>> >>>      >>>>>>>>> is still an @Internal API, I think it is worth
>>> >>>      mentioning the
>>> >>>      >>>>>>>>> change
>>> >>>      >>>>> in
>>> >>>      >>>>>>>> the
>>> >>>      >>>>>>>>> FLIP.
>>> >>>      >>>>>>>> Since OperatorCoordinator is an internal API, it seems
>>> >>>      reasonable to
>>> >>>      >>>>>>>> explain it in the proposed change section. The FLIP
>>> seems
>>> >>>      to have
>>> >>>      >>>>>>>> documented this in point 5 of the proposed change
>>> section.
>>> >>>      >>>>>>>>
>>> >>>      >>>>>>>> BTW, if we think there are @internal classes that are
>>> >>>      important
>>> >>>      >>>>>>>> enough
>>> >>>      >>>>>> to
>>> >>>      >>>>>>>> be added in the public interface section, it might be
>>> >>>      useful to
>>> >>>      >>>>>> explicitly
>>> >>>      >>>>>>>> discuss this topic and document it in the "*What are
>>> the
>>> >>>      "public
>>> >>>      >>>>>>>> interfaces" of the project*" in this
>>> >>>      >>>>>>>> <
>>> >>>      >>>>>>>>
>>> >>>      >>
>>> >>>
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>> >>>      >>>>>>>> wiki.
>>> >>>      >>>>>>>>
>>> >>>      >>>>>>>>
>>> >>>      >>>>>>>>> Best,
>>> >>>      >>>>>>>>> Jark
>>> >>>      >>>>>>>>>
>>> >>>      >>>>>>>>>
>>> >>>      >>>>>>>>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan
>>> >>>      <[email protected]>
>>> >>>      >>>>>> wrote:
>>> >>>      >>>>>>>>>> Hi, thanks for the feedback, Zhu Zhu and Qingsheng.
>>> >>>      >>>>>>>>>> After combining everyone's comments, the main
>>> concerns and
>>> >>>      >>>>>>>> corresponding
>>> >>>      >>>>>>>>>> adjustments are as follows.
>>> >>>      >>>>>>>>>>
>>> >>>      >>>>>>>>>> Q1: Common metrics are not quite useful.
>>> >>>      >>>>>>>>>> numEventsIn and numEventsOut counters will be removed
>>> >>>      from the
>>> >>>      >>>>>>>>>> OperatorCoordinatorMetricGroup. These common metrics
>>> do
>>> >> not
>>> >>>      >>>>>>>>>> provide
>>> >>>      >>>>>>>>> enough
>>> >>>      >>>>>>>>>> information for users. The users are more willing to
>>> >>>      get the
>>> >>>      >>>>>>>>>> number
>>> >>>      >>>>> of
>>> >>>      >>>>>>>>>> events of the specified type instead of the total
>>> >>>      number. And this
>>> >>>      >>>>>>>> metric
>>> >>>      >>>>>>>>>> is calculated differently. The implementation could
>>> >>>      register the
>>> >>>      >>>>>> metric
>>> >>>      >>>>>>>>> by
>>> >>>      >>>>>>>>>> themselves.
>>> >>>      >>>>>>>>>>
>>> >>>      >>>>>>>>>> Q2: This FLIP is overly complicated.
>>> >>>      >>>>>>>>>> This FLIP will become concise after these
>>> modifications.
>>> >>>      >>>>>>>>>> OperatorCoordinatorMetricGroup has already been
>>> >>>      introduced into
>>> >>>      >>>>> Flink
>>> >>>      >>>>>>>> by
>>> >>>      >>>>>>>>>> FLIP-179<
>>> >>>      >>>>>>>>>>
>>> >>>      >>>>>>>>>>
>>> >>>      >>
>>> >>>
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
>>> >>>      >>>>>>>>>>> .
>>> >>>      >>>>>>>>>> And
>>> >>>      >>>>>>>>>> this FLIP will not change it. This FLIP only
>>> provides a
>>> >>>      new metric
>>> >>>      >>>>>>>> option
>>> >>>      >>>>>>>>>> and a new metric group scope. The changes in proposed
>>> >>>      changes
>>> >>>      >>>>> provide
>>> >>>      >>>>>>>> the
>>> >>>      >>>>>>>>>> details about the modifications for the internal
>>> >>>      classes, which
>>> >>>      >>>>> might
>>> >>>      >>>>>>>>> make
>>> >>>      >>>>>>>>>> it look complicated.
>>> >>>      >>>>>>>>>>
>>> >>>      >>>>>>>>>> Thanks for all the comments again. If there are no
>>> further
>>> >>>      >>>>>>>>>> comments,
>>> >>>      >>>>>> we
>>> >>>      >>>>>>>>>> plan to start the voting thread this week.
>>> >>>      >>>>>>>>>>
>>> >>>      >>>>>>>>>> Best,
>>> >>>      >>>>>>>>>> Hang
>>> >>>      >>>>>>>>>>
>>> >>>      >>>>>>>>>> Qingsheng Ren <[email protected]> 于2022年12月26日周一
>>> >>>      16:48写道：
>>> >>>      >>>>>>>>>>
>>> >>>      >>>>>>>>>>
>>> >>>      >>>>>>>>>>> Thanks for the FLIP, Hang!
>>> >>>      >>>>>>>>>>>
>>> >>>      >>>>>>>>>>> This FLIP overall looks good to me. Actually I share
>>> >>>      the same
>>> >>>      >>>>> concern
>>> >>>      >>>>>>>>>> with
>>> >>>      >>>>>>>>>>> Zhu that numEventsIn and numEventsOut counters are
>>> not
>>> >>>      quite
>>> >>>      >>>>>>>>>>> useful
>>> >>>      >>>>>>>> to
>>> >>>      >>>>>>>>>> end
>>> >>>      >>>>>>>>>>> users. OperatorEvent is a quite low-level
>>> abstraction,
>>> >>>      which
>>> >>>      >>>>> requires
>>> >>>      >>>>>>>>>>> instantialization in order to be practical to users
>>> and
>>> >>>      >>>>>>>>>>> developers,
>>> >>>      >>>>>>>> so
>>> >>>      >>>>>>>>>>> maybe it's better to exclude them from the FLIP.
>>> >>>      >>>>>>>>>>>
>>> >>>      >>>>>>>>>>> Best,
>>> >>>      >>>>>>>>>>> Qingsheng
>>> >>>      >>>>>>>>>>> Ververica (Alibaba)
>>> >>>      >>>>>>>>>>>
>>> >>>      >>>>>>>>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu
>>> >>>      <[email protected]>
>>> >>>      >>>>> wrote:
>>> >>>      >>>>>>>>>>>> Hi Hang,
>>> >>>      >>>>>>>>>>>> I still see no strong reason why we need
>>> >>>      >>>>>>>>>>>> numEventsIn/numEventsOut
>>> >>>      >>>>>>>>>>> metrics.
>>> >>>      >>>>>>>>>>>> In the discussion in FLINK-29801, I can see the
>>> same
>>> >>>      concern
>>> >>>      >>>>>>>>>>>> from
>>> >>>      >>>>>>>>>> others.
>>> >>>      >>>>>>>>>>>> So I prefer to exclude them from this FLIP to avoid
>>> >>>      >>>>>>>>>>>> over-extending
>>> >>>      >>>>>>>>> the
>>> >>>      >>>>>>>>>>>> scope.
>>> >>>      >>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>> Thanks,
>>> >>>      >>>>>>>>>>>> Zhu
>>> >>>      >>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>> Hang Ruan <[email protected]> 于2022年12月23日周五
>>> >>>      >>>>>>>>>>>> 15:21写道：
>>> >>>      >>>>>>>>>>>>> Hi everyone,
>>> >>>      >>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>> As for the Zhu Zhu's problem, I think we should
>>> keep
>>> >>>      the common
>>> >>>      >>>>>>>>>>> metrics,
>>> >>>      >>>>>>>>>>>> which will help to observe incoming and outgoing
>>> >>>      events. What do
>>> >>>      >>>>>>>> you
>>> >>>      >>>>>>>>>>> think,
>>> >>>      >>>>>>>>>>>> @Zhu Zhu ?
>>> >>>      >>>>>>>>>>>>> And @Chesnay, are there any other issues you are
>>> >>>      more concerned
>>> >>>      >>>>>>>>>> about?
>>> >>>      >>>>>>>>>>>> Looking forward to your reply.
>>> >>>      >>>>>>>>>>>>> Thanks for all the comments. If there are no
>>> further
>>> >>>      >>>>>>>>>>>>> comments, we
>>> >>>      >>>>>>>>>> plan
>>> >>>      >>>>>>>>>>>> to start the voting thread next week.
>>> >>>      >>>>>>>>>>>>> Best,
>>> >>>      >>>>>>>>>>>>> Hang
>>> >>>      >>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>> Hang Ruan <[email protected]> 于2022年12月15日周四
>>> >>>      >>>>>>>>>>>>> 16:49写道：
>>> >>>      >>>>>>>>>>>>>> Hi, Zhu Zhu,
>>> >>>      >>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>> Thanks for your feedback!
>>> >>>      >>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>> The OperatorCoordinator implementations are
>>> >>>      different. And
>>> >>>      >>>>>>>>>>>>>> their
>>> >>>      >>>>>>>>>>>> metrics are much different too. We try to find the
>>> >> common
>>> >>>      >>>>>>>>>>>> metrics
>>> >>>      >>>>>>>> and
>>> >>>      >>>>>>>>>> put
>>> >>>      >>>>>>>>>>>> them in the OperatorCoordinatorMetricGroup. If most
>>> >>>      developers
>>> >>>      >>>>>>>> think
>>> >>>      >>>>>>>>> we
>>> >>>      >>>>>>>>>>> do
>>> >>>      >>>>>>>>>>>> not need these common metrics, removing the common
>>> >>>      metrics is
>>> >>>      >>>>>>>>>>>> also
>>> >>>      >>>>>>>>>>>> acceptable.
>>> >>>      >>>>>>>>>>>>>> Best,
>>> >>>      >>>>>>>>>>>>>> Hang
>>> >>>      >>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>> Zhu Zhu <[email protected]> 于2022年12月14日周三
>>> >>>      22:09写道：
>>> >>>      >>>>>>>>>>>>>>> Hi Hang & MengYue,
>>> >>>      >>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>> Thanks for creating this FLIP!
>>> >>>      >>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>> I think it is very useful, mainly in two
>>> aspects:
>>> >>>      >>>>>>>>>>>>>>> 1. Enables OperatorCoordinators to register
>>> metrics.
>>> >>>      >>>>>>>>>>>>>>> Currently
>>> >>>      >>>>>>>>>>>>>>> the coordinators has no way to do this. And
>>> operator
>>> >>>      >>>>>>>> coordinator
>>> >>>      >>>>>>>>>>>>>>> metric group further enables the SplitEnumerator
>>> >>>      to have
>>> >>>      >>>>>>>>>>>>>>> access
>>> >>>      >>>>>>>>>>>>>>> to a registered metric group (via the existing
>>> public
>>> >>>      >>>>>>>>>>>>>>> interface
>>> >>>      >>>>>>>>>>>>>>> SplitEnumeratorContext#metricGroup()), which is
>>> >>>      null at the
>>> >>>      >>>>>>>>> moment.
>>> >>>      >>>>>>>>>>>>>>> 2. Defines the scope of operator coordinator
>>> >>>      metrics. A clear
>>> >>>      >>>>>>>>>>>> definition
>>> >>>      >>>>>>>>>>>>>>> makes it easy for users to find their wanted
>>> >>>      metrics. The
>>> >>>      >>>>>>>>>> definition
>>> >>>      >>>>>>>>>>>>>>> also helps to avoid conflicts of metrics from
>>> >> multiple
>>> >>>      >>>>>>>>>>>> OperatorCoordinators
>>> >>>      >>>>>>>>>>>>>>> of the same kind. E.g. each SourceCoordinator
>>> may
>>> >>>      have its
>>> >>>      >>>>>>>>>>>>>>> own
>>> >>>      >>>>>>>>>>>>>>> numSourceSplits metric, these metrics should not
>>> >>>      be directly
>>> >>>      >>>>>>>>>>> registered
>>> >>>      >>>>>>>>>>>>>>> to the job metric group.
>>> >>>      >>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>> What I'm a bit concerned is the necessity of the
>>> >>>      introduced
>>> >>>      >>>>>>>>> common
>>> >>>      >>>>>>>>>>>> metrics
>>> >>>      >>>>>>>>>>>>>>> numEventsInCounter & numEventsOutCounter. If
>>> there
>>> >>>      any case
>>> >>>      >>>>>>>> which
>>> >>>      >>>>>>>>>>>> strongly
>>> >>>      >>>>>>>>>>>>>>> requires them?
>>> >>>      >>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>> Regarding the concerns of Chesnay,
>>> >>>      >>>>>>>>>>>>>>>> A dedicated coordinator MG implementation is
>>> >> overkill
>>> >>>      >>>>>>>>>>>>>>> Directly using the job metric group can result
>>> in
>>> >>>      metric
>>> >>>      >>>>>>>>> conflicts,
>>> >>>      >>>>>>>>>>> as
>>> >>>      >>>>>>>>>>>> mentioned
>>> >>>      >>>>>>>>>>>>>>> in above #2.
>>> >>>      >>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>> Thanks,
>>> >>>      >>>>>>>>>>>>>>> Zhu
>>> >>>      >>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>> Dong Lin <[email protected]> 于2022年12月10日周六
>>> >>>      >>>>>>>>>>>>>>> 14:16写道：
>>> >>>      >>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>> Hi Chesney,
>>> >>>      >>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>> Just to double check with you,
>>> >>>      >>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup
>>> >>>      >>>>>>>>>>>> (annotated as
>>> >>>      >>>>>>>>>>>>>>>> @PublicEvolving) has already been introduced
>>> into
>>> >>>      Flink by
>>> >>>      >>>>>>>>>> FLIP-179
>>> >>>      >>>>>>>>>>>>>>>> <
>>> >>>      >>
>>> >>>
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
>>> >>>      >>>>>>>>>>>>> .
>>> >>>      >>>>>>>>>>>>>>>> And that FLIP has got you +1.. Do you mean we
>>> >>>      should remove
>>> >>>      >>>>>>>>> this
>>> >>>      >>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup?
>>> >>>      >>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>> Regards,
>>> >>>      >>>>>>>>>>>>>>>> Dong
>>> >>>      >>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay
>>> Schepler <
>>> >>>      >>>>>>>>>>> [email protected]>
>>> >>>      >>>>>>>>>>>> wrote:
>>> >>>      >>>>>>>>>>>>>>>>> As a whole I feel like this FLIP is overly
>>> >>>      complicated. A
>>> >>>      >>>>>>>>>>> dedicated
>>> >>>      >>>>>>>>>>>>>>>>> coordinator MG implementation is overkill; it
>>> >>>      could just
>>> >>>      >>>>>>>>> re-use
>>> >>>      >>>>>>>>>>> the
>>> >>>      >>>>>>>>>>>>>>>>> existing Task/OperatorMGs to create the same
>>> >>>      structure we
>>> >>>      >>>>>>>>> have
>>> >>>      >>>>>>>>>> on
>>> >>>      >>>>>>>>>>>> TMs,
>>> >>>      >>>>>>>>>>>>>>>>> similar to what we did with the Job MG.
>>> >>>      >>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>> However, I'm not convinced that this is
>>> required
>>> >>>      anyway,
>>> >>>      >>>>>>>>>> because
>>> >>>      >>>>>>>>>>>> all the
>>> >>>      >>>>>>>>>>>>>>>>> example metrics you listed can be implemented
>>> on
>>> >>>      the TM
>>> >>>      >>>>>>>> side
>>> >>>      >>>>>>>>> +
>>> >>>      >>>>>>>>>>>>>>>>> aggregating them in the external metrics
>>> backend.
>>> >>>      >>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>> Since I'm on holidays soon, just so no one
>>> tries
>>> >>>      to pull a
>>> >>>      >>>>>>>>> fast
>>> >>>      >>>>>>>>>>>> one on
>>> >>>      >>>>>>>>>>>>>>>>> me, if this were to go to a vote as-is I'd be
>>> >>>      against it.
>>> >>>      >>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote:
>>> >>>      >>>>>>>>>>>>>>>>>> Hi Hang,
>>> >>>      >>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good and
>>> it
>>> >>>      is pretty
>>> >>>      >>>>>>>>>>>> informative.
>>> >>>      >>>>>>>>>>>>>>>>>> I have just two minor comments regarding
>>> names:
>>> >>>      >>>>>>>>>>>>>>>>>> - Would it be useful to rename the config
>>> key as
>>> >>>      >>>>>>>>>>>>>>>>>> *metrics.scope.jm.job.operator-coordinator*
>>> for
>>> >>>      >>>>>>>> consistency
>>> >>>      >>>>>>>>>>> with
>>> >>>      >>>>>>>>>>>>>>>>>> *metrics.scope.jm.job
>>> >>>      >>>>>>>>>>>>>>>>>> *(which is not named as *jm-job)?
>>> >>>      >>>>>>>>>>>>>>>>>> - Maybe rename the variable as
>>> >>>      >>>>>>>>>>> SCOPE_NAMING_OPERATOR_COORDINATOR
>>> >>>      >>>>>>>>>>>> for
>>> >>>      >>>>>>>>>>>>>>>>>> simplicity and consistency with
>>> >>>      SCOPE_NAMING_OPERATOR
>>> >>>      >>>>>>>>> (which
>>> >>>      >>>>>>>>>> is
>>> >>>      >>>>>>>>>>>> not named
>>> >>>      >>>>>>>>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)?
>>> >>>      >>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>> Cheers,
>>> >>>      >>>>>>>>>>>>>>>>>> Dong
>>> >>>      >>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <
>>> >>>      >>>>>>>>>>> [email protected]>
>>> >>>      >>>>>>>>>>>> wrote:
>>> >>>      >>>>>>>>>>>>>>>>>>> Hi all,
>>> >>>      >>>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>>> MengYue and I created FLIP-274[1] Introduce
>>> >>>      metric group
>>> >>>      >>>>>>>>> for
>>> >>>      >>>>>>>>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator is
>>> the
>>> >>>      >>>>>>>>> coordinator
>>> >>>      >>>>>>>>>>> for
>>> >>>      >>>>>>>>>>>> runtime
>>> >>>      >>>>>>>>>>>>>>>>>>> operators and running on Job Manager. The
>>> >>>      coordination
>>> >>>      >>>>>>>>>>>> mechanism is
>>> >>>      >>>>>>>>>>>>>>>>>>> operator events between OperatorCoordinator
>>> >>>      and its all
>>> >>>      >>>>>>>>>>>> operators, the
>>> >>>      >>>>>>>>>>>>>>>>>>> coordination is more and more using in
>>> Flink, for
>>> >>>      >>>>>>>> example
>>> >>>      >>>>>>>>>> many
>>> >>>      >>>>>>>>>>>> Sources
>>> >>>      >>>>>>>>>>>>>>>>> and
>>> >>>      >>>>>>>>>>>>>>>>>>> Sinks depend on the mechanism to assign
>>> splits
>>> >> and
>>> >>>      >>>>>>>>>> coordinate
>>> >>>      >>>>>>>>>>>> commits to
>>> >>>      >>>>>>>>>>>>>>>>>>> external systems. The OperatorCoordinator is
>>> >>>      widely
>>> >>>      >>>>>>>> using
>>> >>>      >>>>>>>>> in
>>> >>>      >>>>>>>>>>>> flink kafka
>>> >>>      >>>>>>>>>>>>>>>>>>> connector, flink pulsar connector, flink cdc
>>> >>>      connector,
>>> >>>      >>>>>>>>>> flink
>>> >>>      >>>>>>>>>>>> hudi
>>> >>>      >>>>>>>>>>>>>>>>>>> connector and so on.
>>> >>>      >>>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>>> But there is not a suitable metric group
>>> scope
>>> >>>      for the
>>> >>>      >>>>>>>>>>>>>>>>> OperatorCoordinator
>>> >>>      >>>>>>>>>>>>>>>>>>> and not an implementation for the interface
>>> >>>      >>>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup.
>>> >>>      >>>>>>>>>>>>>>>>>>> These metrics in OperatorCoordinator could
>>> be
>>> >>>      how many
>>> >>>      >>>>>>>>>>>> splits/partitions
>>> >>>      >>>>>>>>>>>>>>>>>>> have been assigned to source readers, how
>>> many
>>> >>>      files
>>> >>>      >>>>>>>> have
>>> >>>      >>>>>>>>>> been
>>> >>>      >>>>>>>>>>>> written
>>> >>>      >>>>>>>>>>>>>>>>> out
>>> >>>      >>>>>>>>>>>>>>>>>>> by sink writers, these metrics not only help
>>> >>>      users to
>>> >>>      >>>>>>>> know
>>> >>>      >>>>>>>>>> the
>>> >>>      >>>>>>>>>>>> job
>>> >>>      >>>>>>>>>>>>>>>>> progress
>>> >>>      >>>>>>>>>>>>>>>>>>> but also make big job maintaining easier.
>>> Thus we
>>> >>>      >>>>>>>> propose
>>> >>>      >>>>>>>>>> the
>>> >>>      >>>>>>>>>>>> FLIP-274
>>> >>>      >>>>>>>>>>>>>>>>> to
>>> >>>      >>>>>>>>>>>>>>>>>>> introduce a new metric group scope for
>>> >>>      >>>>>>>> OperatorCoordinator
>>> >>>      >>>>>>>>>> and
>>> >>>      >>>>>>>>>>>> provide
>>> >>>      >>>>>>>>>>>>>>>>> an
>>> >>>      >>>>>>>>>>>>>>>>>>> internal implementation for
>>> >>>      >>>>>>>>> OperatorCoordinatorMetricGroup.
>>> >>>      >>>>>>>>>>>>>>>>>>> Could you help review this FLIP when you get
>>> >>>      time? Any
>>> >>>      >>>>>>>>>>> feedback
>>> >>>      >>>>>>>>>>>> is
>>> >>>      >>>>>>>>>>>>>>>>>>> appreciated!
>>> >>>      >>>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>>> Best,
>>> >>>      >>>>>>>>>>>>>>>>>>> Hang
>>> >>>      >>>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>>> [1]
>>> >>>      >>>>>>>>>>>>>>>>>>>
>>> >>>      >>>>>>>>>>>>>>>>>>>
>>> >>>      >>
>>> >>>
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
>>> >>>      >>
>>> >>>
>>>
>>>
>>
>

Re: [DISCUSS] FLIP-274 : Introduce metric group for OperatorCoordinator

Reply via email to