Hi, thanks for the feedback, Zhu Zhu and Qingsheng.
After combining everyone's comments, the main concerns and corresponding
adjustments are as follows.

Q1: Common metrics are not quite useful.
numEventsIn and numEventsOut counters will be removed from the
OperatorCoordinatorMetricGroup. These common metrics do not provide enough
information for users. The users are more willing to get the number of
events of the specified type instead of the total number. And this metric
is calculated differently. The implementation could register the metric by
themselves.

Q2: This FLIP is overly complicated.
This FLIP will become concise after these modifications.
OperatorCoordinatorMetricGroup has already been introduced into Flink by
FLIP-179<
https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics>.
And
this FLIP will not change it. This FLIP only provides a new metric option
and a new metric group scope. The changes in proposed changes provide the
details about the modifications for the internal classes, which might make
it look complicated.

Thanks for all the comments again. If there are no further comments, we
plan to start the voting thread this week.

Best,
Hang

Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 16:48写道:

> Thanks for the FLIP, Hang!
>
> This FLIP overall looks good to me. Actually I share the same concern with
> Zhu that numEventsIn and numEventsOut counters are not quite useful to end
> users. OperatorEvent is a quite low-level abstraction, which requires
> instantialization in order to be practical to users and developers, so
> maybe it's better to exclude them from the FLIP.
>
> Best,
> Qingsheng
> Ververica (Alibaba)
>
> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu <reed...@gmail.com> wrote:
>
> > Hi Hang,
> > I still see no strong reason why we need numEventsIn/numEventsOut
> metrics.
> > In the discussion in FLINK-29801, I can see the same concern from others.
> > So I prefer to exclude them from this FLIP to avoid over-extending the
> > scope.
> >
> > Thanks,
> > Zhu
> >
> > Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道:
> > >
> > > Hi everyone,
> > >
> > > As for the Zhu Zhu's problem, I think we should keep the common
> metrics,
> > which will help to observe incoming and outgoing events. What do you
> think,
> > @Zhu Zhu ?
> > > And @Chesnay, are there any other issues you are more concerned about?
> > Looking forward to your reply.
> > >
> > > Thanks for all the comments. If there are no further comments, we plan
> > to start the voting thread next week.
> > >
> > > Best,
> > > Hang
> > >
> > > Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道:
> > >>
> > >> Hi, Zhu Zhu,
> > >>
> > >> Thanks for your feedback!
> > >>
> > >> The OperatorCoordinator implementations are different. And their
> > metrics are much different too. We try to find the common metrics and put
> > them in the OperatorCoordinatorMetricGroup. If most developers think we
> do
> > not need these common metrics, removing the common metrics is also
> > acceptable.
> > >>
> > >> Best,
> > >> Hang
> > >>
> > >> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道:
> > >>>
> > >>> Hi Hang & MengYue,
> > >>>
> > >>> Thanks for creating this FLIP!
> > >>>
> > >>> I think it is very useful, mainly in two aspects:
> > >>> 1. Enables OperatorCoordinators to register metrics. Currently
> > >>> the coordinators has no way to do this. And operator coordinator
> > >>> metric group further enables the SplitEnumerator to have access
> > >>> to a registered metric group (via the existing public interface
> > >>> SplitEnumeratorContext#metricGroup()), which is null at the moment.
> > >>>
> > >>> 2. Defines the scope of operator coordinator metrics. A clear
> > definition
> > >>> makes it easy for users to find their wanted metrics. The definition
> > >>> also helps to avoid conflicts of metrics from multiple
> > OperatorCoordinators
> > >>> of the same kind. E.g. each SourceCoordinator may have its own
> > >>> numSourceSplits metric, these metrics should not be directly
> registered
> > >>> to the job metric group.
> > >>>
> > >>> What I'm a bit concerned is the necessity of the introduced common
> > metrics
> > >>> numEventsInCounter & numEventsOutCounter. If there any case which
> > strongly
> > >>> requires them?
> > >>>
> > >>> Regarding the concerns of Chesnay,
> > >>> > A dedicated coordinator MG implementation is overkill
> > >>> Directly using the job metric group can result in metric conflicts,
> as
> > mentioned
> > >>> in above #2.
> > >>>
> > >>> Thanks,
> > >>> Zhu
> > >>>
> > >>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道:
> > >>>
> > >>> >
> > >>> > Hi Chesney,
> > >>> >
> > >>> > Just to double check with you, OperatorCoordinatorMetricGroup
> > (annotated as
> > >>> > @PublicEvolving) has already been introduced into Flink by FLIP-179
> > >>> > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
> > >.
> > >>> > And that FLIP has got you +1.. Do you mean we should remove this
> > >>> > OperatorCoordinatorMetricGroup?
> > >>> >
> > >>> > Regards,
> > >>> > Dong
> > >>> >
> > >>> > On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler <
> ches...@apache.org>
> > wrote:
> > >>> >
> > >>> > > As a whole I feel like this FLIP is overly complicated. A
> dedicated
> > >>> > > coordinator MG implementation is overkill; it could just re-use
> the
> > >>> > > existing Task/OperatorMGs to create the same structure we have on
> > TMs,
> > >>> > > similar to what we did with the Job MG.
> > >>> > >
> > >>> > > However, I'm not convinced that this is required anyway, because
> > all the
> > >>> > > example metrics you listed can be implemented on the TM side +
> > >>> > > aggregating them in the external metrics backend.
> > >>> > >
> > >>> > > Since I'm on holidays soon, just so no one tries to pull a fast
> > one on
> > >>> > > me, if this were to go to a vote as-is I'd be against it.
> > >>> > >
> > >>> > >
> > >>> > > On 09/12/2022 15:30, Dong Lin wrote:
> > >>> > > > Hi Hang,
> > >>> > > >
> > >>> > > > Thanks for the FLIP! The FLIP looks good and it is pretty
> > informative.
> > >>> > > >
> > >>> > > > I have just two minor comments regarding names:
> > >>> > > > - Would it be useful to rename the config key as
> > >>> > > > *metrics.scope.jm.job.operator-coordinator* for consistency
> with
> > >>> > > > *metrics.scope.jm.job
> > >>> > > > *(which is not named as *jm-job)?
> > >>> > > > - Maybe rename the variable as
> SCOPE_NAMING_OPERATOR_COORDINATOR
> > for
> > >>> > > > simplicity and consistency with SCOPE_NAMING_OPERATOR (which is
> > not named
> > >>> > > > as SCOPE_NAMING_TM_JOB_OPERATOR)?
> > >>> > > >
> > >>> > > > Cheers,
> > >>> > > > Dong
> > >>> > > >
> > >>> > > >
> > >>> > > >
> > >>> > > > On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <
> ruanhang1...@gmail.com>
> > wrote:
> > >>> > > >
> > >>> > > >> Hi all,
> > >>> > > >>
> > >>> > > >> MengYue and I created FLIP-274[1] Introduce metric group for
> > >>> > > >> OperatorCoordinator. OperatorCoordinator is the coordinator
> for
> > runtime
> > >>> > > >> operators and running on Job Manager. The coordination
> > mechanism is
> > >>> > > >> operator events between OperatorCoordinator and its all
> > operators, the
> > >>> > > >> coordination is more and more using in Flink, for example many
> > Sources
> > >>> > > and
> > >>> > > >> Sinks depend on the mechanism to assign splits and coordinate
> > commits to
> > >>> > > >> external systems. The OperatorCoordinator is widely using in
> > flink kafka
> > >>> > > >> connector, flink pulsar connector, flink cdc connector, flink
> > hudi
> > >>> > > >> connector and so on.
> > >>> > > >>
> > >>> > > >> But there is not a suitable metric group scope for the
> > >>> > > OperatorCoordinator
> > >>> > > >> and not an implementation for the interface
> > >>> > > OperatorCoordinatorMetricGroup.
> > >>> > > >> These metrics in OperatorCoordinator could be how many
> > splits/partitions
> > >>> > > >> have been assigned to source readers, how many files have been
> > written
> > >>> > > out
> > >>> > > >> by sink writers, these metrics not only help users to know the
> > job
> > >>> > > progress
> > >>> > > >> but also make big job maintaining easier. Thus we propose the
> > FLIP-274
> > >>> > > to
> > >>> > > >> introduce a new metric group scope for OperatorCoordinator and
> > provide
> > >>> > > an
> > >>> > > >> internal implementation for OperatorCoordinatorMetricGroup.
> > >>> > > >>
> > >>> > > >> Could you help review this FLIP when you get time? Any
> feedback
> > is
> > >>> > > >> appreciated!
> > >>> > > >>
> > >>> > > >> Best,
> > >>> > > >> Hang
> > >>> > > >>
> > >>> > > >> [1]
> > >>> > > >>
> > >>> > > >>
> > >>> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
> > >>> > > >>
> > >>> > >
> > >>> > >
> >
>

Reply via email to