Hi Hang,

I'm a bit surprised that this has gone to a vote, given that Chesnay
deliberately mentioned that he would vote against it as-is. I would expect
that before going to a vote, he has had the opportunity to participate in
this discussion.

Best regards,

Martijn

On Tue, Jan 3, 2023 at 12:53 PM Jark Wu <imj...@gmail.com> wrote:

> Hi Dong,
>
> Regarding “SplitEnumeratorContext#metricGroup”, my only concern is that
> this is a core interface for the FLIP. It’s hard to tell how sources use
> metric group without mentioning this interface. Even if this is an existing
> API, I think it’s worth introducing the interface again and declaring that
> we will implement the interface instead of a no-op method in this FLIP.
>
> Anyway, this is a minor problem and shouldn’t block this FLIP. I’m +1 to
> start a vote.
>
> Best,
> Jark
>
>
> > 2023年1月3日 10:03,Hang Ruan <ruanhang1...@gmail.com> 写道:
> >
> > Hi, Jark and Dong,
> >
> > Thanks for your comments. Sorry for my late reply.
> >
> > For suggestion 1, I plan to implement the SplitEnumeratorMetricGroup in
> > another issue, and it is not contained in this FLIP. I will add some
> > description about this part.
> > For suggestion 2, changes about OperatorCoordinator#metricGroup has
> already
> > been documented in the proposed change section.
> >
> > Best,
> > Hang
> >
> > Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道:
> >
> >> Let me chime-in and add comments regarding the public interface section.
> >> Please see my comments inline.
> >>
> >> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu <imj...@gmail.com> wrote:
> >>
> >>> Hi Hang,
> >>>
> >>> Thanks for driving this discussion. I think this is a very useful
> feature
> >>> for connectors.
> >>>
> >>> The FLIP looks quite good to me, and I just have two suggestions.
> >>>
> >>> 1. In the "Public Interface" section, mention that the implementation
> >>> behavior of "SplitEnumeratorContext#metricGroup" is changed from
> >> returning
> >>> null to returning a concrete SplitEnumeratorMetricGroup instance. Even
> >>> though the API is already there, the behavior change can also be
> >> considered
> >>> a public change.
> >>>
> >>
> >> SplitEnumeratorContext#metricGroup is an interface and this FLIP does
> not
> >> seem to change its semantics/behavior. The FLIP does change the
> >> implementation/behavior of SourceCoordinatorContext#metricGroup, which
> is
> >> marked @Internal.
> >>
> >> Thus it might seem a bit weird to add in the public interface section
> >> saying that we change the interface SplitEnumeratorContext#metricGroup
> from
> >> returning null to non-null object.
> >>
> >> 2. Mention the newly added interface of
> "OperatorCoordinator#metricGroup"
> >>> in the "Proposed Changes" section or "Public Interface" section. As the
> >>> FLIP said, OperatorCoordinator is widely used in many connectors.
> Though
> >> it
> >>> is still an @Internal API, I think it is worth mentioning the change in
> >> the
> >>> FLIP.
> >>
> >>
> >> Since OperatorCoordinator is an internal API, it seems reasonable to
> >> explain it in the proposed change section. The FLIP seems to have
> >> documented this in point 5 of the proposed change section.
> >>
> >> BTW, if we think there are @internal classes that are important enough
> to
> >> be added in the public interface section, it might be useful to
> explicitly
> >> discuss this topic and document it in the "*What are the "public
> >> interfaces" of the project*" in this
> >> <
> >>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >>>
> >> wiki.
> >>
> >>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan <ruanhang1...@gmail.com>
> wrote:
> >>>
> >>>> Hi, thanks for the feedback, Zhu Zhu and Qingsheng.
> >>>> After combining everyone's comments, the main concerns and
> >> corresponding
> >>>> adjustments are as follows.
> >>>>
> >>>> Q1: Common metrics are not quite useful.
> >>>> numEventsIn and numEventsOut counters will be removed from the
> >>>> OperatorCoordinatorMetricGroup. These common metrics do not provide
> >>> enough
> >>>> information for users. The users are more willing to get the number of
> >>>> events of the specified type instead of the total number. And this
> >> metric
> >>>> is calculated differently. The implementation could register the
> metric
> >>> by
> >>>> themselves.
> >>>>
> >>>> Q2: This FLIP is overly complicated.
> >>>> This FLIP will become concise after these modifications.
> >>>> OperatorCoordinatorMetricGroup has already been introduced into Flink
> >> by
> >>>> FLIP-179<
> >>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
> >>>>> .
> >>>> And
> >>>> this FLIP will not change it. This FLIP only provides a new metric
> >> option
> >>>> and a new metric group scope. The changes in proposed changes provide
> >> the
> >>>> details about the modifications for the internal classes, which might
> >>> make
> >>>> it look complicated.
> >>>>
> >>>> Thanks for all the comments again. If there are no further comments,
> we
> >>>> plan to start the voting thread this week.
> >>>>
> >>>> Best,
> >>>> Hang
> >>>>
> >>>> Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 16:48写道:
> >>>>
> >>>>> Thanks for the FLIP, Hang!
> >>>>>
> >>>>> This FLIP overall looks good to me. Actually I share the same concern
> >>>> with
> >>>>> Zhu that numEventsIn and numEventsOut counters are not quite useful
> >> to
> >>>> end
> >>>>> users. OperatorEvent is a quite low-level abstraction, which requires
> >>>>> instantialization in order to be practical to users and developers,
> >> so
> >>>>> maybe it's better to exclude them from the FLIP.
> >>>>>
> >>>>> Best,
> >>>>> Qingsheng
> >>>>> Ververica (Alibaba)
> >>>>>
> >>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu <reed...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Hang,
> >>>>>> I still see no strong reason why we need numEventsIn/numEventsOut
> >>>>> metrics.
> >>>>>> In the discussion in FLINK-29801, I can see the same concern from
> >>>> others.
> >>>>>> So I prefer to exclude them from this FLIP to avoid over-extending
> >>> the
> >>>>>> scope.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Zhu
> >>>>>>
> >>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道:
> >>>>>>>
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> As for the Zhu Zhu's problem, I think we should keep the common
> >>>>> metrics,
> >>>>>> which will help to observe incoming and outgoing events. What do
> >> you
> >>>>> think,
> >>>>>> @Zhu Zhu ?
> >>>>>>> And @Chesnay, are there any other issues you are more concerned
> >>>> about?
> >>>>>> Looking forward to your reply.
> >>>>>>>
> >>>>>>> Thanks for all the comments. If there are no further comments, we
> >>>> plan
> >>>>>> to start the voting thread next week.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Hang
> >>>>>>>
> >>>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道:
> >>>>>>>>
> >>>>>>>> Hi, Zhu Zhu,
> >>>>>>>>
> >>>>>>>> Thanks for your feedback!
> >>>>>>>>
> >>>>>>>> The OperatorCoordinator implementations are different. And their
> >>>>>> metrics are much different too. We try to find the common metrics
> >> and
> >>>> put
> >>>>>> them in the OperatorCoordinatorMetricGroup. If most developers
> >> think
> >>> we
> >>>>> do
> >>>>>> not need these common metrics, removing the common metrics is also
> >>>>>> acceptable.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Hang
> >>>>>>>>
> >>>>>>>> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道:
> >>>>>>>>>
> >>>>>>>>> Hi Hang & MengYue,
> >>>>>>>>>
> >>>>>>>>> Thanks for creating this FLIP!
> >>>>>>>>>
> >>>>>>>>> I think it is very useful, mainly in two aspects:
> >>>>>>>>> 1. Enables OperatorCoordinators to register metrics. Currently
> >>>>>>>>> the coordinators has no way to do this. And operator
> >> coordinator
> >>>>>>>>> metric group further enables the SplitEnumerator to have access
> >>>>>>>>> to a registered metric group (via the existing public interface
> >>>>>>>>> SplitEnumeratorContext#metricGroup()), which is null at the
> >>> moment.
> >>>>>>>>>
> >>>>>>>>> 2. Defines the scope of operator coordinator metrics. A clear
> >>>>>> definition
> >>>>>>>>> makes it easy for users to find their wanted metrics. The
> >>>> definition
> >>>>>>>>> also helps to avoid conflicts of metrics from multiple
> >>>>>> OperatorCoordinators
> >>>>>>>>> of the same kind. E.g. each SourceCoordinator may have its own
> >>>>>>>>> numSourceSplits metric, these metrics should not be directly
> >>>>> registered
> >>>>>>>>> to the job metric group.
> >>>>>>>>>
> >>>>>>>>> What I'm a bit concerned is the necessity of the introduced
> >>> common
> >>>>>> metrics
> >>>>>>>>> numEventsInCounter & numEventsOutCounter. If there any case
> >> which
> >>>>>> strongly
> >>>>>>>>> requires them?
> >>>>>>>>>
> >>>>>>>>> Regarding the concerns of Chesnay,
> >>>>>>>>>> A dedicated coordinator MG implementation is overkill
> >>>>>>>>> Directly using the job metric group can result in metric
> >>> conflicts,
> >>>>> as
> >>>>>> mentioned
> >>>>>>>>> in above #2.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Zhu
> >>>>>>>>>
> >>>>>>>>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hi Chesney,
> >>>>>>>>>>
> >>>>>>>>>> Just to double check with you, OperatorCoordinatorMetricGroup
> >>>>>> (annotated as
> >>>>>>>>>> @PublicEvolving) has already been introduced into Flink by
> >>>> FLIP-179
> >>>>>>>>>> <
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
> >>>>>>> .
> >>>>>>>>>> And that FLIP has got you +1.. Do you mean we should remove
> >>> this
> >>>>>>>>>> OperatorCoordinatorMetricGroup?
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Dong
> >>>>>>>>>>
> >>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler <
> >>>>> ches...@apache.org>
> >>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> As a whole I feel like this FLIP is overly complicated. A
> >>>>> dedicated
> >>>>>>>>>>> coordinator MG implementation is overkill; it could just
> >>> re-use
> >>>>> the
> >>>>>>>>>>> existing Task/OperatorMGs to create the same structure we
> >>> have
> >>>> on
> >>>>>> TMs,
> >>>>>>>>>>> similar to what we did with the Job MG.
> >>>>>>>>>>>
> >>>>>>>>>>> However, I'm not convinced that this is required anyway,
> >>>> because
> >>>>>> all the
> >>>>>>>>>>> example metrics you listed can be implemented on the TM
> >> side
> >>> +
> >>>>>>>>>>> aggregating them in the external metrics backend.
> >>>>>>>>>>>
> >>>>>>>>>>> Since I'm on holidays soon, just so no one tries to pull a
> >>> fast
> >>>>>> one on
> >>>>>>>>>>> me, if this were to go to a vote as-is I'd be against it.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote:
> >>>>>>>>>>>> Hi Hang,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good and it is pretty
> >>>>>> informative.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have just two minor comments regarding names:
> >>>>>>>>>>>> - Would it be useful to rename the config key as
> >>>>>>>>>>>> *metrics.scope.jm.job.operator-coordinator* for
> >> consistency
> >>>>> with
> >>>>>>>>>>>> *metrics.scope.jm.job
> >>>>>>>>>>>> *(which is not named as *jm-job)?
> >>>>>>>>>>>> - Maybe rename the variable as
> >>>>> SCOPE_NAMING_OPERATOR_COORDINATOR
> >>>>>> for
> >>>>>>>>>>>> simplicity and consistency with SCOPE_NAMING_OPERATOR
> >>> (which
> >>>> is
> >>>>>> not named
> >>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>> Dong
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <
> >>>>> ruanhang1...@gmail.com>
> >>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> MengYue and I created FLIP-274[1] Introduce metric group
> >>> for
> >>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator is the
> >>> coordinator
> >>>>> for
> >>>>>> runtime
> >>>>>>>>>>>>> operators and running on Job Manager. The coordination
> >>>>>> mechanism is
> >>>>>>>>>>>>> operator events between OperatorCoordinator and its all
> >>>>>> operators, the
> >>>>>>>>>>>>> coordination is more and more using in Flink, for
> >> example
> >>>> many
> >>>>>> Sources
> >>>>>>>>>>> and
> >>>>>>>>>>>>> Sinks depend on the mechanism to assign splits and
> >>>> coordinate
> >>>>>> commits to
> >>>>>>>>>>>>> external systems. The OperatorCoordinator is widely
> >> using
> >>> in
> >>>>>> flink kafka
> >>>>>>>>>>>>> connector, flink pulsar connector, flink cdc connector,
> >>>> flink
> >>>>>> hudi
> >>>>>>>>>>>>> connector and so on.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> But there is not a suitable metric group scope for the
> >>>>>>>>>>> OperatorCoordinator
> >>>>>>>>>>>>> and not an implementation for the interface
> >>>>>>>>>>> OperatorCoordinatorMetricGroup.
> >>>>>>>>>>>>> These metrics in OperatorCoordinator could be how many
> >>>>>> splits/partitions
> >>>>>>>>>>>>> have been assigned to source readers, how many files
> >> have
> >>>> been
> >>>>>> written
> >>>>>>>>>>> out
> >>>>>>>>>>>>> by sink writers, these metrics not only help users to
> >> know
> >>>> the
> >>>>>> job
> >>>>>>>>>>> progress
> >>>>>>>>>>>>> but also make big job maintaining easier. Thus we
> >> propose
> >>>> the
> >>>>>> FLIP-274
> >>>>>>>>>>> to
> >>>>>>>>>>>>> introduce a new metric group scope for
> >> OperatorCoordinator
> >>>> and
> >>>>>> provide
> >>>>>>>>>>> an
> >>>>>>>>>>>>> internal implementation for
> >>> OperatorCoordinatorMetricGroup.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Could you help review this FLIP when you get time? Any
> >>>>> feedback
> >>>>>> is
> >>>>>>>>>>>>> appreciated!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Hang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Reply via email to