Hi, Jark and Dong,

Thanks for your comments. Sorry for my late reply.

For suggestion 1, I plan to implement the SplitEnumeratorMetricGroup in
another issue, and it is not contained in this FLIP. I will add some
description about this part.
For suggestion 2, changes about OperatorCoordinator#metricGroup has already
been documented in the proposed change section.

Best,
Hang

Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道:

> Let me chime-in and add comments regarding the public interface section.
> Please see my comments inline.
>
> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu <imj...@gmail.com> wrote:
>
> > Hi Hang,
> >
> > Thanks for driving this discussion. I think this is a very useful feature
> > for connectors.
> >
> > The FLIP looks quite good to me, and I just have two suggestions.
> >
> > 1. In the "Public Interface" section, mention that the implementation
> > behavior of "SplitEnumeratorContext#metricGroup" is changed from
> returning
> > null to returning a concrete SplitEnumeratorMetricGroup instance. Even
> > though the API is already there, the behavior change can also be
> considered
> > a public change.
> >
>
> SplitEnumeratorContext#metricGroup is an interface and this FLIP does not
> seem to change its semantics/behavior. The FLIP does change the
> implementation/behavior of SourceCoordinatorContext#metricGroup, which is
> marked @Internal.
>
> Thus it might seem a bit weird to add in the public interface section
> saying that we change the interface SplitEnumeratorContext#metricGroup from
> returning null to non-null object.
>
> 2. Mention the newly added interface of "OperatorCoordinator#metricGroup"
> > in the "Proposed Changes" section or "Public Interface" section. As the
> > FLIP said, OperatorCoordinator is widely used in many connectors. Though
> it
> > is still an @Internal API, I think it is worth mentioning the change in
> the
> > FLIP.
>
>
> Since OperatorCoordinator is an internal API, it seems reasonable to
> explain it in the proposed change section. The FLIP seems to have
> documented this in point 5 of the proposed change section.
>
> BTW, if we think there are @internal classes that are important enough to
> be added in the public interface section, it might be useful to explicitly
> discuss this topic and document it in the "*What are the "public
> interfaces" of the project*" in this
> <
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >
> wiki.
>
>
> > Best,
> > Jark
> >
> >
> > On Mon, 26 Dec 2022 at 18:06, Hang Ruan <ruanhang1...@gmail.com> wrote:
> >
> > > Hi, thanks for the feedback, Zhu Zhu and Qingsheng.
> > > After combining everyone's comments, the main concerns and
> corresponding
> > > adjustments are as follows.
> > >
> > > Q1: Common metrics are not quite useful.
> > > numEventsIn and numEventsOut counters will be removed from the
> > > OperatorCoordinatorMetricGroup. These common metrics do not provide
> > enough
> > > information for users. The users are more willing to get the number of
> > > events of the specified type instead of the total number. And this
> metric
> > > is calculated differently. The implementation could register the metric
> > by
> > > themselves.
> > >
> > > Q2: This FLIP is overly complicated.
> > > This FLIP will become concise after these modifications.
> > > OperatorCoordinatorMetricGroup has already been introduced into Flink
> by
> > > FLIP-179<
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
> > > >.
> > > And
> > > this FLIP will not change it. This FLIP only provides a new metric
> option
> > > and a new metric group scope. The changes in proposed changes provide
> the
> > > details about the modifications for the internal classes, which might
> > make
> > > it look complicated.
> > >
> > > Thanks for all the comments again. If there are no further comments, we
> > > plan to start the voting thread this week.
> > >
> > > Best,
> > > Hang
> > >
> > > Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 16:48写道:
> > >
> > > > Thanks for the FLIP, Hang!
> > > >
> > > > This FLIP overall looks good to me. Actually I share the same concern
> > > with
> > > > Zhu that numEventsIn and numEventsOut counters are not quite useful
> to
> > > end
> > > > users. OperatorEvent is a quite low-level abstraction, which requires
> > > > instantialization in order to be practical to users and developers,
> so
> > > > maybe it's better to exclude them from the FLIP.
> > > >
> > > > Best,
> > > > Qingsheng
> > > > Ververica (Alibaba)
> > > >
> > > > On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu <reed...@gmail.com> wrote:
> > > >
> > > > > Hi Hang,
> > > > > I still see no strong reason why we need numEventsIn/numEventsOut
> > > > metrics.
> > > > > In the discussion in FLINK-29801, I can see the same concern from
> > > others.
> > > > > So I prefer to exclude them from this FLIP to avoid over-extending
> > the
> > > > > scope.
> > > > >
> > > > > Thanks,
> > > > > Zhu
> > > > >
> > > > > Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道:
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > As for the Zhu Zhu's problem, I think we should keep the common
> > > > metrics,
> > > > > which will help to observe incoming and outgoing events. What do
> you
> > > > think,
> > > > > @Zhu Zhu ?
> > > > > > And @Chesnay, are there any other issues you are more concerned
> > > about?
> > > > > Looking forward to your reply.
> > > > > >
> > > > > > Thanks for all the comments. If there are no further comments, we
> > > plan
> > > > > to start the voting thread next week.
> > > > > >
> > > > > > Best,
> > > > > > Hang
> > > > > >
> > > > > > Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道:
> > > > > >>
> > > > > >> Hi, Zhu Zhu,
> > > > > >>
> > > > > >> Thanks for your feedback!
> > > > > >>
> > > > > >> The OperatorCoordinator implementations are different. And their
> > > > > metrics are much different too. We try to find the common metrics
> and
> > > put
> > > > > them in the OperatorCoordinatorMetricGroup. If most developers
> think
> > we
> > > > do
> > > > > not need these common metrics, removing the common metrics is also
> > > > > acceptable.
> > > > > >>
> > > > > >> Best,
> > > > > >> Hang
> > > > > >>
> > > > > >> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道:
> > > > > >>>
> > > > > >>> Hi Hang & MengYue,
> > > > > >>>
> > > > > >>> Thanks for creating this FLIP!
> > > > > >>>
> > > > > >>> I think it is very useful, mainly in two aspects:
> > > > > >>> 1. Enables OperatorCoordinators to register metrics. Currently
> > > > > >>> the coordinators has no way to do this. And operator
> coordinator
> > > > > >>> metric group further enables the SplitEnumerator to have access
> > > > > >>> to a registered metric group (via the existing public interface
> > > > > >>> SplitEnumeratorContext#metricGroup()), which is null at the
> > moment.
> > > > > >>>
> > > > > >>> 2. Defines the scope of operator coordinator metrics. A clear
> > > > > definition
> > > > > >>> makes it easy for users to find their wanted metrics. The
> > > definition
> > > > > >>> also helps to avoid conflicts of metrics from multiple
> > > > > OperatorCoordinators
> > > > > >>> of the same kind. E.g. each SourceCoordinator may have its own
> > > > > >>> numSourceSplits metric, these metrics should not be directly
> > > > registered
> > > > > >>> to the job metric group.
> > > > > >>>
> > > > > >>> What I'm a bit concerned is the necessity of the introduced
> > common
> > > > > metrics
> > > > > >>> numEventsInCounter & numEventsOutCounter. If there any case
> which
> > > > > strongly
> > > > > >>> requires them?
> > > > > >>>
> > > > > >>> Regarding the concerns of Chesnay,
> > > > > >>> > A dedicated coordinator MG implementation is overkill
> > > > > >>> Directly using the job metric group can result in metric
> > conflicts,
> > > > as
> > > > > mentioned
> > > > > >>> in above #2.
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>> Zhu
> > > > > >>>
> > > > > >>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道:
> > > > > >>>
> > > > > >>> >
> > > > > >>> > Hi Chesney,
> > > > > >>> >
> > > > > >>> > Just to double check with you, OperatorCoordinatorMetricGroup
> > > > > (annotated as
> > > > > >>> > @PublicEvolving) has already been introduced into Flink by
> > > FLIP-179
> > > > > >>> > <
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
> > > > > >.
> > > > > >>> > And that FLIP has got you +1.. Do you mean we should remove
> > this
> > > > > >>> > OperatorCoordinatorMetricGroup?
> > > > > >>> >
> > > > > >>> > Regards,
> > > > > >>> > Dong
> > > > > >>> >
> > > > > >>> > On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler <
> > > > ches...@apache.org>
> > > > > wrote:
> > > > > >>> >
> > > > > >>> > > As a whole I feel like this FLIP is overly complicated. A
> > > > dedicated
> > > > > >>> > > coordinator MG implementation is overkill; it could just
> > re-use
> > > > the
> > > > > >>> > > existing Task/OperatorMGs to create the same structure we
> > have
> > > on
> > > > > TMs,
> > > > > >>> > > similar to what we did with the Job MG.
> > > > > >>> > >
> > > > > >>> > > However, I'm not convinced that this is required anyway,
> > > because
> > > > > all the
> > > > > >>> > > example metrics you listed can be implemented on the TM
> side
> > +
> > > > > >>> > > aggregating them in the external metrics backend.
> > > > > >>> > >
> > > > > >>> > > Since I'm on holidays soon, just so no one tries to pull a
> > fast
> > > > > one on
> > > > > >>> > > me, if this were to go to a vote as-is I'd be against it.
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > > On 09/12/2022 15:30, Dong Lin wrote:
> > > > > >>> > > > Hi Hang,
> > > > > >>> > > >
> > > > > >>> > > > Thanks for the FLIP! The FLIP looks good and it is pretty
> > > > > informative.
> > > > > >>> > > >
> > > > > >>> > > > I have just two minor comments regarding names:
> > > > > >>> > > > - Would it be useful to rename the config key as
> > > > > >>> > > > *metrics.scope.jm.job.operator-coordinator* for
> consistency
> > > > with
> > > > > >>> > > > *metrics.scope.jm.job
> > > > > >>> > > > *(which is not named as *jm-job)?
> > > > > >>> > > > - Maybe rename the variable as
> > > > SCOPE_NAMING_OPERATOR_COORDINATOR
> > > > > for
> > > > > >>> > > > simplicity and consistency with SCOPE_NAMING_OPERATOR
> > (which
> > > is
> > > > > not named
> > > > > >>> > > > as SCOPE_NAMING_TM_JOB_OPERATOR)?
> > > > > >>> > > >
> > > > > >>> > > > Cheers,
> > > > > >>> > > > Dong
> > > > > >>> > > >
> > > > > >>> > > >
> > > > > >>> > > >
> > > > > >>> > > > On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <
> > > > ruanhang1...@gmail.com>
> > > > > wrote:
> > > > > >>> > > >
> > > > > >>> > > >> Hi all,
> > > > > >>> > > >>
> > > > > >>> > > >> MengYue and I created FLIP-274[1] Introduce metric group
> > for
> > > > > >>> > > >> OperatorCoordinator. OperatorCoordinator is the
> > coordinator
> > > > for
> > > > > runtime
> > > > > >>> > > >> operators and running on Job Manager. The coordination
> > > > > mechanism is
> > > > > >>> > > >> operator events between OperatorCoordinator and its all
> > > > > operators, the
> > > > > >>> > > >> coordination is more and more using in Flink, for
> example
> > > many
> > > > > Sources
> > > > > >>> > > and
> > > > > >>> > > >> Sinks depend on the mechanism to assign splits and
> > > coordinate
> > > > > commits to
> > > > > >>> > > >> external systems. The OperatorCoordinator is widely
> using
> > in
> > > > > flink kafka
> > > > > >>> > > >> connector, flink pulsar connector, flink cdc connector,
> > > flink
> > > > > hudi
> > > > > >>> > > >> connector and so on.
> > > > > >>> > > >>
> > > > > >>> > > >> But there is not a suitable metric group scope for the
> > > > > >>> > > OperatorCoordinator
> > > > > >>> > > >> and not an implementation for the interface
> > > > > >>> > > OperatorCoordinatorMetricGroup.
> > > > > >>> > > >> These metrics in OperatorCoordinator could be how many
> > > > > splits/partitions
> > > > > >>> > > >> have been assigned to source readers, how many files
> have
> > > been
> > > > > written
> > > > > >>> > > out
> > > > > >>> > > >> by sink writers, these metrics not only help users to
> know
> > > the
> > > > > job
> > > > > >>> > > progress
> > > > > >>> > > >> but also make big job maintaining easier. Thus we
> propose
> > > the
> > > > > FLIP-274
> > > > > >>> > > to
> > > > > >>> > > >> introduce a new metric group scope for
> OperatorCoordinator
> > > and
> > > > > provide
> > > > > >>> > > an
> > > > > >>> > > >> internal implementation for
> > OperatorCoordinatorMetricGroup.
> > > > > >>> > > >>
> > > > > >>> > > >> Could you help review this FLIP when you get time? Any
> > > > feedback
> > > > > is
> > > > > >>> > > >> appreciated!
> > > > > >>> > > >>
> > > > > >>> > > >> Best,
> > > > > >>> > > >> Hang
> > > > > >>> > > >>
> > > > > >>> > > >> [1]
> > > > > >>> > > >>
> > > > > >>> > > >>
> > > > > >>> > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
> > > > > >>> > > >>
> > > > > >>> > >
> > > > > >>> > >
> > > > >
> > > >
> > >
> >
>

Reply via email to