Hi Hang, I'm a bit surprised that this has gone to a vote, given that Chesnay deliberately mentioned that he would vote against it as-is. I would expect that before going to a vote, he has had the opportunity to participate in this discussion.
Best regards, Martijn On Tue, Jan 3, 2023 at 12:53 PM Jark Wu <imj...@gmail.com> wrote: > Hi Dong, > > Regarding “SplitEnumeratorContext#metricGroup”, my only concern is that > this is a core interface for the FLIP. It’s hard to tell how sources use > metric group without mentioning this interface. Even if this is an existing > API, I think it’s worth introducing the interface again and declaring that > we will implement the interface instead of a no-op method in this FLIP. > > Anyway, this is a minor problem and shouldn’t block this FLIP. I’m +1 to > start a vote. > > Best, > Jark > > > > 2023年1月3日 10:03,Hang Ruan <ruanhang1...@gmail.com> 写道: > > > > Hi, Jark and Dong, > > > > Thanks for your comments. Sorry for my late reply. > > > > For suggestion 1, I plan to implement the SplitEnumeratorMetricGroup in > > another issue, and it is not contained in this FLIP. I will add some > > description about this part. > > For suggestion 2, changes about OperatorCoordinator#metricGroup has > already > > been documented in the proposed change section. > > > > Best, > > Hang > > > > Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道: > > > >> Let me chime-in and add comments regarding the public interface section. > >> Please see my comments inline. > >> > >> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu <imj...@gmail.com> wrote: > >> > >>> Hi Hang, > >>> > >>> Thanks for driving this discussion. I think this is a very useful > feature > >>> for connectors. > >>> > >>> The FLIP looks quite good to me, and I just have two suggestions. > >>> > >>> 1. In the "Public Interface" section, mention that the implementation > >>> behavior of "SplitEnumeratorContext#metricGroup" is changed from > >> returning > >>> null to returning a concrete SplitEnumeratorMetricGroup instance. Even > >>> though the API is already there, the behavior change can also be > >> considered > >>> a public change. > >>> > >> > >> SplitEnumeratorContext#metricGroup is an interface and this FLIP does > not > >> seem to change its semantics/behavior. The FLIP does change the > >> implementation/behavior of SourceCoordinatorContext#metricGroup, which > is > >> marked @Internal. > >> > >> Thus it might seem a bit weird to add in the public interface section > >> saying that we change the interface SplitEnumeratorContext#metricGroup > from > >> returning null to non-null object. > >> > >> 2. Mention the newly added interface of > "OperatorCoordinator#metricGroup" > >>> in the "Proposed Changes" section or "Public Interface" section. As the > >>> FLIP said, OperatorCoordinator is widely used in many connectors. > Though > >> it > >>> is still an @Internal API, I think it is worth mentioning the change in > >> the > >>> FLIP. > >> > >> > >> Since OperatorCoordinator is an internal API, it seems reasonable to > >> explain it in the proposed change section. The FLIP seems to have > >> documented this in point 5 of the proposed change section. > >> > >> BTW, if we think there are @internal classes that are important enough > to > >> be added in the public interface section, it might be useful to > explicitly > >> discuss this topic and document it in the "*What are the "public > >> interfaces" of the project*" in this > >> < > >> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > >>> > >> wiki. > >> > >> > >>> Best, > >>> Jark > >>> > >>> > >>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan <ruanhang1...@gmail.com> > wrote: > >>> > >>>> Hi, thanks for the feedback, Zhu Zhu and Qingsheng. > >>>> After combining everyone's comments, the main concerns and > >> corresponding > >>>> adjustments are as follows. > >>>> > >>>> Q1: Common metrics are not quite useful. > >>>> numEventsIn and numEventsOut counters will be removed from the > >>>> OperatorCoordinatorMetricGroup. These common metrics do not provide > >>> enough > >>>> information for users. The users are more willing to get the number of > >>>> events of the specified type instead of the total number. And this > >> metric > >>>> is calculated differently. The implementation could register the > metric > >>> by > >>>> themselves. > >>>> > >>>> Q2: This FLIP is overly complicated. > >>>> This FLIP will become concise after these modifications. > >>>> OperatorCoordinatorMetricGroup has already been introduced into Flink > >> by > >>>> FLIP-179< > >>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics > >>>>> . > >>>> And > >>>> this FLIP will not change it. This FLIP only provides a new metric > >> option > >>>> and a new metric group scope. The changes in proposed changes provide > >> the > >>>> details about the modifications for the internal classes, which might > >>> make > >>>> it look complicated. > >>>> > >>>> Thanks for all the comments again. If there are no further comments, > we > >>>> plan to start the voting thread this week. > >>>> > >>>> Best, > >>>> Hang > >>>> > >>>> Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 16:48写道: > >>>> > >>>>> Thanks for the FLIP, Hang! > >>>>> > >>>>> This FLIP overall looks good to me. Actually I share the same concern > >>>> with > >>>>> Zhu that numEventsIn and numEventsOut counters are not quite useful > >> to > >>>> end > >>>>> users. OperatorEvent is a quite low-level abstraction, which requires > >>>>> instantialization in order to be practical to users and developers, > >> so > >>>>> maybe it's better to exclude them from the FLIP. > >>>>> > >>>>> Best, > >>>>> Qingsheng > >>>>> Ververica (Alibaba) > >>>>> > >>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu <reed...@gmail.com> wrote: > >>>>> > >>>>>> Hi Hang, > >>>>>> I still see no strong reason why we need numEventsIn/numEventsOut > >>>>> metrics. > >>>>>> In the discussion in FLINK-29801, I can see the same concern from > >>>> others. > >>>>>> So I prefer to exclude them from this FLIP to avoid over-extending > >>> the > >>>>>> scope. > >>>>>> > >>>>>> Thanks, > >>>>>> Zhu > >>>>>> > >>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道: > >>>>>>> > >>>>>>> Hi everyone, > >>>>>>> > >>>>>>> As for the Zhu Zhu's problem, I think we should keep the common > >>>>> metrics, > >>>>>> which will help to observe incoming and outgoing events. What do > >> you > >>>>> think, > >>>>>> @Zhu Zhu ? > >>>>>>> And @Chesnay, are there any other issues you are more concerned > >>>> about? > >>>>>> Looking forward to your reply. > >>>>>>> > >>>>>>> Thanks for all the comments. If there are no further comments, we > >>>> plan > >>>>>> to start the voting thread next week. > >>>>>>> > >>>>>>> Best, > >>>>>>> Hang > >>>>>>> > >>>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道: > >>>>>>>> > >>>>>>>> Hi, Zhu Zhu, > >>>>>>>> > >>>>>>>> Thanks for your feedback! > >>>>>>>> > >>>>>>>> The OperatorCoordinator implementations are different. And their > >>>>>> metrics are much different too. We try to find the common metrics > >> and > >>>> put > >>>>>> them in the OperatorCoordinatorMetricGroup. If most developers > >> think > >>> we > >>>>> do > >>>>>> not need these common metrics, removing the common metrics is also > >>>>>> acceptable. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Hang > >>>>>>>> > >>>>>>>> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道: > >>>>>>>>> > >>>>>>>>> Hi Hang & MengYue, > >>>>>>>>> > >>>>>>>>> Thanks for creating this FLIP! > >>>>>>>>> > >>>>>>>>> I think it is very useful, mainly in two aspects: > >>>>>>>>> 1. Enables OperatorCoordinators to register metrics. Currently > >>>>>>>>> the coordinators has no way to do this. And operator > >> coordinator > >>>>>>>>> metric group further enables the SplitEnumerator to have access > >>>>>>>>> to a registered metric group (via the existing public interface > >>>>>>>>> SplitEnumeratorContext#metricGroup()), which is null at the > >>> moment. > >>>>>>>>> > >>>>>>>>> 2. Defines the scope of operator coordinator metrics. A clear > >>>>>> definition > >>>>>>>>> makes it easy for users to find their wanted metrics. The > >>>> definition > >>>>>>>>> also helps to avoid conflicts of metrics from multiple > >>>>>> OperatorCoordinators > >>>>>>>>> of the same kind. E.g. each SourceCoordinator may have its own > >>>>>>>>> numSourceSplits metric, these metrics should not be directly > >>>>> registered > >>>>>>>>> to the job metric group. > >>>>>>>>> > >>>>>>>>> What I'm a bit concerned is the necessity of the introduced > >>> common > >>>>>> metrics > >>>>>>>>> numEventsInCounter & numEventsOutCounter. If there any case > >> which > >>>>>> strongly > >>>>>>>>> requires them? > >>>>>>>>> > >>>>>>>>> Regarding the concerns of Chesnay, > >>>>>>>>>> A dedicated coordinator MG implementation is overkill > >>>>>>>>> Directly using the job metric group can result in metric > >>> conflicts, > >>>>> as > >>>>>> mentioned > >>>>>>>>> in above #2. > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Zhu > >>>>>>>>> > >>>>>>>>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道: > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Hi Chesney, > >>>>>>>>>> > >>>>>>>>>> Just to double check with you, OperatorCoordinatorMetricGroup > >>>>>> (annotated as > >>>>>>>>>> @PublicEvolving) has already been introduced into Flink by > >>>> FLIP-179 > >>>>>>>>>> < > >>>>>> > >>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics > >>>>>>> . > >>>>>>>>>> And that FLIP has got you +1.. Do you mean we should remove > >>> this > >>>>>>>>>> OperatorCoordinatorMetricGroup? > >>>>>>>>>> > >>>>>>>>>> Regards, > >>>>>>>>>> Dong > >>>>>>>>>> > >>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler < > >>>>> ches...@apache.org> > >>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> As a whole I feel like this FLIP is overly complicated. A > >>>>> dedicated > >>>>>>>>>>> coordinator MG implementation is overkill; it could just > >>> re-use > >>>>> the > >>>>>>>>>>> existing Task/OperatorMGs to create the same structure we > >>> have > >>>> on > >>>>>> TMs, > >>>>>>>>>>> similar to what we did with the Job MG. > >>>>>>>>>>> > >>>>>>>>>>> However, I'm not convinced that this is required anyway, > >>>> because > >>>>>> all the > >>>>>>>>>>> example metrics you listed can be implemented on the TM > >> side > >>> + > >>>>>>>>>>> aggregating them in the external metrics backend. > >>>>>>>>>>> > >>>>>>>>>>> Since I'm on holidays soon, just so no one tries to pull a > >>> fast > >>>>>> one on > >>>>>>>>>>> me, if this were to go to a vote as-is I'd be against it. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote: > >>>>>>>>>>>> Hi Hang, > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good and it is pretty > >>>>>> informative. > >>>>>>>>>>>> > >>>>>>>>>>>> I have just two minor comments regarding names: > >>>>>>>>>>>> - Would it be useful to rename the config key as > >>>>>>>>>>>> *metrics.scope.jm.job.operator-coordinator* for > >> consistency > >>>>> with > >>>>>>>>>>>> *metrics.scope.jm.job > >>>>>>>>>>>> *(which is not named as *jm-job)? > >>>>>>>>>>>> - Maybe rename the variable as > >>>>> SCOPE_NAMING_OPERATOR_COORDINATOR > >>>>>> for > >>>>>>>>>>>> simplicity and consistency with SCOPE_NAMING_OPERATOR > >>> (which > >>>> is > >>>>>> not named > >>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)? > >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> Dong > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan < > >>>>> ruanhang1...@gmail.com> > >>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>> > >>>>>>>>>>>>> MengYue and I created FLIP-274[1] Introduce metric group > >>> for > >>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator is the > >>> coordinator > >>>>> for > >>>>>> runtime > >>>>>>>>>>>>> operators and running on Job Manager. The coordination > >>>>>> mechanism is > >>>>>>>>>>>>> operator events between OperatorCoordinator and its all > >>>>>> operators, the > >>>>>>>>>>>>> coordination is more and more using in Flink, for > >> example > >>>> many > >>>>>> Sources > >>>>>>>>>>> and > >>>>>>>>>>>>> Sinks depend on the mechanism to assign splits and > >>>> coordinate > >>>>>> commits to > >>>>>>>>>>>>> external systems. The OperatorCoordinator is widely > >> using > >>> in > >>>>>> flink kafka > >>>>>>>>>>>>> connector, flink pulsar connector, flink cdc connector, > >>>> flink > >>>>>> hudi > >>>>>>>>>>>>> connector and so on. > >>>>>>>>>>>>> > >>>>>>>>>>>>> But there is not a suitable metric group scope for the > >>>>>>>>>>> OperatorCoordinator > >>>>>>>>>>>>> and not an implementation for the interface > >>>>>>>>>>> OperatorCoordinatorMetricGroup. > >>>>>>>>>>>>> These metrics in OperatorCoordinator could be how many > >>>>>> splits/partitions > >>>>>>>>>>>>> have been assigned to source readers, how many files > >> have > >>>> been > >>>>>> written > >>>>>>>>>>> out > >>>>>>>>>>>>> by sink writers, these metrics not only help users to > >> know > >>>> the > >>>>>> job > >>>>>>>>>>> progress > >>>>>>>>>>>>> but also make big job maintaining easier. Thus we > >> propose > >>>> the > >>>>>> FLIP-274 > >>>>>>>>>>> to > >>>>>>>>>>>>> introduce a new metric group scope for > >> OperatorCoordinator > >>>> and > >>>>>> provide > >>>>>>>>>>> an > >>>>>>>>>>>>> internal implementation for > >>> OperatorCoordinatorMetricGroup. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Could you help review this FLIP when you get time? Any > >>>>> feedback > >>>>>> is > >>>>>>>>>>>>> appreciated! > >>>>>>>>>>>>> > >>>>>>>>>>>>> Best, > >>>>>>>>>>>>> Hang > >>>>>>>>>>>>> > >>>>>>>>>>>>> [1] > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > >