Hi, Jark and Dong, Thanks for your comments. Sorry for my late reply.
For suggestion 1, I plan to implement the SplitEnumeratorMetricGroup in another issue, and it is not contained in this FLIP. I will add some description about this part. For suggestion 2, changes about OperatorCoordinator#metricGroup has already been documented in the proposed change section. Best, Hang Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道: > Let me chime-in and add comments regarding the public interface section. > Please see my comments inline. > > On Thu, Dec 29, 2022 at 6:08 PM Jark Wu <imj...@gmail.com> wrote: > > > Hi Hang, > > > > Thanks for driving this discussion. I think this is a very useful feature > > for connectors. > > > > The FLIP looks quite good to me, and I just have two suggestions. > > > > 1. In the "Public Interface" section, mention that the implementation > > behavior of "SplitEnumeratorContext#metricGroup" is changed from > returning > > null to returning a concrete SplitEnumeratorMetricGroup instance. Even > > though the API is already there, the behavior change can also be > considered > > a public change. > > > > SplitEnumeratorContext#metricGroup is an interface and this FLIP does not > seem to change its semantics/behavior. The FLIP does change the > implementation/behavior of SourceCoordinatorContext#metricGroup, which is > marked @Internal. > > Thus it might seem a bit weird to add in the public interface section > saying that we change the interface SplitEnumeratorContext#metricGroup from > returning null to non-null object. > > 2. Mention the newly added interface of "OperatorCoordinator#metricGroup" > > in the "Proposed Changes" section or "Public Interface" section. As the > > FLIP said, OperatorCoordinator is widely used in many connectors. Though > it > > is still an @Internal API, I think it is worth mentioning the change in > the > > FLIP. > > > Since OperatorCoordinator is an internal API, it seems reasonable to > explain it in the proposed change section. The FLIP seems to have > documented this in point 5 of the proposed change section. > > BTW, if we think there are @internal classes that are important enough to > be added in the public interface section, it might be useful to explicitly > discuss this topic and document it in the "*What are the "public > interfaces" of the project*" in this > < > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > > > wiki. > > > > Best, > > Jark > > > > > > On Mon, 26 Dec 2022 at 18:06, Hang Ruan <ruanhang1...@gmail.com> wrote: > > > > > Hi, thanks for the feedback, Zhu Zhu and Qingsheng. > > > After combining everyone's comments, the main concerns and > corresponding > > > adjustments are as follows. > > > > > > Q1: Common metrics are not quite useful. > > > numEventsIn and numEventsOut counters will be removed from the > > > OperatorCoordinatorMetricGroup. These common metrics do not provide > > enough > > > information for users. The users are more willing to get the number of > > > events of the specified type instead of the total number. And this > metric > > > is calculated differently. The implementation could register the metric > > by > > > themselves. > > > > > > Q2: This FLIP is overly complicated. > > > This FLIP will become concise after these modifications. > > > OperatorCoordinatorMetricGroup has already been introduced into Flink > by > > > FLIP-179< > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics > > > >. > > > And > > > this FLIP will not change it. This FLIP only provides a new metric > option > > > and a new metric group scope. The changes in proposed changes provide > the > > > details about the modifications for the internal classes, which might > > make > > > it look complicated. > > > > > > Thanks for all the comments again. If there are no further comments, we > > > plan to start the voting thread this week. > > > > > > Best, > > > Hang > > > > > > Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 16:48写道: > > > > > > > Thanks for the FLIP, Hang! > > > > > > > > This FLIP overall looks good to me. Actually I share the same concern > > > with > > > > Zhu that numEventsIn and numEventsOut counters are not quite useful > to > > > end > > > > users. OperatorEvent is a quite low-level abstraction, which requires > > > > instantialization in order to be practical to users and developers, > so > > > > maybe it's better to exclude them from the FLIP. > > > > > > > > Best, > > > > Qingsheng > > > > Ververica (Alibaba) > > > > > > > > On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu <reed...@gmail.com> wrote: > > > > > > > > > Hi Hang, > > > > > I still see no strong reason why we need numEventsIn/numEventsOut > > > > metrics. > > > > > In the discussion in FLINK-29801, I can see the same concern from > > > others. > > > > > So I prefer to exclude them from this FLIP to avoid over-extending > > the > > > > > scope. > > > > > > > > > > Thanks, > > > > > Zhu > > > > > > > > > > Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道: > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > As for the Zhu Zhu's problem, I think we should keep the common > > > > metrics, > > > > > which will help to observe incoming and outgoing events. What do > you > > > > think, > > > > > @Zhu Zhu ? > > > > > > And @Chesnay, are there any other issues you are more concerned > > > about? > > > > > Looking forward to your reply. > > > > > > > > > > > > Thanks for all the comments. If there are no further comments, we > > > plan > > > > > to start the voting thread next week. > > > > > > > > > > > > Best, > > > > > > Hang > > > > > > > > > > > > Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道: > > > > > >> > > > > > >> Hi, Zhu Zhu, > > > > > >> > > > > > >> Thanks for your feedback! > > > > > >> > > > > > >> The OperatorCoordinator implementations are different. And their > > > > > metrics are much different too. We try to find the common metrics > and > > > put > > > > > them in the OperatorCoordinatorMetricGroup. If most developers > think > > we > > > > do > > > > > not need these common metrics, removing the common metrics is also > > > > > acceptable. > > > > > >> > > > > > >> Best, > > > > > >> Hang > > > > > >> > > > > > >> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道: > > > > > >>> > > > > > >>> Hi Hang & MengYue, > > > > > >>> > > > > > >>> Thanks for creating this FLIP! > > > > > >>> > > > > > >>> I think it is very useful, mainly in two aspects: > > > > > >>> 1. Enables OperatorCoordinators to register metrics. Currently > > > > > >>> the coordinators has no way to do this. And operator > coordinator > > > > > >>> metric group further enables the SplitEnumerator to have access > > > > > >>> to a registered metric group (via the existing public interface > > > > > >>> SplitEnumeratorContext#metricGroup()), which is null at the > > moment. > > > > > >>> > > > > > >>> 2. Defines the scope of operator coordinator metrics. A clear > > > > > definition > > > > > >>> makes it easy for users to find their wanted metrics. The > > > definition > > > > > >>> also helps to avoid conflicts of metrics from multiple > > > > > OperatorCoordinators > > > > > >>> of the same kind. E.g. each SourceCoordinator may have its own > > > > > >>> numSourceSplits metric, these metrics should not be directly > > > > registered > > > > > >>> to the job metric group. > > > > > >>> > > > > > >>> What I'm a bit concerned is the necessity of the introduced > > common > > > > > metrics > > > > > >>> numEventsInCounter & numEventsOutCounter. If there any case > which > > > > > strongly > > > > > >>> requires them? > > > > > >>> > > > > > >>> Regarding the concerns of Chesnay, > > > > > >>> > A dedicated coordinator MG implementation is overkill > > > > > >>> Directly using the job metric group can result in metric > > conflicts, > > > > as > > > > > mentioned > > > > > >>> in above #2. > > > > > >>> > > > > > >>> Thanks, > > > > > >>> Zhu > > > > > >>> > > > > > >>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道: > > > > > >>> > > > > > >>> > > > > > > >>> > Hi Chesney, > > > > > >>> > > > > > > >>> > Just to double check with you, OperatorCoordinatorMetricGroup > > > > > (annotated as > > > > > >>> > @PublicEvolving) has already been introduced into Flink by > > > FLIP-179 > > > > > >>> > < > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics > > > > > >. > > > > > >>> > And that FLIP has got you +1.. Do you mean we should remove > > this > > > > > >>> > OperatorCoordinatorMetricGroup? > > > > > >>> > > > > > > >>> > Regards, > > > > > >>> > Dong > > > > > >>> > > > > > > >>> > On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler < > > > > ches...@apache.org> > > > > > wrote: > > > > > >>> > > > > > > >>> > > As a whole I feel like this FLIP is overly complicated. A > > > > dedicated > > > > > >>> > > coordinator MG implementation is overkill; it could just > > re-use > > > > the > > > > > >>> > > existing Task/OperatorMGs to create the same structure we > > have > > > on > > > > > TMs, > > > > > >>> > > similar to what we did with the Job MG. > > > > > >>> > > > > > > > >>> > > However, I'm not convinced that this is required anyway, > > > because > > > > > all the > > > > > >>> > > example metrics you listed can be implemented on the TM > side > > + > > > > > >>> > > aggregating them in the external metrics backend. > > > > > >>> > > > > > > > >>> > > Since I'm on holidays soon, just so no one tries to pull a > > fast > > > > > one on > > > > > >>> > > me, if this were to go to a vote as-is I'd be against it. > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > On 09/12/2022 15:30, Dong Lin wrote: > > > > > >>> > > > Hi Hang, > > > > > >>> > > > > > > > > >>> > > > Thanks for the FLIP! The FLIP looks good and it is pretty > > > > > informative. > > > > > >>> > > > > > > > > >>> > > > I have just two minor comments regarding names: > > > > > >>> > > > - Would it be useful to rename the config key as > > > > > >>> > > > *metrics.scope.jm.job.operator-coordinator* for > consistency > > > > with > > > > > >>> > > > *metrics.scope.jm.job > > > > > >>> > > > *(which is not named as *jm-job)? > > > > > >>> > > > - Maybe rename the variable as > > > > SCOPE_NAMING_OPERATOR_COORDINATOR > > > > > for > > > > > >>> > > > simplicity and consistency with SCOPE_NAMING_OPERATOR > > (which > > > is > > > > > not named > > > > > >>> > > > as SCOPE_NAMING_TM_JOB_OPERATOR)? > > > > > >>> > > > > > > > > >>> > > > Cheers, > > > > > >>> > > > Dong > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan < > > > > ruanhang1...@gmail.com> > > > > > wrote: > > > > > >>> > > > > > > > > >>> > > >> Hi all, > > > > > >>> > > >> > > > > > >>> > > >> MengYue and I created FLIP-274[1] Introduce metric group > > for > > > > > >>> > > >> OperatorCoordinator. OperatorCoordinator is the > > coordinator > > > > for > > > > > runtime > > > > > >>> > > >> operators and running on Job Manager. The coordination > > > > > mechanism is > > > > > >>> > > >> operator events between OperatorCoordinator and its all > > > > > operators, the > > > > > >>> > > >> coordination is more and more using in Flink, for > example > > > many > > > > > Sources > > > > > >>> > > and > > > > > >>> > > >> Sinks depend on the mechanism to assign splits and > > > coordinate > > > > > commits to > > > > > >>> > > >> external systems. The OperatorCoordinator is widely > using > > in > > > > > flink kafka > > > > > >>> > > >> connector, flink pulsar connector, flink cdc connector, > > > flink > > > > > hudi > > > > > >>> > > >> connector and so on. > > > > > >>> > > >> > > > > > >>> > > >> But there is not a suitable metric group scope for the > > > > > >>> > > OperatorCoordinator > > > > > >>> > > >> and not an implementation for the interface > > > > > >>> > > OperatorCoordinatorMetricGroup. > > > > > >>> > > >> These metrics in OperatorCoordinator could be how many > > > > > splits/partitions > > > > > >>> > > >> have been assigned to source readers, how many files > have > > > been > > > > > written > > > > > >>> > > out > > > > > >>> > > >> by sink writers, these metrics not only help users to > know > > > the > > > > > job > > > > > >>> > > progress > > > > > >>> > > >> but also make big job maintaining easier. Thus we > propose > > > the > > > > > FLIP-274 > > > > > >>> > > to > > > > > >>> > > >> introduce a new metric group scope for > OperatorCoordinator > > > and > > > > > provide > > > > > >>> > > an > > > > > >>> > > >> internal implementation for > > OperatorCoordinatorMetricGroup. > > > > > >>> > > >> > > > > > >>> > > >> Could you help review this FLIP when you get time? Any > > > > feedback > > > > > is > > > > > >>> > > >> appreciated! > > > > > >>> > > >> > > > > > >>> > > >> Best, > > > > > >>> > > >> Hang > > > > > >>> > > >> > > > > > >>> > > >> [1] > > > > > >>> > > >> > > > > > >>> > > >> > > > > > >>> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator > > > > > >>> > > >> > > > > > >>> > > > > > > > >>> > > > > > > > > > > > > > > > > >