Hi Hang, I still see no strong reason why we need numEventsIn/numEventsOut metrics. In the discussion in FLINK-29801, I can see the same concern from others. So I prefer to exclude them from this FLIP to avoid over-extending the scope.
Thanks, Zhu Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道: > > Hi everyone, > > As for the Zhu Zhu's problem, I think we should keep the common metrics, > which will help to observe incoming and outgoing events. What do you think, > @Zhu Zhu ? > And @Chesnay, are there any other issues you are more concerned about? > Looking forward to your reply. > > Thanks for all the comments. If there are no further comments, we plan to > start the voting thread next week. > > Best, > Hang > > Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道: >> >> Hi, Zhu Zhu, >> >> Thanks for your feedback! >> >> The OperatorCoordinator implementations are different. And their metrics are >> much different too. We try to find the common metrics and put them in the >> OperatorCoordinatorMetricGroup. If most developers think we do not need >> these common metrics, removing the common metrics is also acceptable. >> >> Best, >> Hang >> >> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道: >>> >>> Hi Hang & MengYue, >>> >>> Thanks for creating this FLIP! >>> >>> I think it is very useful, mainly in two aspects: >>> 1. Enables OperatorCoordinators to register metrics. Currently >>> the coordinators has no way to do this. And operator coordinator >>> metric group further enables the SplitEnumerator to have access >>> to a registered metric group (via the existing public interface >>> SplitEnumeratorContext#metricGroup()), which is null at the moment. >>> >>> 2. Defines the scope of operator coordinator metrics. A clear definition >>> makes it easy for users to find their wanted metrics. The definition >>> also helps to avoid conflicts of metrics from multiple OperatorCoordinators >>> of the same kind. E.g. each SourceCoordinator may have its own >>> numSourceSplits metric, these metrics should not be directly registered >>> to the job metric group. >>> >>> What I'm a bit concerned is the necessity of the introduced common metrics >>> numEventsInCounter & numEventsOutCounter. If there any case which strongly >>> requires them? >>> >>> Regarding the concerns of Chesnay, >>> > A dedicated coordinator MG implementation is overkill >>> Directly using the job metric group can result in metric conflicts, as >>> mentioned >>> in above #2. >>> >>> Thanks, >>> Zhu >>> >>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道: >>> >>> > >>> > Hi Chesney, >>> > >>> > Just to double check with you, OperatorCoordinatorMetricGroup (annotated >>> > as >>> > @PublicEvolving) has already been introduced into Flink by FLIP-179 >>> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics>. >>> > And that FLIP has got you +1.. Do you mean we should remove this >>> > OperatorCoordinatorMetricGroup? >>> > >>> > Regards, >>> > Dong >>> > >>> > On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler <ches...@apache.org> >>> > wrote: >>> > >>> > > As a whole I feel like this FLIP is overly complicated. A dedicated >>> > > coordinator MG implementation is overkill; it could just re-use the >>> > > existing Task/OperatorMGs to create the same structure we have on TMs, >>> > > similar to what we did with the Job MG. >>> > > >>> > > However, I'm not convinced that this is required anyway, because all the >>> > > example metrics you listed can be implemented on the TM side + >>> > > aggregating them in the external metrics backend. >>> > > >>> > > Since I'm on holidays soon, just so no one tries to pull a fast one on >>> > > me, if this were to go to a vote as-is I'd be against it. >>> > > >>> > > >>> > > On 09/12/2022 15:30, Dong Lin wrote: >>> > > > Hi Hang, >>> > > > >>> > > > Thanks for the FLIP! The FLIP looks good and it is pretty informative. >>> > > > >>> > > > I have just two minor comments regarding names: >>> > > > - Would it be useful to rename the config key as >>> > > > *metrics.scope.jm.job.operator-coordinator* for consistency with >>> > > > *metrics.scope.jm.job >>> > > > *(which is not named as *jm-job)? >>> > > > - Maybe rename the variable as SCOPE_NAMING_OPERATOR_COORDINATOR for >>> > > > simplicity and consistency with SCOPE_NAMING_OPERATOR (which is not >>> > > > named >>> > > > as SCOPE_NAMING_TM_JOB_OPERATOR)? >>> > > > >>> > > > Cheers, >>> > > > Dong >>> > > > >>> > > > >>> > > > >>> > > > On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <ruanhang1...@gmail.com> >>> > > > wrote: >>> > > > >>> > > >> Hi all, >>> > > >> >>> > > >> MengYue and I created FLIP-274[1] Introduce metric group for >>> > > >> OperatorCoordinator. OperatorCoordinator is the coordinator for >>> > > >> runtime >>> > > >> operators and running on Job Manager. The coordination mechanism is >>> > > >> operator events between OperatorCoordinator and its all operators, >>> > > >> the >>> > > >> coordination is more and more using in Flink, for example many >>> > > >> Sources >>> > > and >>> > > >> Sinks depend on the mechanism to assign splits and coordinate >>> > > >> commits to >>> > > >> external systems. The OperatorCoordinator is widely using in flink >>> > > >> kafka >>> > > >> connector, flink pulsar connector, flink cdc connector, flink hudi >>> > > >> connector and so on. >>> > > >> >>> > > >> But there is not a suitable metric group scope for the >>> > > OperatorCoordinator >>> > > >> and not an implementation for the interface >>> > > OperatorCoordinatorMetricGroup. >>> > > >> These metrics in OperatorCoordinator could be how many >>> > > >> splits/partitions >>> > > >> have been assigned to source readers, how many files have been >>> > > >> written >>> > > out >>> > > >> by sink writers, these metrics not only help users to know the job >>> > > progress >>> > > >> but also make big job maintaining easier. Thus we propose the >>> > > >> FLIP-274 >>> > > to >>> > > >> introduce a new metric group scope for OperatorCoordinator and >>> > > >> provide >>> > > an >>> > > >> internal implementation for OperatorCoordinatorMetricGroup. >>> > > >> >>> > > >> Could you help review this FLIP when you get time? Any feedback is >>> > > >> appreciated! >>> > > >> >>> > > >> Best, >>> > > >> Hang >>> > > >> >>> > > >> [1] >>> > > >> >>> > > >> >>> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator >>> > > >> >>> > > >>> > >