Hi everyone, All issues about FLIP-274 have been resolved and consensus has been reached. If there are no further problems, we plan to start the voting thread.
Thanks a lot for participating in the vote discussion. Special thanks to @Chesnay Schepler <[email protected]> and @Jark Wu <[email protected]>, their suggestions and discussion make this FLIP clearer and more reasonable. Best, Hang Chesnay Schepler <[email protected]> 于2023年2月2日周四 18:44写道: > Looks good from my side. > > On 01/02/2023 13:27, Hang Ruan wrote: > > Hi all, > > > > I make a mistake about the default value of the scope format > > JobManagerOperatorScopeFormat. It should be > > <host>.jobmanager.<job_name>.<operator_name>. > > I have fixed it in the FLIP. > > And the variables (task id and task name) have been added to the list. > > > > Best, > > Hang > > > > > > Hang Ruan <[email protected]> 于2023年2月1日周三 14:11写道: > > > >> Hi all, > >> > >>> IIUC, there is only an OperatorCoordinatorMetricGroup interface without > >> implementations (it never worked). There were no coordinator metrics > >> implemented and registered. @Hang Ruan please correct me if I'm wrong. > >> Yes, we shouldn't have to worry about backwards-compatibility as > Chesnay > >> said. > >> > >> After the discussion, I think the FLIP should be modified as follow. > >> 1. Add new metric group JobManagerOperatorMetricGroup, whose scope > >> format is JobManagerOperatorScopeFormat. The config key of > JobManagerOperatorScopeFormat > >> is metrics.scope.jm-operator, and the default value is > >> <host>.jobmanager.<job_name>.<task_name>.<operator_name>. > >> 2. JobManagerOperatorMetricGroup will be sub components of > >> JobManagerJobMetricGroup. > >> 3. InternalOperatorCoordinatorMetricGroup will extend ProxyMetricGroup, > >> which registered under JobManagerOperatorMetricGroup by > >> addGroup("coordinator"). > >> > >> I have updated the FLIP-274. @Chesnay Schepler <[email protected]> > @Jark > >> Wu <[email protected]> , please help to take a look at the modified FLIP > >> and make sure that I do not miss something. > >> > >> If there are no more problems, we plan to start the voting thread later. > >> > >> Best, > >> Hang > >> > >> Chesnay Schepler <[email protected]> 于2023年1月31日周二 20:09写道: > >> > >>>> IIUC, there is only an OperatorCoordinatorMetricGroup interface > without > >>> implementations (it never worked). There were no coordinator metrics > >>> implemented and registered. @Hang Ruan <[email protected]> please > >>> correct me if I'm wrong. > >>> > >>> Indeed. There are interfaces, and theoretically a metric group is > exposed > >>> to user-code via the SplitEnumeratorContext, but the metric group is > always > >>> null in the SourceCoordinatorContext. So we shouldn't have to worry > about > >>> backwards-compatibility. > >>> > >>>> However, the config key "metrics.scope.jm.job" is deprecated and is > >>> replaced by "metrics.scope.jm-job". Similarly, "metrics.scope.tm.job" > is > >>> replaced with "metrics.scope.tm-job". That's why I though > >>> "metrics.scope.jm-operator" might be more consistent. > >>> > >>> HA! Good catch. I only looked at the 1.16 docs where they weren't > >>> deprecated yet. > >>> > >>> Then it should be "jm-operator" indeed. > >>> > >>> On 31/01/2023 10:51, Jark Wu wrote: > >>> > >>> Hi Chesnay, > >>> > >>>> The new option must default to the configured value of > >>> metrics.scope.jm.job because otherwise it isn't backwards-compatible > with > >>> existing coordinator metrics. > >>> IIUC, there is only an OperatorCoordinatorMetricGroup interface without > >>> implementations (it never worked). There were no coordinator metrics > >>> implemented and registered. @Hang Ruan <[email protected]> please > >>> correct me if I'm wrong. > >>> > >>>> Intuitively I think the config key should be > >>> metrics.scope.jm.operator., similar to metrics.scope.jm.job. > >>> However, the config key "metrics.scope.jm.job" is deprecated and is > >>> replaced by "metrics.scope.jm-job". Similarly, "metrics.scope.tm.job" > is > >>> replaced with "metrics.scope.tm-job". That's why I though > >>> "metrics.scope.jm-operator" might be more consistent. > >>> > >>>> We would only support operator_name, operator_id, task_name and > task_id > >>> (+ everything else from the jm/job groups of course). > >>> +1 > >>> > >>> Best, > >>> Jark > >>> > >>> > >>> > >>> On Mon, 30 Jan 2023 at 20:09, Chesnay Schepler <[email protected]> > >>> wrote: > >>> > >>>> The new option must default to the configured value of > >>>> metrics.scope.jm.job because otherwise it isn't backwards-compatible > with > >>>> existing coordinator metrics. > >>>> According to some earlier response in this thread there are already > >>>> coordinator metrics. > >>>> > >>>> If that weren't the case you'd be correct, and we'd keep the default > >>>> consistent with the default for tm operator metrics. > >>>> > >>>> Intuitively I think the config key should be > metrics.scope.jm.operator., > >>>> similar to metrics.scope.jm.job. > >>>> For symmetry we should also add new metrics.scope.tm.operator/task and > >>>> eventually phase out metrics.scope.operator. > >>>> > >>>>> Regarding "coordinator metrics don't *belong to* any tasks/vertexes", > >>>> I mean the coordinator doesn't run on specific tasks, so it doesn't > have > >>>> runtime information of tasks. > >>>> > >>>> That is true. I intentionally only referred to task id/name, as things > >>>> like subtask index or attempt ids don't make sense for coordinators > as you > >>>> said. > >>>> We would only support operator_name, operator_id, task_name and > task_id > >>>> (+ everything else from the jm/job groups of course). > >>>> (Maybe in the future attempt_num could be interesting though) > >>>> > >>>> On 30/01/2023 10:39, Jark Wu wrote: > >>>> > >>>> Hi Chesnay, > >>>> > >>>> IIUC, our only disagreement is whether to support task info in the > >>>> metric format. But we all agree with the default metric format > >>>> > "metrics.scope.jm-operator=<host>.jobmanager.<job_name>.<operator_name>" > >>>> which doesn't include task info, right? > >>>> > >>>> Regarding "coordinator metrics don't *belong to* any tasks/vertexes", > I > >>>> mean the coordinator doesn't run on specific tasks, so it doesn't have > >>>> runtime information of tasks. My concern about adding task info to the > >>>> coordinator metric is that > >>>> 1) if we register coordinator metrics for every subtask, the number of > >>>> coordinator metrics will be exploded when task parallelism is large. > This > >>>> is an overhead to JM & metric systems, and the duplicated metrics are > >>>> useless for users. > >>>> 2) Correct me if I'm wrong, MetricGroup#counter only supports > >>>> registering one metric. It seems it can't register a metric reporting > for > >>>> multiple metric names (e.g. multiple subtasks). > >>>> > >>>> Best, > >>>> Jark > >>>> > >>>> > >>>> On Mon, 30 Jan 2023 at 16:11, Chesnay Schepler <[email protected]> > >>>> wrote: > >>>> > >>>>> I hadn't looked in detail in to how the task info can be introduced, > >>>>> but > >>>>> given that the coordinates are created by the execution graph, where > we > >>>>> only work on tasks, it should be possible and rather simple. > >>>>> > >>>>> The OperatorCoordinatorHolder gets access to the ExecutionJobVertex > >>>>> from > >>>>> which it can extract everything and create the task/operator groups. > >>>>> Exposure to the coordinator could happen via the coordinator context > as > >>>>> originally planned. > >>>>> > >>>>> > As we know, coordinator metrics are just like JMJobMetricGroup, > >>>>> which > >>>>> don't belong to any tasks/vertexes. > >>>>> > >>>>> I don't think this is true, every coordinator is scoped to a specific > >>>>> operator, and every operator is associated with a particular vertex. > >>>>> > >>>>> From the javadocs: > >>>>> > >>>>> " A coordinator for runtime operators. The OperatorCoordinator runs > on > >>>>> the master, associated with > >>>>> the job vertex of the operator. It communicates with operators via > >>>>> sending operator events." > >>>>> > >>>>> On 28/01/2023 09:15, Jark Wu wrote: > >>>>>> Hi all, > >>>>>> > >>>>>> IIUC, Chesnay means we should have a more general metric group for > >>>>>> **operator** > >>>>>> not for **coordinator** in JM, this would be useful to extend > >>>>>> other operator-specific > >>>>>> metrics in the future. That means the new scope format should be > >>>>> designed > >>>>>> for > >>>>>> the operator, > >>>>>> e.g., > >>>>> > metrics.scope.jm-operator=<host>.jobmanager.<job_name>.<operator_name> > >>>>>> The coordinator metric is a subgroup (a constant "coordinator" > >>>>> suffix) of > >>>>>> the JMOperatorMG. > >>>>>> > >>>>>> I think this is a nice design. However, I have a question about > >>>>> adding task > >>>>>> id/name to this list. > >>>>>> How to get the task id/name when reporting a coordinator metric? As > we > >>>>>> know, coordinator metrics > >>>>>> are just like JMJobMetricGroup, which don't belong to any > >>>>> tasks/vertexes. > >>>>>> Do you mean reporting > >>>>>> coordinator metrics for every task id under the operator? > >>>>>> > >>>>>> Best, > >>>>>> Jark > >>>>>> > >>>>>> > >>>>>> On Thu, 19 Jan 2023 at 17:33, Chesnay Schepler <[email protected]> > >>>>> wrote: > >>>>>>> > First, I do not understand why users have to configure the new > >>>>> scope > >>>>>>> format, which has a default value. > >>>>>>> > >>>>>>> If you don't use scope formats, sure. If you do use scope formats, > >>>>> e.g. > >>>>>>> to add a common prefix (which is the case for datadog users for > >>>>>>> example), then the current default in the FLIP is insufficient and > >>>>>>> requires the user to update the configuration. > >>>>>>> > >>>>>>> > I usually do not need to change these configuration of scope > >>>>> formats > >>>>>>> when submitting the flink job. > >>>>>>> > >>>>>>> Scope formats as a whole are quite a power-user feature, but that > >>>>>>> doesn't mean we should ignore it. > >>>>>>> > >>>>>>> > I try to let it extend the ComponentMetricGroup > >>>>>>> > >>>>>>> This isn't inherently required just because it is a "component". > >>>>>>> Component metric groups should _only_ be used for cases where such > a > >>>>>>> component provides several bits of metadata. > >>>>>>> For example, tasks provide vertex ids, task names, attempted IDs > >>>>> etc., > >>>>>>> and we'd like users to have the option on how this metadata ends up > >>>>>>> being used (via scope formats). This currently can't be built with > >>>>> the > >>>>>>> addGroup() methods, hence why the component groups exist. > >>>>>>> However, the operator coordinator _itself_does not provide several > >>>>> bits > >>>>>>> of metadata. Logically, the _operator_ does, not the coordinator. > >>>>>>> > >>>>>>> > This make me decide to add a new scope format. > >>>>>>> > >>>>>>> And this is fine; just don't add a new scope format for the > >>>>> coordinator > >>>>>>> but logically for operators on the JM side, such that we can extend > >>>>> the > >>>>>>> set of operator-specific metrics in the future without having to > add > >>>>> yet > >>>>>>> another scope format. > >>>>>>> > >>>>>>> > The relationship between the Operator and OperatorCoordinator > is > >>>>>>> maintained in the OperatorCoordinatorHolder. In the POC, the > operator > >>>>>>> name/id could be found and the OperatorCoordinatorMetricGroup will > be > >>>>>>> created here. > >>>>>>> > >>>>>>> That's all irrelevant. > >>>>>>> > >>>>>>> > The registered metrics of OperatorCoordinator always are those > >>>>> which > >>>>>>> can not be aggregated from the tasks, like the number of unassigned > >>>>>>> splits in the SourceCoordinator. Actually we have not encountered > the > >>>>>>> scenario you mentioned. I think the OperatorCoordinatorMetricGroup > >>>>>>> should only contain the metrics for itself instead of its subtasks. > >>>>>>> > >>>>>>> You are misunderstanding the problem. > >>>>>>> > >>>>>>> This isn't about aggregating metrics on our side or exposing > metrics > >>>>>>> from TMs on the JM side or anything like that. > >>>>>>> > >>>>>>> It's purely about the available metadata for operator metrics on > the > >>>>> JM > >>>>>>> side. Currently you suggest operator name / id; what I'm proposing > >>>>> is to > >>>>>>> add task(==vertex!) id / name to this list. > >>>>>>> > >>>>>>> The benefit here is simple. I can look up all metrics where > >>>>> task_id==XXX > >>>>>>> and get _everything_ related to that vertex, including all metrics > >>>>>>> associated with operators that are part of that vertex, including > the > >>>>>>> coordinator. > >>>>>>> > >>>>>>> > Using metrics.scope.jm.job is not enough to distinguish the > >>>>> different > >>>>>>> coordinators. > >>>>>>> > >>>>>>> I did not suggest just using metrics.scope.jm.job. I suggested that > >>>>> as > >>>>>>> the default for the jm operator scope format, that is used for the > >>>>>>> proposed JobManagerOperatorMetricGroup, which will have some plain > >>>>>>> metric groups as children for coordinators. > >>>>>>> Aka, you have the JMOperatorMG, then you call > >>>>> addGroup("coordinators") > >>>>>>> and then addGroup(coordinator_name) for each coordinator. > >>>>>>> > >>>>>>> > So there are two choice for the > >>>>>>> InternalOperatorCoordinatorMetricGroup: 1. add and improve the new > >>>>> scope > >>>>>>> format; 2. use the metrics.scope.jm.job and ProxyMetricGroup. Which > >>>>> one > >>>>>>> is better? > >>>>>>> > >>>>>>> There are more options than that. > >>>>>>> I feel like there are a lot of misunderstandings in this > discussion. > >>>>>>> Please let me know if I could clear things up. If not I can also > >>>>> provide > >>>>>>> a PoC based on your PoC if that can speed things up. It's not that > >>>>>>> different anyway. > >>>>>>> > >>>>>>> On 19/01/2023 07:39, Hang Ruan wrote: > >>>>>>>> Hi, chesnay, > >>>>>>>> > >>>>>>>> Thanks for your reply. I still have some doubts about the > questions > >>>>>>>> you raised. > >>>>>>>> > >>>>>>>> > Extending the set of ScopeFormats is problematic because > it > >>>>> in > >>>>>>>> practice > >>>>>>>> it breaks the config if users actively rely on it, since > >>>>> there's now > >>>>>>>> another key that they _must_ set for it to be > >>>>>>>> consistent/compatible with > >>>>>>>> their existing setup. > >>>>>>>> Unfortunately due to how powerful scope formats are we can't > >>>>> derive a > >>>>>>>> default value that matches their existing setup. > >>>>>>>> Hence we should try to do this as rarely as possible. > >>>>>>>> > >>>>>>>> > >>>>>>>> > This FLIP does not adhere to that since it proposes a > >>>>> dedicated > >>>>>>>> format > >>>>>>>> for coordinators; next time we want to expose > operator-specific > >>>>>>>> metrics > >>>>>>>> (e.g., in the scheduler) we'd have to add another one to > >>>>> support it. > >>>>>>>> > >>>>>>>> First, I do not understand why users have to configure the new > scope > >>>>>>>> format, which has a default value. I usually do not need to change > >>>>>>>> these configuration of scope formats when submitting the flink > job. > >>>>>>>> IMO, the OperatorCoordinator is a component running at the > >>>>> JobMaster. > >>>>>>>> I try to let it extend the ComponentMetricGroup, but I cannot find > >>>>>>>> a suitable scope. This make me decide to add a new scope format. > >>>>>>>> > >>>>>>>> > Additionally, the configurable variables (operator > name/id) > >>>>> are > >>>>>>>> logically not attached to the coordinator, but operators, so > >>>>> to me it > >>>>>>>> just doesn't make sense to structure it like this. > >>>>>>>> > >>>>>>>> > >>>>>>>> The relationship between the Operator and OperatorCoordinator is > >>>>>>>> maintained in the OperatorCoordinatorHolder. In the POC, the > >>>>> operator > >>>>>>>> name/id could be found and the OperatorCoordinatorMetricGroup will > >>>>> be > >>>>>>>> created here. > >>>>>>>> > >>>>>>>> > Another thing I'm concerned about is that, because we > don't > >>>>> include > >>>>>>>> tasks in the hierarchy, users wishing to collect all metrics > >>>>> for a > >>>>>>>> particular task (in this case ==vertex) now have to go > >>>>>>>> significantly out > >>>>>>>> of their way to get them, since they can no longer just > filter > >>>>> by the > >>>>>>>> task ID but have to be filter for _all_ operators that are > >>>>> part of > >>>>>>>> the task. > >>>>>>>> > >>>>>>>> > >>>>>>>> The registered metrics of OperatorCoordinator always are those > which > >>>>>>>> can not be aggregated from the tasks, like the number of > unassigned > >>>>>>>> splits in the SourceCoordinator. Actually we have not encountered > >>>>> the > >>>>>>>> scenario you mentioned. I think the OperatorCoordinatorMetricGroup > >>>>>>>> should only contain the metrics for itself instead of its > subtasks. > >>>>>>>> > >>>>>>>> Using metrics.scope.jm.job is not enough to distinguish the > >>>>> different > >>>>>>>> coordinators. The operator id/name is necessary. Then > >>>>>>>> the implementation should be like this. But users can not change > the > >>>>>>>> scope in this way. This is also acceptable. > >>>>>>>> @Internal public class InternalOperatorCoordinatorMetricGroup > >>>>>>>> extends ProxyMetricGroup<MetricGroup> > >>>>>>>> implements OperatorCoordinatorMetricGroup { > >>>>>>>> public InternalOperatorCoordinatorMetricGroup( > >>>>>>>> JobManagerJobMetricGroup parent, OperatorID > >>>>> operatorID, > >>>>>>> String operatorName) { > >>>>>>>> super(parent.addGroup(operatorID + > >>>>>>> operatorName).addGroup("coordinator")); } > >>>>>>>> } > >>>>>>>> So there are two choice for the > >>>>>>>> InternalOperatorCoordinatorMetricGroup: 1. add and improve the new > >>>>>>>> scope format; 2. use the metrics.scope.jm.job and > ProxyMetricGroup. > >>>>>>>> Which one is better? > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Hang > >>>>>>>> > >>>>>>>> > >>>>>>>> Chesnay Schepler <[email protected]> 于2023年1月18日周三 17:03写道: > >>>>>>>> > >>>>>>>> You're misunderstanding the problem. > >>>>>>>> > >>>>>>>> Metric groups form a tree with each group providing certain > >>>>> metadata. > >>>>>>>> E.g., on the taskmanager we have a TM metric group that > >>>>> provides info > >>>>>>>> about the TM, that has child task metric groups, that have > >>>>> child > >>>>>>>> operator metric groups etc. The operator metric group again > has > >>>>>>>> children, where we are now mostly in the user/connector > land, > >>>>> like a > >>>>>>>> kafka metric group providing partition info or something. > >>>>>>>> > >>>>>>>> What is being proposed is to create a coordinator metric > group > >>>>> on > >>>>>>>> the JM > >>>>>>>> side that provides operator metadata. > >>>>>>>> > >>>>>>>> A more appropriate structure is to create an operator group > on > >>>>> the JM > >>>>>>>> side that provides this info, with the coordinator metric > group > >>>>>>>> being a > >>>>>>>> child. > >>>>>>>> > >>>>>>>> On 17/01/2023 19:56, Steven Wu wrote: > >>>>>>>> >> Additionally, the configurable variables (operator > name/id) > >>>>> are > >>>>>>>> > logically not attached to the coordinator, but operators, > so > >>>>> to > >>>>>>>> me it > >>>>>>>> > just doesn't make sense to structure it like this. > >>>>>>>> > > >>>>>>>> > Chesnay, maybe we should clarify the terminology. To me, > >>>>>>>> pperators (like > >>>>>>>> > FLIP-27 source) can have two parts (coordinator and > >>>>>>>> reader/subtask). I > >>>>>>>> > think it is fine to include operator name/id for > coordinator > >>>>>>>> metrics. > >>>>>>>> > > >>>>>>>> > On Mon, Jan 16, 2023 at 2:13 AM Chesnay Schepler > >>>>>>>> <[email protected]> wrote: > >>>>>>>> > > >>>>>>>> >> Slight correction: Using metrics.scope.jm.job as the > default > >>>>>>>> should be > >>>>>>>> >> safe. > >>>>>>>> >> > >>>>>>>> >> On 16/01/2023 10:18, Chesnay Schepler wrote: > >>>>>>>> >>> The proposed ScopeFormat is still problematic for a few > >>>>> reasons. > >>>>>>>> >>> > >>>>>>>> >>> Extending the set of ScopeFormats is problematic because > >>>>> it in > >>>>>>>> >>> practice it breaks the config if users actively rely on > >>>>> it, since > >>>>>>>> >>> there's now another key that they _must_ set for it to > be > >>>>>>>> >>> consistent/compatible with their existing setup. > >>>>>>>> >>> Unfortunately due to how powerful scope formats are we > >>>>> can't > >>>>>>>> derive a > >>>>>>>> >>> default value that matches their existing setup. > >>>>>>>> >>> Hence we should try to do this as rarely as possible. > >>>>>>>> >>> > >>>>>>>> >>> This FLIP does not adhere to that since it proposes a > >>>>>>>> dedicated format > >>>>>>>> >>> for coordinators; next time we want to expose > >>>>> operator-specific > >>>>>>>> >>> metrics (e.g., in the scheduler) we'd have to add > another > >>>>> one to > >>>>>>>> >>> support it. > >>>>>>>> >>> > >>>>>>>> >>> Additionally, the configurable variables (operator > >>>>> name/id) are > >>>>>>>> >>> logically not attached to the coordinator, but > operators, > >>>>> so > >>>>>>>> to me it > >>>>>>>> >>> just doesn't make sense to structure it like this. > >>>>>>>> >>> > >>>>>>>> >>> Another thing I'm concerned about is that, because we > don't > >>>>>>>> include > >>>>>>>> >>> tasks in the hierarchy, users wishing to collect all > >>>>> metrics for > >>>>>>> a > >>>>>>>> >>> particular task (in this case ==vertex) now have to go > >>>>>>>> significantly > >>>>>>>> >>> out of their way to get them, since they can no longer > just > >>>>>>>> filter by > >>>>>>>> >>> the task ID but have to be filter for _all_ operators > that > >>>>> are > >>>>>>>> part of > >>>>>>>> >>> the task. > >>>>>>>> >>> > >>>>>>>> >>> On 16/01/2023 03:09, Hang Ruan wrote: > >>>>>>>> >>>> Hi, @[email protected] <[email protected]> , > >>>>>>>> >>>> > >>>>>>>> >>>> Do you have time to help to review this FLIP again > after > >>>>> the > >>>>>>>> >>>> modification? > >>>>>>>> >>>> Looking forward to your reply. > >>>>>>>> >>>> This FLIP will add a new configuration for the > >>>>>>>> >>>> OperatorCoordinatorMetricGroup scope format. It > provides > >>>>> an > >>>>>>>> internal > >>>>>>>> >>>> implementation and is added as a component to the > >>>>>>>> >>>> JobManagerJobMetricGroup. > >>>>>>>> >>>> If something doesn't make sense, could you provide some > >>>>>>>> advice? It > >>>>>>>> >>>> will be > >>>>>>>> >>>> very helpful. Thanks a lot for your help. > >>>>>>>> >>>> > >>>>>>>> >>>> Best, > >>>>>>>> >>>> Hang > >>>>>>>> >>>> > >>>>>>>> >>>> Martijn Visser <[email protected]> > 于2023年1月11日周三 > >>>>>>>> 16:34写道: > >>>>>>>> >>>> > >>>>>>>> >>>> > >>>>>>>> >>>>> Hi Hang, > >>>>>>>> >>>>> > >>>>>>>> >>>>> I'm a bit surprised that this has gone to a vote, > given > >>>>> that > >>>>>>>> Chesnay > >>>>>>>> >>>>> deliberately mentioned that he would vote against it > >>>>> as-is. > >>>>>>>> I would > >>>>>>>> >>>>> expect > >>>>>>>> >>>>> that before going to a vote, he has had the > opportunity > >>>>> to > >>>>>>>> >>>>> participate in > >>>>>>>> >>>>> this discussion. > >>>>>>>> >>>>> > >>>>>>>> >>>>> Best regards, > >>>>>>>> >>>>> > >>>>>>>> >>>>> Martijn > >>>>>>>> >>>>> > >>>>>>>> >>>>> On Tue, Jan 3, 2023 at 12:53 PM Jark Wu < > >>>>> [email protected]> > >>>>>>>> wrote: > >>>>>>>> >>>>> > >>>>>>>> >>>>>> Hi Dong, > >>>>>>>> >>>>>> > >>>>>>>> >>>>>> Regarding “SplitEnumeratorContext#metricGroup���, my > only > >>>>>>>> concern is > >>>>>>>> >>>>>> that > >>>>>>>> >>>>>> this is a core interface for the FLIP. It’s hard to > >>>>> tell how > >>>>>>>> >>>>>> sources use > >>>>>>>> >>>>>> metric group without mentioning this interface. Even > if > >>>>>>>> this is an > >>>>>>>> >>>>> existing > >>>>>>>> >>>>>> API, I think it’s worth introducing the interface > again > >>>>> and > >>>>>>>> declaring > >>>>>>>> >>>>> that > >>>>>>>> >>>>>> we will implement the interface instead of a no-op > >>>>> method > >>>>>>>> in this > >>>>>>>> >>>>>> FLIP. > >>>>>>>> >>>>>> > >>>>>>>> >>>>>> Anyway, this is a minor problem and shouldn’t block > this > >>>>>>>> FLIP. I’m > >>>>>>>> >>>>>> +1 to > >>>>>>>> >>>>>> start a vote. > >>>>>>>> >>>>>> > >>>>>>>> >>>>>> Best, > >>>>>>>> >>>>>> Jark > >>>>>>>> >>>>>> > >>>>>>>> >>>>>> > >>>>>>>> >>>>>>> 2023年1月3日 10:03,Hang Ruan <[email protected]> > 写道: > >>>>>>>> >>>>>>> > >>>>>>>> >>>>>>> Hi, Jark and Dong, > >>>>>>>> >>>>>>> > >>>>>>>> >>>>>>> Thanks for your comments. Sorry for my late reply. > >>>>>>>> >>>>>>> > >>>>>>>> >>>>>>> For suggestion 1, I plan to implement the > >>>>>>>> >>>>>>> SplitEnumeratorMetricGroup in > >>>>>>>> >>>>>>> another issue, and it is not contained in this > FLIP. I > >>>>>>>> will add some > >>>>>>>> >>>>>>> description about this part. > >>>>>>>> >>>>>>> For suggestion 2, changes about > >>>>>>>> OperatorCoordinator#metricGroup has > >>>>>>>> >>>>>> already > >>>>>>>> >>>>>>> been documented in the proposed change section. > >>>>>>>> >>>>>>> > >>>>>>>> >>>>>>> Best, > >>>>>>>> >>>>>>> Hang > >>>>>>>> >>>>>>> > >>>>>>>> >>>>>>> Dong Lin <[email protected]> 于2023年1月1日周日 > 09:45写道: > >>>>>>>> >>>>>>> > >>>>>>>> >>>>>>>> Let me chime-in and add comments regarding the > public > >>>>>>>> interface > >>>>>>>> >>>>> section. > >>>>>>>> >>>>>>>> Please see my comments inline. > >>>>>>>> >>>>>>>> > >>>>>>>> >>>>>>>> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu > >>>>>>>> <[email protected]> wrote: > >>>>>>>> >>>>>>>> > >>>>>>>> >>>>>>>>> Hi Hang, > >>>>>>>> >>>>>>>>> > >>>>>>>> >>>>>>>>> Thanks for driving this discussion. I think this > is a > >>>>>>>> very useful > >>>>>>>> >>>>>> feature > >>>>>>>> >>>>>>>>> for connectors. > >>>>>>>> >>>>>>>>> > >>>>>>>> >>>>>>>>> The FLIP looks quite good to me, and I just have > two > >>>>>>>> suggestions. > >>>>>>>> >>>>>>>>> > >>>>>>>> >>>>>>>>> 1. In the "Public Interface" section, mention that > >>>>> the > >>>>>>>> >>>>>>>>> implementation > >>>>>>>> >>>>>>>>> behavior of "SplitEnumeratorContext#metricGroup" > is > >>>>>>>> changed from > >>>>>>>> >>>>>>>> returning > >>>>>>>> >>>>>>>>> null to returning a concrete > >>>>> SplitEnumeratorMetricGroup > >>>>>>>> instance. > >>>>>>>> >>>>> Even > >>>>>>>> >>>>>>>>> though the API is already there, the behavior > change > >>>>> can > >>>>>>>> also be > >>>>>>>> >>>>>>>> considered > >>>>>>>> >>>>>>>>> a public change. > >>>>>>>> >>>>>>>>> > >>>>>>>> >>>>>>>> SplitEnumeratorContext#metricGroup is an interface > and > >>>>>>>> this FLIP > >>>>>>>> >>>>>>>> does > >>>>>>>> >>>>>> not > >>>>>>>> >>>>>>>> seem to change its semantics/behavior. The FLIP > does > >>>>>>>> change the > >>>>>>>> >>>>>>>> implementation/behavior of > >>>>>>>> SourceCoordinatorContext#metricGroup, > >>>>>>>> >>>>>>>> which > >>>>>>>> >>>>>> is > >>>>>>>> >>>>>>>> marked @Internal. > >>>>>>>> >>>>>>>> > >>>>>>>> >>>>>>>> Thus it might seem a bit weird to add in the public > >>>>>>> interface > >>>>>>>> >>>>>>>> section > >>>>>>>> >>>>>>>> saying that we change the interface > >>>>>>>> >>>>>>>> SplitEnumeratorContext#metricGroup > >>>>>>>> >>>>>> from > >>>>>>>> >>>>>>>> returning null to non-null object. > >>>>>>>> >>>>>>>> > >>>>>>>> >>>>>>>> 2. Mention the newly added interface of > >>>>>>>> >>>>>> "OperatorCoordinator#metricGroup" > >>>>>>>> >>>>>>>>> in the "Proposed Changes" section or "Public > >>>>> Interface" > >>>>>>>> section. As > >>>>>>>> >>>>> the > >>>>>>>> >>>>>>>>> FLIP said, OperatorCoordinator is widely used in > many > >>>>>>>> connectors. > >>>>>>>> >>>>>> Though > >>>>>>>> >>>>>>>> it > >>>>>>>> >>>>>>>>> is still an @Internal API, I think it is worth > >>>>>>>> mentioning the > >>>>>>>> >>>>>>>>> change > >>>>>>>> >>>>> in > >>>>>>>> >>>>>>>> the > >>>>>>>> >>>>>>>>> FLIP. > >>>>>>>> >>>>>>>> Since OperatorCoordinator is an internal API, it > seems > >>>>>>>> reasonable to > >>>>>>>> >>>>>>>> explain it in the proposed change section. The FLIP > >>>>> seems > >>>>>>>> to have > >>>>>>>> >>>>>>>> documented this in point 5 of the proposed change > >>>>> section. > >>>>>>>> >>>>>>>> > >>>>>>>> >>>>>>>> BTW, if we think there are @internal classes that > are > >>>>>>>> important > >>>>>>>> >>>>>>>> enough > >>>>>>>> >>>>>> to > >>>>>>>> >>>>>>>> be added in the public interface section, it might > be > >>>>>>>> useful to > >>>>>>>> >>>>>> explicitly > >>>>>>>> >>>>>>>> discuss this topic and document it in the "*What > are > >>>>> the > >>>>>>>> "public > >>>>>>>> >>>>>>>> interfaces" of the project*" in this > >>>>>>>> >>>>>>>> < > >>>>>>>> >>>>>>>> > >>>>>>>> >> > >>>>>>>> > >>>>> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > >>>>>>>> >>>>>>>> wiki. > >>>>>>>> >>>>>>>> > >>>>>>>> >>>>>>>> > >>>>>>>> >>>>>>>>> Best, > >>>>>>>> >>>>>>>>> Jark > >>>>>>>> >>>>>>>>> > >>>>>>>> >>>>>>>>> > >>>>>>>> >>>>>>>>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan > >>>>>>>> <[email protected]> > >>>>>>>> >>>>>> wrote: > >>>>>>>> >>>>>>>>>> Hi, thanks for the feedback, Zhu Zhu and > Qingsheng. > >>>>>>>> >>>>>>>>>> After combining everyone's comments, the main > >>>>> concerns and > >>>>>>>> >>>>>>>> corresponding > >>>>>>>> >>>>>>>>>> adjustments are as follows. > >>>>>>>> >>>>>>>>>> > >>>>>>>> >>>>>>>>>> Q1: Common metrics are not quite useful. > >>>>>>>> >>>>>>>>>> numEventsIn and numEventsOut counters will be > >>>>> removed > >>>>>>>> from the > >>>>>>>> >>>>>>>>>> OperatorCoordinatorMetricGroup. These common > >>>>> metrics do > >>>>>>> not > >>>>>>>> >>>>>>>>>> provide > >>>>>>>> >>>>>>>>> enough > >>>>>>>> >>>>>>>>>> information for users. The users are more > willing to > >>>>>>>> get the > >>>>>>>> >>>>>>>>>> number > >>>>>>>> >>>>> of > >>>>>>>> >>>>>>>>>> events of the specified type instead of the total > >>>>>>>> number. And this > >>>>>>>> >>>>>>>> metric > >>>>>>>> >>>>>>>>>> is calculated differently. The implementation > could > >>>>>>>> register the > >>>>>>>> >>>>>> metric > >>>>>>>> >>>>>>>>> by > >>>>>>>> >>>>>>>>>> themselves. > >>>>>>>> >>>>>>>>>> > >>>>>>>> >>>>>>>>>> Q2: This FLIP is overly complicated. > >>>>>>>> >>>>>>>>>> This FLIP will become concise after these > >>>>> modifications. > >>>>>>>> >>>>>>>>>> OperatorCoordinatorMetricGroup has already been > >>>>>>>> introduced into > >>>>>>>> >>>>> Flink > >>>>>>>> >>>>>>>> by > >>>>>>>> >>>>>>>>>> FLIP-179< > >>>>>>>> >>>>>>>>>> > >>>>>>>> >>>>>>>>>> > >>>>>>>> >> > >>>>>>>> > >>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics > >>>>>>>> >>>>>>>>>>> . > >>>>>>>> >>>>>>>>>> And > >>>>>>>> >>>>>>>>>> this FLIP will not change it. This FLIP only > >>>>> provides a > >>>>>>>> new metric > >>>>>>>> >>>>>>>> option > >>>>>>>> >>>>>>>>>> and a new metric group scope. The changes in > >>>>> proposed > >>>>>>>> changes > >>>>>>>> >>>>> provide > >>>>>>>> >>>>>>>> the > >>>>>>>> >>>>>>>>>> details about the modifications for the internal > >>>>>>>> classes, which > >>>>>>>> >>>>> might > >>>>>>>> >>>>>>>>> make > >>>>>>>> >>>>>>>>>> it look complicated. > >>>>>>>> >>>>>>>>>> > >>>>>>>> >>>>>>>>>> Thanks for all the comments again. If there are > no > >>>>> further > >>>>>>>> >>>>>>>>>> comments, > >>>>>>>> >>>>>> we > >>>>>>>> >>>>>>>>>> plan to start the voting thread this week. > >>>>>>>> >>>>>>>>>> > >>>>>>>> >>>>>>>>>> Best, > >>>>>>>> >>>>>>>>>> Hang > >>>>>>>> >>>>>>>>>> > >>>>>>>> >>>>>>>>>> Qingsheng Ren <[email protected]> > 于2022年12月26日周一 > >>>>>>>> 16:48写道: > >>>>>>>> >>>>>>>>>> > >>>>>>>> >>>>>>>>>> > >>>>>>>> >>>>>>>>>>> Thanks for the FLIP, Hang! > >>>>>>>> >>>>>>>>>>> > >>>>>>>> >>>>>>>>>>> This FLIP overall looks good to me. Actually I > >>>>> share > >>>>>>>> the same > >>>>>>>> >>>>> concern > >>>>>>>> >>>>>>>>>> with > >>>>>>>> >>>>>>>>>>> Zhu that numEventsIn and numEventsOut counters > are > >>>>> not > >>>>>>>> quite > >>>>>>>> >>>>>>>>>>> useful > >>>>>>>> >>>>>>>> to > >>>>>>>> >>>>>>>>>> end > >>>>>>>> >>>>>>>>>>> users. OperatorEvent is a quite low-level > >>>>> abstraction, > >>>>>>>> which > >>>>>>>> >>>>> requires > >>>>>>>> >>>>>>>>>>> instantialization in order to be practical to > >>>>> users and > >>>>>>>> >>>>>>>>>>> developers, > >>>>>>>> >>>>>>>> so > >>>>>>>> >>>>>>>>>>> maybe it's better to exclude them from the FLIP. > >>>>>>>> >>>>>>>>>>> > >>>>>>>> >>>>>>>>>>> Best, > >>>>>>>> >>>>>>>>>>> Qingsheng > >>>>>>>> >>>>>>>>>>> Ververica (Alibaba) > >>>>>>>> >>>>>>>>>>> > >>>>>>>> >>>>>>>>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu > >>>>>>>> <[email protected]> > >>>>>>>> >>>>> wrote: > >>>>>>>> >>>>>>>>>>>> Hi Hang, > >>>>>>>> >>>>>>>>>>>> I still see no strong reason why we need > >>>>>>>> >>>>>>>>>>>> numEventsIn/numEventsOut > >>>>>>>> >>>>>>>>>>> metrics. > >>>>>>>> >>>>>>>>>>>> In the discussion in FLINK-29801, I can see the > >>>>> same > >>>>>>>> concern > >>>>>>>> >>>>>>>>>>>> from > >>>>>>>> >>>>>>>>>> others. > >>>>>>>> >>>>>>>>>>>> So I prefer to exclude them from this FLIP to > >>>>> avoid > >>>>>>>> >>>>>>>>>>>> over-extending > >>>>>>>> >>>>>>>>> the > >>>>>>>> >>>>>>>>>>>> scope. > >>>>>>>> >>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>> Thanks, > >>>>>>>> >>>>>>>>>>>> Zhu > >>>>>>>> >>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>> Hang Ruan <[email protected]> > 于2022年12月23日周五 > >>>>>>>> >>>>>>>>>>>> 15:21写道: > >>>>>>>> >>>>>>>>>>>>> Hi everyone, > >>>>>>>> >>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>> As for the Zhu Zhu's problem, I think we > should > >>>>> keep > >>>>>>>> the common > >>>>>>>> >>>>>>>>>>> metrics, > >>>>>>>> >>>>>>>>>>>> which will help to observe incoming and > outgoing > >>>>>>>> events. What do > >>>>>>>> >>>>>>>> you > >>>>>>>> >>>>>>>>>>> think, > >>>>>>>> >>>>>>>>>>>> @Zhu Zhu ? > >>>>>>>> >>>>>>>>>>>>> And @Chesnay, are there any other issues you > are > >>>>>>>> more concerned > >>>>>>>> >>>>>>>>>> about? > >>>>>>>> >>>>>>>>>>>> Looking forward to your reply. > >>>>>>>> >>>>>>>>>>>>> Thanks for all the comments. If there are no > >>>>> further > >>>>>>>> >>>>>>>>>>>>> comments, we > >>>>>>>> >>>>>>>>>> plan > >>>>>>>> >>>>>>>>>>>> to start the voting thread next week. > >>>>>>>> >>>>>>>>>>>>> Best, > >>>>>>>> >>>>>>>>>>>>> Hang > >>>>>>>> >>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>> Hang Ruan <[email protected]> > >>>>> 于2022年12月15日周四 > >>>>>>>> >>>>>>>>>>>>> 16:49写道: > >>>>>>>> >>>>>>>>>>>>>> Hi, Zhu Zhu, > >>>>>>>> >>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>> Thanks for your feedback! > >>>>>>>> >>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>> The OperatorCoordinator implementations are > >>>>>>>> different. And > >>>>>>>> >>>>>>>>>>>>>> their > >>>>>>>> >>>>>>>>>>>> metrics are much different too. We try to find > the > >>>>>>> common > >>>>>>>> >>>>>>>>>>>> metrics > >>>>>>>> >>>>>>>> and > >>>>>>>> >>>>>>>>>> put > >>>>>>>> >>>>>>>>>>>> them in the OperatorCoordinatorMetricGroup. If > >>>>> most > >>>>>>>> developers > >>>>>>>> >>>>>>>> think > >>>>>>>> >>>>>>>>> we > >>>>>>>> >>>>>>>>>>> do > >>>>>>>> >>>>>>>>>>>> not need these common metrics, removing the > common > >>>>>>>> metrics is > >>>>>>>> >>>>>>>>>>>> also > >>>>>>>> >>>>>>>>>>>> acceptable. > >>>>>>>> >>>>>>>>>>>>>> Best, > >>>>>>>> >>>>>>>>>>>>>> Hang > >>>>>>>> >>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>> Zhu Zhu <[email protected]> 于2022年12月14日周三 > >>>>>>>> 22:09写道: > >>>>>>>> >>>>>>>>>>>>>>> Hi Hang & MengYue, > >>>>>>>> >>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>> Thanks for creating this FLIP! > >>>>>>>> >>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>> I think it is very useful, mainly in two > >>>>> aspects: > >>>>>>>> >>>>>>>>>>>>>>> 1. Enables OperatorCoordinators to register > >>>>> metrics. > >>>>>>>> >>>>>>>>>>>>>>> Currently > >>>>>>>> >>>>>>>>>>>>>>> the coordinators has no way to do this. And > >>>>> operator > >>>>>>>> >>>>>>>> coordinator > >>>>>>>> >>>>>>>>>>>>>>> metric group further enables the > >>>>> SplitEnumerator > >>>>>>>> to have > >>>>>>>> >>>>>>>>>>>>>>> access > >>>>>>>> >>>>>>>>>>>>>>> to a registered metric group (via the > existing > >>>>> public > >>>>>>>> >>>>>>>>>>>>>>> interface > >>>>>>>> >>>>>>>>>>>>>>> SplitEnumeratorContext#metricGroup()), > which is > >>>>>>>> null at the > >>>>>>>> >>>>>>>>> moment. > >>>>>>>> >>>>>>>>>>>>>>> 2. Defines the scope of operator coordinator > >>>>>>>> metrics. A clear > >>>>>>>> >>>>>>>>>>>> definition > >>>>>>>> >>>>>>>>>>>>>>> makes it easy for users to find their wanted > >>>>>>>> metrics. The > >>>>>>>> >>>>>>>>>> definition > >>>>>>>> >>>>>>>>>>>>>>> also helps to avoid conflicts of metrics > from > >>>>>>> multiple > >>>>>>>> >>>>>>>>>>>> OperatorCoordinators > >>>>>>>> >>>>>>>>>>>>>>> of the same kind. E.g. each > SourceCoordinator > >>>>> may > >>>>>>>> have its > >>>>>>>> >>>>>>>>>>>>>>> own > >>>>>>>> >>>>>>>>>>>>>>> numSourceSplits metric, these metrics should > >>>>> not > >>>>>>>> be directly > >>>>>>>> >>>>>>>>>>> registered > >>>>>>>> >>>>>>>>>>>>>>> to the job metric group. > >>>>>>>> >>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>> What I'm a bit concerned is the necessity of > >>>>> the > >>>>>>>> introduced > >>>>>>>> >>>>>>>>> common > >>>>>>>> >>>>>>>>>>>> metrics > >>>>>>>> >>>>>>>>>>>>>>> numEventsInCounter & numEventsOutCounter. If > >>>>> there > >>>>>>>> any case > >>>>>>>> >>>>>>>> which > >>>>>>>> >>>>>>>>>>>> strongly > >>>>>>>> >>>>>>>>>>>>>>> requires them? > >>>>>>>> >>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>> Regarding the concerns of Chesnay, > >>>>>>>> >>>>>>>>>>>>>>>> A dedicated coordinator MG implementation > is > >>>>>>> overkill > >>>>>>>> >>>>>>>>>>>>>>> Directly using the job metric group can > result > >>>>> in > >>>>>>>> metric > >>>>>>>> >>>>>>>>> conflicts, > >>>>>>>> >>>>>>>>>>> as > >>>>>>>> >>>>>>>>>>>> mentioned > >>>>>>>> >>>>>>>>>>>>>>> in above #2. > >>>>>>>> >>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>> Thanks, > >>>>>>>> >>>>>>>>>>>>>>> Zhu > >>>>>>>> >>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>> Dong Lin <[email protected]> > 于2022年12月10日周六 > >>>>>>>> >>>>>>>>>>>>>>> 14:16写道: > >>>>>>>> >>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>> Hi Chesney, > >>>>>>>> >>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>> Just to double check with you, > >>>>>>>> >>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup > >>>>>>>> >>>>>>>>>>>> (annotated as > >>>>>>>> >>>>>>>>>>>>>>>> @PublicEvolving) has already been > introduced > >>>>> into > >>>>>>>> Flink by > >>>>>>>> >>>>>>>>>> FLIP-179 > >>>>>>>> >>>>>>>>>>>>>>>> < > >>>>>>>> >> > >>>>>>>> > >>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics > >>>>>>>> >>>>>>>>>>>>> . > >>>>>>>> >>>>>>>>>>>>>>>> And that FLIP has got you +1.. Do you mean > we > >>>>>>>> should remove > >>>>>>>> >>>>>>>>> this > >>>>>>>> >>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup? > >>>>>>>> >>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>> Regards, > >>>>>>>> >>>>>>>>>>>>>>>> Dong > >>>>>>>> >>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay > >>>>> Schepler < > >>>>>>>> >>>>>>>>>>> [email protected]> > >>>>>>>> >>>>>>>>>>>> wrote: > >>>>>>>> >>>>>>>>>>>>>>>>> As a whole I feel like this FLIP is overly > >>>>>>>> complicated. A > >>>>>>>> >>>>>>>>>>> dedicated > >>>>>>>> >>>>>>>>>>>>>>>>> coordinator MG implementation is > overkill; it > >>>>>>>> could just > >>>>>>>> >>>>>>>>> re-use > >>>>>>>> >>>>>>>>>>> the > >>>>>>>> >>>>>>>>>>>>>>>>> existing Task/OperatorMGs to create the > same > >>>>>>>> structure we > >>>>>>>> >>>>>>>>> have > >>>>>>>> >>>>>>>>>> on > >>>>>>>> >>>>>>>>>>>> TMs, > >>>>>>>> >>>>>>>>>>>>>>>>> similar to what we did with the Job MG. > >>>>>>>> >>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>> However, I'm not convinced that this is > >>>>> required > >>>>>>>> anyway, > >>>>>>>> >>>>>>>>>> because > >>>>>>>> >>>>>>>>>>>> all the > >>>>>>>> >>>>>>>>>>>>>>>>> example metrics you listed can be > >>>>> implemented on > >>>>>>>> the TM > >>>>>>>> >>>>>>>> side > >>>>>>>> >>>>>>>>> + > >>>>>>>> >>>>>>>>>>>>>>>>> aggregating them in the external metrics > >>>>> backend. > >>>>>>>> >>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>> Since I'm on holidays soon, just so no one > >>>>> tries > >>>>>>>> to pull a > >>>>>>>> >>>>>>>>> fast > >>>>>>>> >>>>>>>>>>>> one on > >>>>>>>> >>>>>>>>>>>>>>>>> me, if this were to go to a vote as-is > I'd be > >>>>>>>> against it. > >>>>>>>> >>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote: > >>>>>>>> >>>>>>>>>>>>>>>>>> Hi Hang, > >>>>>>>> >>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good > >>>>> and it > >>>>>>>> is pretty > >>>>>>>> >>>>>>>>>>>> informative. > >>>>>>>> >>>>>>>>>>>>>>>>>> I have just two minor comments regarding > >>>>> names: > >>>>>>>> >>>>>>>>>>>>>>>>>> - Would it be useful to rename the config > >>>>> key as > >>>>>>>> >>>>>>>>>>>>>>>>>> > *metrics.scope.jm.job.operator-coordinator* > >>>>> for > >>>>>>>> >>>>>>>> consistency > >>>>>>>> >>>>>>>>>>> with > >>>>>>>> >>>>>>>>>>>>>>>>>> *metrics.scope.jm.job > >>>>>>>> >>>>>>>>>>>>>>>>>> *(which is not named as *jm-job)? > >>>>>>>> >>>>>>>>>>>>>>>>>> - Maybe rename the variable as > >>>>>>>> >>>>>>>>>>> SCOPE_NAMING_OPERATOR_COORDINATOR > >>>>>>>> >>>>>>>>>>>> for > >>>>>>>> >>>>>>>>>>>>>>>>>> simplicity and consistency with > >>>>>>>> SCOPE_NAMING_OPERATOR > >>>>>>>> >>>>>>>>> (which > >>>>>>>> >>>>>>>>>> is > >>>>>>>> >>>>>>>>>>>> not named > >>>>>>>> >>>>>>>>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)? > >>>>>>>> >>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>> >>>>>>>>>>>>>>>>>> Dong > >>>>>>>> >>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan > < > >>>>>>>> >>>>>>>>>>> [email protected]> > >>>>>>>> >>>>>>>>>>>> wrote: > >>>>>>>> >>>>>>>>>>>>>>>>>>> Hi all, > >>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>>> MengYue and I created FLIP-274[1] > Introduce > >>>>>>>> metric group > >>>>>>>> >>>>>>>>> for > >>>>>>>> >>>>>>>>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator > >>>>> is the > >>>>>>>> >>>>>>>>> coordinator > >>>>>>>> >>>>>>>>>>> for > >>>>>>>> >>>>>>>>>>>> runtime > >>>>>>>> >>>>>>>>>>>>>>>>>>> operators and running on Job Manager. > The > >>>>>>>> coordination > >>>>>>>> >>>>>>>>>>>> mechanism is > >>>>>>>> >>>>>>>>>>>>>>>>>>> operator events between > OperatorCoordinator > >>>>>>>> and its all > >>>>>>>> >>>>>>>>>>>> operators, the > >>>>>>>> >>>>>>>>>>>>>>>>>>> coordination is more and more using in > >>>>> Flink, for > >>>>>>>> >>>>>>>> example > >>>>>>>> >>>>>>>>>> many > >>>>>>>> >>>>>>>>>>>> Sources > >>>>>>>> >>>>>>>>>>>>>>>>> and > >>>>>>>> >>>>>>>>>>>>>>>>>>> Sinks depend on the mechanism to assign > >>>>> splits > >>>>>>> and > >>>>>>>> >>>>>>>>>> coordinate > >>>>>>>> >>>>>>>>>>>> commits to > >>>>>>>> >>>>>>>>>>>>>>>>>>> external systems. The > OperatorCoordinator > >>>>> is > >>>>>>>> widely > >>>>>>>> >>>>>>>> using > >>>>>>>> >>>>>>>>> in > >>>>>>>> >>>>>>>>>>>> flink kafka > >>>>>>>> >>>>>>>>>>>>>>>>>>> connector, flink pulsar connector, flink > >>>>> cdc > >>>>>>>> connector, > >>>>>>>> >>>>>>>>>> flink > >>>>>>>> >>>>>>>>>>>> hudi > >>>>>>>> >>>>>>>>>>>>>>>>>>> connector and so on. > >>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>>> But there is not a suitable metric group > >>>>> scope > >>>>>>>> for the > >>>>>>>> >>>>>>>>>>>>>>>>> OperatorCoordinator > >>>>>>>> >>>>>>>>>>>>>>>>>>> and not an implementation for the > interface > >>>>>>>> >>>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup. > >>>>>>>> >>>>>>>>>>>>>>>>>>> These metrics in OperatorCoordinator > could > >>>>> be > >>>>>>>> how many > >>>>>>>> >>>>>>>>>>>> splits/partitions > >>>>>>>> >>>>>>>>>>>>>>>>>>> have been assigned to source readers, > how > >>>>> many > >>>>>>>> files > >>>>>>>> >>>>>>>> have > >>>>>>>> >>>>>>>>>> been > >>>>>>>> >>>>>>>>>>>> written > >>>>>>>> >>>>>>>>>>>>>>>>> out > >>>>>>>> >>>>>>>>>>>>>>>>>>> by sink writers, these metrics not only > >>>>> help > >>>>>>>> users to > >>>>>>>> >>>>>>>> know > >>>>>>>> >>>>>>>>>> the > >>>>>>>> >>>>>>>>>>>> job > >>>>>>>> >>>>>>>>>>>>>>>>> progress > >>>>>>>> >>>>>>>>>>>>>>>>>>> but also make big job maintaining > easier. > >>>>> Thus we > >>>>>>>> >>>>>>>> propose > >>>>>>>> >>>>>>>>>> the > >>>>>>>> >>>>>>>>>>>> FLIP-274 > >>>>>>>> >>>>>>>>>>>>>>>>> to > >>>>>>>> >>>>>>>>>>>>>>>>>>> introduce a new metric group scope for > >>>>>>>> >>>>>>>> OperatorCoordinator > >>>>>>>> >>>>>>>>>> and > >>>>>>>> >>>>>>>>>>>> provide > >>>>>>>> >>>>>>>>>>>>>>>>> an > >>>>>>>> >>>>>>>>>>>>>>>>>>> internal implementation for > >>>>>>>> >>>>>>>>> OperatorCoordinatorMetricGroup. > >>>>>>>> >>>>>>>>>>>>>>>>>>> Could you help review this FLIP when you > >>>>> get > >>>>>>>> time? Any > >>>>>>>> >>>>>>>>>>> feedback > >>>>>>>> >>>>>>>>>>>> is > >>>>>>>> >>>>>>>>>>>>>>>>>>> appreciated! > >>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>>> Best, > >>>>>>>> >>>>>>>>>>>>>>>>>>> Hang > >>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>>> [1] > >>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>>>>>>> > >>>>>>>> >> > >>>>>>>> > >>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator > >>>>>>>> >> > >>>>>>>> > >>>>> > >
