[jira] [Comment Edited] (FLINK-26719) Rethink the default reschedule reconcile loop
[ https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509231#comment-17509231 ]

Aitozi edited comment on FLINK-26719 at 3/19/22, 9:37 AM:
----------------------------------------------------------

{quote}
If we do not want to provide stronger resiliency/guarantees than the Flink native integration in itself then I guess we do not need to check, or it's enough to check at larger intervals.
{quote}
I understand the general idea now. In other words, we would use the reconcile loop for periodic checks and plan to emit ERROR events, right? I think it's an interesting feature to explore; it could become a monitoring or self-healing capability of the operator. The monitoring could be built on either polling or an informer-based technique. Thanks for your explanations, guys. Let's see how this capability evolves :).

> Rethink the default reschedule reconcile loop
> ---------------------------------------------
>
>                 Key: FLINK-26719
>                 URL: https://issues.apache.org/jira/browse/FLINK-26719
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Aitozi
>            Priority: Major
>
> When I tested locally, I found that the operator reschedules and reconciles
> every {{operator.reconciler.reschedule.interval.sec}}. I wonder why we need
> this. I think we only need to reconcile when:
> # waiting for a status change
> # receiving a new event
> # waiting for a savepoint result
> So when the JobManagerDeploymentStatus is Ready, we do not have to trigger a
> reconcile unless we are waiting for a savepoint result.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
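The rescheduling rule proposed in the issue description can be sketched in Java as follows. This is a minimal, hypothetical illustration under stated assumptions, not the operator's actual code: the `DeploymentStatus` enum, the `nextReschedule` method, the `savepointPending` flag and the 60-second interval are all invented for the example. The idea is that the reconciler only schedules itself while it is waiting for something (a status change or a savepoint result) and otherwise relies on incoming events.

```java
import java.time.Duration;
import java.util.Optional;

public class RescheduleSketch {

    // Hypothetical, simplified stand-in for the operator's JobManagerDeploymentStatus;
    // the real enum and its values may differ.
    enum DeploymentStatus { DEPLOYING, DEPLOYED_NOT_READY, READY, MISSING }

    /**
     * Decides whether the reconciler should schedule another run on its own.
     * An empty result means: do not reschedule, wait for the next watch event.
     */
    static Optional<Duration> nextReschedule(
            DeploymentStatus status, boolean savepointPending, Duration interval) {
        if (status != DeploymentStatus.READY) {
            // Case 1: still waiting for the deployment status to change.
            return Optional.of(interval);
        }
        if (savepointPending) {
            // Case 3: deployment is ready, but a savepoint result is outstanding.
            return Optional.of(interval);
        }
        // Case 2 (new spec changes) arrives as an event, so no periodic run is needed.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(60);
        System.out.println(nextReschedule(DeploymentStatus.DEPLOYING, false, interval));
        System.out.println(nextReschedule(DeploymentStatus.READY, true, interval));
        System.out.println(nextReschedule(DeploymentStatus.READY, false, interval));
    }
}
```

Running `main` prints `Optional[PT1M]` twice followed by `Optional.empty`: only a ready deployment with no pending savepoint falls back to purely event-driven reconciliation.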
[jira] [Comment Edited] (FLINK-26719) Rethink the default reschedule reconcile loop
[ https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17508658#comment-17508658 ]

Aitozi edited comment on FLINK-26719 at 3/18/22, 8:55 AM:
----------------------------------------------------------

[~matyas] Thanks for your input. It seems we already do the same thing manually in {{org.apache.flink.kubernetes.operator.observer.JobManagerDeploymentStatus#rescheduleAfter}}.

[~wangyang0918], IMO we have to define a final/target status, for example {{the JobManager is ready to serve}}, and stop reconciling once it is reached. Running a periodic loop that syncs status without an end is not a common approach.
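The "final/target status" idea can be sketched as a per-status reschedule interval in which the target status carries no interval at all. This is a hypothetical Java sketch: the `Status` enum, its constants and the 5 s / 10 s intervals are assumptions made for illustration, and the real `JobManagerDeploymentStatus#rescheduleAfter` may have a different signature and behavior.

```java
import java.time.Duration;
import java.util.Optional;

public class TargetStatusSketch {

    // Hypothetical deployment states; READY is treated as the final/target status.
    enum Status {
        DEPLOYING(Duration.ofSeconds(5)),
        DEPLOYED_NOT_READY(Duration.ofSeconds(10)),
        READY(null); // target reached: no periodic rescheduling

        private final Duration interval;

        Status(Duration interval) {
            this.interval = interval;
        }

        // Empty means "stop the loop and wait for the next event".
        Optional<Duration> rescheduleAfter() {
            return Optional.ofNullable(interval);
        }
    }

    public static void main(String[] args) {
        for (Status s : Status.values()) {
            System.out.println(s + " -> " + s.rescheduleAfter()
                    .map(d -> "reschedule in " + d.getSeconds() + "s")
                    .orElse("stop reconciling"));
        }
    }
}
```

Running `main` prints `DEPLOYING -> reschedule in 5s`, `DEPLOYED_NOT_READY -> reschedule in 10s` and `READY -> stop reconciling`. Returning an empty `Optional` for the target status makes "end the loop" explicit in the type, rather than approximating it with a very long interval.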