[jira] [Comment Edited] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-19 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509231#comment-17509231
 ] 

Aitozi edited comment on FLINK-26719 at 3/19/22, 9:37 AM:
--

{quote}
If we do not want to provide stronger resiliency/guarantees than the Flink 
native integration in itself then I guess we do not need to check, or it's 
enough to check at larger intervals.
{quote}
I understand the general idea now. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could become a monitoring or 
self-healing capability of the operator. The monitoring could use either polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let’s see how this capability evolves :).
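
To make the difference between the polling and the informer-based technique concrete, here is a minimal in-memory sketch. All names ({{WatchedStatus}}, {{MonitoringSketch}}) are hypothetical and only for illustration; this is not the operator's code and it does not use the Kubernetes client. Polling reads whatever the state is at each tick, while the informer-style callback is invoked on each change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for a watched resource (not an operator class). */
final class WatchedStatus {
    private String value;
    private final List<Consumer<String>> listeners = new ArrayList<>();

    WatchedStatus(String initial) {
        this.value = initial;
    }

    String get() {
        return value;
    }

    /** Informer style: the callback fires only when the status actually changes. */
    void onChange(Consumer<String> listener) {
        listeners.add(listener);
    }

    void set(String newValue) {
        if (!newValue.equals(value)) {
            value = newValue;
            for (Consumer<String> listener : listeners) {
                listener.accept(newValue);
            }
        }
    }
}

final class MonitoringSketch {
    /** Polling style: read the status after every tick and record it if it differs from the last read. */
    static List<String> poll(WatchedStatus status, List<Runnable> ticks) {
        List<String> observed = new ArrayList<>();
        String last = status.get();
        for (Runnable tick : ticks) {
            tick.run(); // the outside world may change the status between two polls
            String current = status.get();
            if (!current.equals(last)) {
                observed.add(current);
                last = current;
            }
        }
        return observed;
    }
}
```

In this sketch a poller only sees the state that is present when it looks, so its cost and latency depend on the polling interval, whereas the callback does no work while nothing changes.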



> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
> Issue Type: Sub-task
> Reporter: Aitozi
> Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at 
> the interval set by {{operator.reconciler.reschedule.interval.sec}}. Why do we 
> need this? I think we only need to reconcile when
>  # waiting for a status change
>  # receiving a new event
>  # waiting for the savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger the 
> reconcile, except while waiting for the savepoint result.
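
The three conditions listed in the description can be sketched as a single decision function. This is only an illustration of the proposal: the class, enum, and parameter names below are hypothetical and are not taken from the operator code base.

```java
/** Hypothetical sketch of the proposed rule; not the operator's actual logic. */
final class ReconcileTriggerSketch {

    /** Simplified stand-in for the deployment status discussed in the issue. */
    enum DeploymentStatus { DEPLOYING, READY }

    /**
     * Returns true if another reconcile is needed under the proposal:
     * the status has not reached READY yet, a new event arrived,
     * or a savepoint result is still outstanding.
     */
    static boolean shouldReconcile(
            DeploymentStatus status, boolean newEvent, boolean savepointPending) {
        boolean waitingForStatusChange = status != DeploymentStatus.READY;
        return waitingForStatusChange || newEvent || savepointPending;
    }
}
```

Under this rule a READY deployment with no new event and no pending savepoint is not reconciled again, which is the case the issue argues should not be rescheduled periodically.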



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-18 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17508658#comment-17508658
 ] 

Aitozi edited comment on FLINK-26719 at 3/18/22, 8:55 AM:
--

[~matyas] Thanks for your input. It seems we already do the same thing manually 
in 
{{org.apache.flink.kubernetes.operator.observer.JobManagerDeploymentStatus#rescheduleAfter}}.

[~wangyang0918], IMO we have to define a final/target status, for example {{the 
JobManager is ready to serve}}, and stop reconciling once it is reached. Running 
a periodic loop to sync status without an end is not a common approach. 
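
A minimal sketch of what a final/target status could look like, assuming a status enum that decides whether to reschedule. The enum name, its values, and the delays are hypothetical and chosen for illustration; this is not the signature or behavior of the real {{JobManagerDeploymentStatus#rescheduleAfter}}.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical status enum; values and delays are made up for illustration. */
enum SketchDeploymentStatus {
    MISSING(Duration.ofSeconds(5)),
    DEPLOYING(Duration.ofSeconds(5)),
    READY(null); // target status: nothing left to wait for

    private final Duration delay;

    SketchDeploymentStatus(Duration delay) {
        this.delay = delay;
    }

    /** Empty means "do not reschedule": the loop ends once the target status is reached. */
    Optional<Duration> rescheduleAfter() {
        return Optional.ofNullable(delay);
    }

    boolean isTarget() {
        return delay == null;
    }
}
```

With this shape the caller reschedules only while {{rescheduleAfter()}} returns a value, so the reconcile stops once the target status is reached until a new event arrives.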


was (Author: aitozi):
[~matyas] Thanks for your input. It seems we already do the same thing manually 
in 
{{org.apache.flink.kubernetes.operator.observer.JobManagerDeploymentStatus#rescheduleAfter}}.

[~wangyang0918], IMO we have to define a final/target status, for example {{the 
JobManager is ready to serve}}, and stop reconciling once it is reached. Running 
each loop to sync status without an end is not a common approach. 
