Yang Wang created FLINK-20417:
---------------------------------

             Summary: Handle "Too old resource version" exception in Kubernetes 
watch more gracefully
                 Key: FLINK-20417
                 URL: https://issues.apache.org/jira/browse/FLINK-20417
             Project: Flink
          Issue Type: Improvement
          Components: Deployment / Kubernetes
            Reporter: Yang Wang


Currently, when a watcher (pods watcher or ConfigMap watcher) is closed with 
an exception, we call {{WatchCallbackHandler#handleFatalError}}. This can 
cause the JobManager to terminate and then fail over.

In most cases this is correct, but not for the "too old resource version" 
exception. See [1] for more information. This exception usually occurs when 
the APIServer is restarted. In that case we only need to create a new watch 
and continue watching the pods/ConfigMaps. This would help reduce the impact 
of a K8s cluster restart on the Flink cluster.
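
A minimal sketch of the proposed handling, not the actual Flink or Kubernetes 
client code. It assumes the close exception exposes the HTTP status code and 
relies on the APIServer reporting "too old resource version" as 410 Gone. 
{{RewatchSketch}}, {{WatchClosedException}} and {{onWatchClosed}} are 
hypothetical names used only for illustration; only {{handleFatalError}} 
comes from the description above.

```java
import java.net.HttpURLConnection;
import java.util.function.Consumer;

final class RewatchSketch {

    /** Hypothetical stand-in for the exception a watcher receives when it is closed. */
    static final class WatchClosedException extends RuntimeException {
        final int httpCode;

        WatchClosedException(int httpCode, String message) {
            super(message);
            this.httpCode = httpCode;
        }
    }

    /** Assumption: "too old resource version" arrives as HTTP 410 Gone. */
    static boolean isTooOldResourceVersion(WatchClosedException cause) {
        return cause.httpCode == HttpURLConnection.HTTP_GONE;
    }

    /**
     * Decides what to do when a watch is closed.
     *
     * @return true if a new watch was requested, false otherwise
     */
    static boolean onWatchClosed(
            WatchClosedException cause,
            Runnable recreateWatch,
            Consumer<Throwable> handleFatalError) {
        if (cause == null) {
            // Closed without an exception: nothing to recover from.
            return false;
        }
        if (isTooOldResourceVersion(cause)) {
            // Recoverable: create a new watch instead of failing the JobManager.
            recreateWatch.run();
            return true;
        }
        // Any other exception keeps the current behaviour.
        handleFatalError.accept(cause);
        return false;
    }
}
```

With this split, only the 410 case takes the re-watch path; every other 
exception still ends up in the fatal error handler as it does today.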

 

This issue was inspired by a technical article [2]. Thanks to the folks at 
Tencent for the debugging. Note that the article is written in Chinese.

 

[1]. [https://stackoverflow.com/questions/61409596/kubernetes-too-old-resource-version]

[2]. [https://cloud.tencent.com/developer/article/1731416]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
