[ https://issues.apache.org/jira/browse/FLINK-20417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277767#comment-17277767 ]
Yang Wang commented on FLINK-20417:
-----------------------------------

Some users have reached out to me via private email to share that they also ran into this issue when using {{KubernetesHAService}} for standalone deployments.

> Handle "Too old resource version" exception in Kubernetes watch more gracefully
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-20417
>                 URL: https://issues.apache.org/jira/browse/FLINK-20417
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.2, 1.12.0
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.12.2, 1.13.0
>
>
> Currently, when a watcher (pods watcher, configmap watcher) is closed with an exception, we call {{WatchCallbackHandler#handleFatalError}}, which can cause the JobManager to terminate and then fail over.
> In most cases this is correct, but not for the "too old resource version" exception (see [1] for more information). This exception usually occurs when the APIServer is restarted; we just need to create a new watch and continue watching the pods/configmaps. This helps the Flink cluster reduce the impact of a K8s cluster restart.
>
> This issue was inspired by a technical article[2]. Thanks to the folks from Tencent for the debugging. Note that the article is written in Chinese.
>
> [1]. [https://stackoverflow.com/questions/61409596/kubernetes-too-old-resource-version]
> [2]. [https://cloud.tencent.com/developer/article/1731416]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
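The recovery approach described in the issue — re-creating the watch on "too old resource version" instead of treating it as fatal — can be sketched as follows. This is a minimal, self-contained illustration, not Flink's actual implementation: the `FakeApiServer`, `TooOldResourceVersionError`, and `watch_with_recovery` names are all hypothetical stand-ins for the real Kubernetes client machinery (where the server signals this condition with HTTP 410 Gone).

```python
# Hypothetical sketch: on "too old resource version", re-list to obtain a
# fresh resourceVersion and start a new watch, rather than failing over.

class TooOldResourceVersionError(Exception):
    """Simulates the API server rejecting a stale resourceVersion (HTTP 410)."""

class FakeApiServer:
    """Minimal stand-in for the Kubernetes API server."""
    def __init__(self):
        self.latest_version = 100

    def list_pods(self):
        # A fresh LIST always returns the current resourceVersion.
        return ["pod-a", "pod-b"], self.latest_version

    def watch_pods(self, resource_version):
        if resource_version < self.latest_version:
            # The server has compacted old history; the watch cannot resume
            # from this version (the "too old resource version" case).
            raise TooOldResourceVersionError(resource_version)
        return iter(["ADDED pod-c"])  # pretend event stream

def watch_with_recovery(server, resource_version, max_retries=3):
    """Re-create the watch on 'too old resource version' instead of failing."""
    for _ in range(max_retries):
        try:
            return list(server.watch_pods(resource_version))
        except TooOldResourceVersionError:
            # Re-list to obtain a fresh resourceVersion, then watch again.
            _, resource_version = server.list_pods()
    raise RuntimeError("watch could not be re-established")

server = FakeApiServer()
# Start from a stale version (42); the first watch attempt fails with the
# simulated 410, the helper re-lists, and the second attempt succeeds.
events = watch_with_recovery(server, resource_version=42)
print(events)
```

The key design point is that only this specific error is retried; any other watch failure should still reach the fatal-error handler, since it may indicate a genuine loss of the HA backend.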