wawa created FLINK-33096:
----------------------------

             Summary: Flink on k8s,if one taskmanager pod was crashed,the whole flink job will be failed
                 Key: FLINK-33096
                 URL: https://issues.apache.org/jira/browse/FLINK-33096
             Project: Flink
          Issue Type: Bug
          Components: Deployment / Kubernetes
    Affects Versions: 1.14.3
            Reporter: wawa

The Flink version is 1.14.3, and the job is submitted to Kubernetes in Native Kubernetes application mode.

During scheduling, when a TaskManager pod crashes due to an exception, Kubernetes attempts to start a new TaskManager pod. However, scheduling halts immediately and the entire Flink job is terminated. By contrast, if the JobManager pod crashes, Kubernetes successfully schedules a new JobManager pod.

This behavior was observed while running the application. Could you please help analyze the underlying issue?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)