[jira] [Created] (FLINK-33096) Flink on k8s, if one taskmanager pod was crashed, the whole flink job will be failed

wawa (Jira) Sun, 17 Sep 2023 06:46:04 -0700

wawa created FLINK-33096:
----------------------------

             Summary: Flink on k8s, if one taskmanager pod was crashed, the whole
flink job will be failed
                 Key: FLINK-33096
                 URL: https://issues.apache.org/jira/browse/FLINK-33096
             Project: Flink
          Issue Type: Bug
          Components: Deployment / Kubernetes
    Affects Versions: 1.14.3
            Reporter: wawa



The Flink version is 1.14.3, and the job is submitted to Kubernetes in Native
Kubernetes application mode. While the job is being scheduled, if a
TaskManager pod crashes because of an exception, a new TaskManager pod is
requested to replace it. However, scheduling stops immediately and the entire
Flink job is terminated. By contrast, if the JobManager pod crashes,
Kubernetes successfully schedules a new JobManager pod. We observed this
while running our application. Could you please help analyze the root cause?
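One setting worth checking when the loss of a single TaskManager fails the
whole job is the restart strategy. In Flink 1.14, if `restart-strategy` is not
set, the default is `none` when checkpointing is disabled (the first task
failure fails the job) and `fixed-delay` with `Integer.MAX_VALUE` attempts when
checkpointing is enabled. The sketch below is a simplified, hypothetical model
of that decision, not Flink code: the names `FlinkRestartConfig`,
`effective_strategy` and `job_survives` are made up, and the `failure-rate` and
`exponential-delay` strategies are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

JAVA_INT_MAX = 2**31 - 1  # Integer.MAX_VALUE


@dataclass
class FlinkRestartConfig:
    """Hypothetical model of the relevant flink-conf.yaml keys (Flink 1.14 semantics)."""
    checkpointing_enabled: bool
    restart_strategy: Optional[str] = None  # 'restart-strategy'; None means "not set"
    fixed_delay_attempts: int = 1           # 'restart-strategy.fixed-delay.attempts' (default 1)


def effective_strategy(cfg: FlinkRestartConfig) -> str:
    """Resolve the strategy Flink would use, applying the checkpoint-dependent default."""
    if cfg.restart_strategy is not None:
        return cfg.restart_strategy
    return "fixed-delay" if cfg.checkpointing_enabled else "none"


def job_survives(cfg: FlinkRestartConfig, failures: int) -> bool:
    """Return True if the job is still restarted after `failures` task failures
    (for example, TaskManager pod crashes); False if the job would fail."""
    strategy = effective_strategy(cfg)
    if strategy == "none":
        # No restarts at all: any task failure fails the whole job.
        return failures == 0
    if strategy == "fixed-delay":
        # An unset strategy that defaults to fixed-delay uses unlimited attempts;
        # an explicit fixed-delay uses the configured number of attempts.
        attempts = JAVA_INT_MAX if cfg.restart_strategy is None else cfg.fixed_delay_attempts
        return failures <= attempts
    raise ValueError(f"strategy not modeled in this sketch: {strategy}")
```

For example, `job_survives(FlinkRestartConfig(checkpointing_enabled=False), 1)`
is `False`, which would match the behavior described in this report if the job
runs without checkpointing and without an explicit restart strategy. This is
only one possible cause; the JobManager and TaskManager logs would be needed to
confirm it.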



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
