[ https://issues.apache.org/jira/browse/FLINK-30036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635088#comment-17635088 ]
Peng Yuan commented on FLINK-30036:
-----------------------------------

Just as [~wangyang0918] said, there are a number of conditions that can cause a pod to remain in a Terminating state, so we need to add a config option to determine whether the pod should be force-deleted or not.

Regarding the concern that a NotReady node does not always mean the pod will block in the Terminating state: I am inclined to think that a NotReady node will indeed cause the pod to block there. The Kubernetes Node API describes the *Ready* condition of *NodeCondition* as {^}[1]{^}:
{quote}{{True}} if the node is healthy and ready to accept pods, {{False}} if the node is not healthy and is not accepting pods, and {{Unknown}} if the node controller has not heard from the node in the last {{node-monitor-grace-period}} (default is 40 seconds){quote}

When the node is unreachable, the pod behaves as follows {^}[2]{^}:
{quote}A Pod is not deleted automatically when a node is unreachable. The Pods running on an unreachable Node enter the 'Terminating' or 'Unknown' state after a [timeout|https://kubernetes.io/docs/concepts/architecture/nodes/#condition]. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node. The only ways in which a Pod in such a state can be removed from the apiserver are as follows:
* The Node object is deleted (either by you, or by the [Node Controller|https://kubernetes.io/docs/concepts/architecture/nodes/#node-controller]).
* The kubelet on the unresponsive Node starts responding, kills the Pod and removes the entry from the apiserver.
* Force deletion of the Pod by the user.

The recommended best practice is to use the first or second approach. If a Node is confirmed to be dead (e.g. permanently disconnected from the network, powered down, etc.), then delete the Node object. If the Node is suffering from a network partition, then try to resolve this or wait for it to resolve. When the partition heals, the kubelet will complete the deletion of the Pod and free up its name in the apiserver.{quote}

For Flink itself, in the Kubernetes environment, when a TaskManager connection times out, the ResourceManager tries to delete the TM pod. If the cause of the timeout can be detected, the ResourceManager can forcibly delete the pod to recover the job quickly, because low recovery latency is very important to Flink.

{*}[1]{*} Node conditions: [Nodes | Kubernetes|https://kubernetes.io/docs/concepts/architecture/nodes/#condition]
{*}[2]{*} Delete pods: [Force Delete StatefulSet Pods | Kubernetes|https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/#delete-pods]

> Force delete pod when k8s node is not ready
> --------------------------------------------
>
> Key: FLINK-30036
> URL: https://issues.apache.org/jira/browse/FLINK-30036
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Reporter: Peng Yuan
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-11-17-10-25-59-945.png
>
> When the K8s node is in the NotReady state, the taskmanager pod scheduled on
> it is always in the terminating state. When the flink cluster has a strict
> quota, the terminating pod will hold the resources all the time. As a result,
> the new taskmanager pod cannot apply for resources and cannot be started.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
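The config-gated decision proposed in the comment above could be sketched as follows. This is a minimal illustration, not Flink's actual implementation; the option name and function are hypothetical, and the `Ready` condition values ({{True}}, {{False}}, {{Unknown}}) are the ones documented in [1]:

```python
# Hypothetical sketch of the proposed decision: force-delete a Terminating
# TaskManager pod only when the (assumed) config option is enabled and the
# pod's node reports a Ready condition other than "True". The NodeCondition
# "Ready" status is one of "True", "False", or "Unknown" (the last meaning
# the node controller has lost contact with the node).
def should_force_delete(force_delete_enabled: bool, node_ready_status: str) -> bool:
    return force_delete_enabled and node_ready_status in ("False", "Unknown")

# Example: node unreachable (Ready == "Unknown") and the option enabled.
print(should_force_delete(True, "Unknown"))
# The equivalent manual action, per the force-deletion doc in [2]:
#   kubectl delete pod <pod-name> --grace-period=0 --force
```

With the option disabled (the conservative default), the pod is left for the kubelet or Node Controller to clean up, matching the best practice quoted above.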