[ https://issues.apache.org/jira/browse/FLINK-30036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635088#comment-17635088 ]

Peng Yuan commented on FLINK-30036:
-----------------------------------

Just as [~wangyang0918] said, there are a number of conditions that can cause a 
pod to remain in the Terminating state, so we need to add a config option to 
determine whether the pod should be force-deleted or not.
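As a rough, hedged sketch of what such an option could look like (the key name, 
default value, and description here are my own assumptions for illustration, not 
an agreed-upon API):
{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Hypothetical option only: the key name and default are assumptions, not a final API.
public static final ConfigOption<Boolean> FORCE_DELETE_POD_ON_NODE_NOT_READY =
        ConfigOptions.key("kubernetes.taskmanager.force-delete-pod-on-node-not-ready")
                .booleanType()
                .defaultValue(false)
                .withDescription(
                        "Whether to force-delete a TaskManager pod that is stuck in the "
                                + "Terminating state because its node is NotReady.");
{code}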

Regarding the concern that a not-ready node does not always mean the pod will be 
blocked in the Terminating state: I am inclined to think that a not-ready node 
will certainly cause the pod to get stuck in the Terminating state. The Node 
object of the k8s API describes the *Ready* condition of *NodeCondition* as {^}[1]{^}:
{quote}{{True}} if the node is healthy and ready to accept pods, {{False}} if 
the node is not healthy and is not accepting pods, and {{Unknown}} if the node 
controller has not heard from the node in the last 
{{node-monitor-grace-period}} (default is 40 seconds)
{quote}
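For reference, a minimal sketch (not Flink's actual implementation) of how the 
Ready condition could be checked with the fabric8 Kubernetes client that Flink's 
k8s integration is built on; the helper name is made up for illustration:
{code:java}
import io.fabric8.kubernetes.api.model.Node;
import io.fabric8.kubernetes.client.KubernetesClient;

// Sketch only: treat the node as not ready when its Ready condition is
// "False" or "Unknown", or when the Node object no longer exists.
static boolean isNodeNotReady(KubernetesClient client, String nodeName) {
    Node node = client.nodes().withName(nodeName).get();
    if (node == null) {
        return true;
    }
    return node.getStatus().getConditions().stream()
            .filter(c -> "Ready".equals(c.getType()))
            .anyMatch(c -> !"True".equals(c.getStatus()));
}
{code}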
When the node is unreachable, pods behave as follows {^}[2]{^}: 
{quote}A Pod is not deleted automatically when a node is unreachable. The Pods 
running on an unreachable Node enter the 'Terminating' or 'Unknown' state after 
a [timeout|https://kubernetes.io/docs/concepts/architecture/nodes/#condition]. 
Pods may also enter these states when the user attempts graceful deletion of a 
Pod on an unreachable Node. The only ways in which a Pod in such a state can be 
removed from the apiserver are as follows:
 * The Node object is deleted (either by you, or by the [Node 
Controller|https://kubernetes.io/docs/concepts/architecture/nodes/#node-controller]).
 * The kubelet on the unresponsive Node starts responding, kills the Pod and 
removes the entry from the apiserver.
 * Force deletion of the Pod by the user.

The recommended best practice is to use the first or second approach. If a Node 
is confirmed to be dead (e.g. permanently disconnected from the network, 
powered down, etc), then delete the Node object. If the Node is suffering from 
a network partition, then try to resolve this or wait for it to resolve. When 
the partition heals, the kubelet will complete the deletion of the Pod and free 
up its name in the apiserver.
{quote}
For Flink itself, in a k8s environment, when the taskmanager connection times 
out, the resourcemanager will try to delete the tm pod. If the cause of the 
timeout can be identified (e.g. the node is NotReady), the resourcemanager can 
forcibly delete the pod to recover the job quickly, because low recovery latency 
is very important to Flink.
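A minimal sketch of the force deletion itself with the fabric8 client (setting 
the grace period to 0 is the programmatic equivalent of 
{{kubectl delete pod <name> --grace-period=0 --force}}); the namespace and pod 
name are placeholders:
{code:java}
import io.fabric8.kubernetes.client.KubernetesClient;

// Sketch only: force-delete a stuck TaskManager pod by setting grace period 0,
// which removes it from the apiserver without waiting for the kubelet.
static void forceDeleteTaskManagerPod(KubernetesClient client, String namespace, String podName) {
    client.pods()
            .inNamespace(namespace)
            .withName(podName)
            .withGracePeriod(0)
            .delete();
}
{code}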

{*}[1]{*} Node conditions: [Nodes | 
Kubernetes|https://kubernetes.io/docs/concepts/architecture/nodes/#condition]

{*}[2]{*} Delete pods: [Force Delete StatefulSet Pods | 
Kubernetes|https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/#delete-pods]

> Force delete pod when k8s node is not ready
> --------------------------------------------
>
>                 Key: FLINK-30036
>                 URL: https://issues.apache.org/jira/browse/FLINK-30036
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>            Reporter: Peng Yuan
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-11-17-10-25-59-945.png
>
>
> When the K8s node is in the NotReady state, the taskmanager pod scheduled on 
> it stays in the Terminating state. When the flink cluster has a strict 
> quota, the terminating pod keeps holding its resources. As a result, 
> the new taskmanager pod cannot apply for resources and cannot be started.
>  



