[ 
https://issues.apache.org/jira/browse/FLINK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434081#comment-17434081
 ] 

Aitozi commented on FLINK-24624:
--------------------------------

[~wangyang0918] Besides that, I created this issue also to discuss whether we 
have to guarantee that the k8s resources are cleaned up when deploying a session 
or application mode cluster fails.

As far as I know (I am doing some tests with Kubernetes deployments), residual 
k8s resources are left behind in some situations, such as:

1. deployClusterInternal succeeds, but getting the ClusterClient from the 
{{ClusterClientProvider}} fails, which is the case shown in this issue.
2. deploySessionCluster succeeds, but the Deployment fails to spawn a ready pod, 
due to resource shortage, scheduling problems, or a Kubernetes webhook 
intercepting the pod creation.

We can simply wrap the deploySessionCluster method body in a try-catch to solve 
case 1, which has been done in my PR.
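
Roughly, the pattern is the following (a minimal, self-contained sketch; the 
class and parameter names are illustrative, not the actual code in the PR):

{code:java}
import java.util.function.Supplier;

/** Sketch only: run a deploy step and clean up on any failure. */
public final class DeployWithCleanup {

    /**
     * Runs the deploy step; if it throws after the k8s resources were
     * already created, invokes best-effort cleanup before rethrowing.
     *
     * @param deploy  e.g. a call into deployClusterInternal(...)
     * @param cleanup e.g. kubeClient.stopAndCleanupCluster(clusterId)
     */
    public static <T> T deployOrCleanUp(Supplier<T> deploy, Runnable cleanup) {
        try {
            return deploy.get();
        } catch (RuntimeException e) {
            try {
                cleanup.run();
            } catch (RuntimeException cleanupFailure) {
                // Don't let a failing cleanup hide the original error.
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }
}
{code}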

But I still have some concerns about case 2. I think there should be a deadline 
for spawning a cluster, and the related resources should be destroyed after the 
timeout, along the lines of the sketch below.
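
For example (a rough sketch using the fabric8 client that Flink's kubernetes 
module is built on; the namespace, deployment-name, and label conventions here 
are illustrative assumptions, not the actual implementation):

{code:java}
import java.time.Duration;
import java.time.Instant;

import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.client.KubernetesClient;

/** Sketch: poll the session Deployment for a ready pod within a deadline;
 *  delete the created k8s resources if the deadline passes. */
public final class SessionReadinessGuard {

    public static void awaitReadyOrCleanUp(
            KubernetesClient client, String namespace, String clusterId,
            Duration deadline) throws InterruptedException {
        final Instant giveUpAt = Instant.now().plus(deadline);
        while (Instant.now().isBefore(giveUpAt)) {
            final Deployment d = client.apps().deployments()
                    .inNamespace(namespace).withName(clusterId).get();
            final Integer ready = (d == null || d.getStatus() == null)
                    ? null : d.getStatus().getReadyReplicas();
            if (ready != null && ready > 0) {
                return; // at least one JobManager pod became ready
            }
            Thread.sleep(2_000L); // poll every 2s; tune as needed
        }
        // Deadline exceeded (unschedulable pod, webhook rejection, ...):
        // destroy what was created so the next deploy starts clean.
        client.apps().deployments()
                .inNamespace(namespace).withName(clusterId).delete();
        client.services()
                .inNamespace(namespace).withLabel("app", clusterId).delete();
        throw new IllegalStateException(
                "Session cluster " + clusterId + " did not become ready within "
                        + deadline + "; resources were cleaned up.");
    }
}
{code}

A watch-based wait (e.g. fabric8's {{waitUntilReady}}) would avoid the polling, 
but the deadline-plus-cleanup shape stays the same.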

> Add clean up phase when kubernetes session start failed
> -------------------------------------------------------
>
>                 Key: FLINK-24624
>                 URL: https://issues.apache.org/jira/browse/FLINK-24624
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.14.0
>            Reporter: Aitozi
>            Priority: Major
>              Labels: pull-request-available
>
> Several k8s resources are created when deploying the Kubernetes session. But 
> these resources are left behind when the deployment fails. This will lead to 
> failures on the next deployment or to resource leaks. So I think we should add 
> a clean-up phase for when the start fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
