[jira] [Commented] (FLINK-37567) Flink clusters not be clean up when using job cancellation as suspend mechanism

Gyula Fora (Jira) Fri, 28 Mar 2025 08:16:25 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-37567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939250#comment-17939250
 ]


Gyula Fora commented on FLINK-37567:
------------------------------------

You are right regarding the parallelism / slots, but this doesn't always work, 
for example if parallelism is overridden programmatically.

The intention for keeping the JM around is to allow users to observe the job, 
checkpoint state, etc. if they want. There is a timeout after which it is 
terminated; you can set this timeout to 0 to not keep it around. The JM 
generally doesn't consume many resources, so it is usually fine.
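
As a rough illustration of the timeout behaviour described above, here is a 
minimal Python sketch. It is not the operator's code: the helper name, its 
parameters and the one-day value are hypothetical. In the operator 
configuration the relevant option is, as far as I know, 
kubernetes.operator.jm-deployment.shutdown-ttl; treat that key as an 
assumption and check it against the configuration reference for your version.

```python
from datetime import datetime, timedelta, timezone


def jm_should_be_deleted(terminal_since: datetime,
                         shutdown_ttl: timedelta,
                         now: datetime) -> bool:
    """Hypothetical helper: True once the job has been terminal for >= TTL."""
    return now - terminal_since >= shutdown_ttl


canceled_at = datetime(2025, 3, 28, 15, 0, tzinfo=timezone.utc)
one_hour_later = canceled_at + timedelta(hours=1)

# With an illustrative one-day TTL the JM is still kept an hour after cancel.
keep = jm_should_be_deleted(canceled_at, timedelta(days=1), one_hour_later)
# With a TTL of 0 the JM is eligible for deletion as soon as the job is terminal.
gone = jm_should_be_deleted(canceled_at, timedelta(0), canceled_at)
print(keep, gone)
```

The point of the sketch is only that a TTL of 0 makes the condition true at 
the moment the job reaches a terminal state, so the JM is not kept around.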

I can see the value of a standalone model, but a huge downside is that it 
cannot easily integrate with active resource management for rescaling and 
adaptive job / TM scheduling. For most users there is also no need for such 
advanced customization. Even in custom Kubernetes environments, based on what 
I have seen in the last 3 years, almost everyone still uses the native 
integration.

> Flink clusters not be clean up when using job cancellation as suspend 
> mechanism
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-37567
>                 URL: https://issues.apache.org/jira/browse/FLINK-37567
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.10.0, kubernetes-operator-1.11.0
>            Reporter: Alan Zhang
>            Priority: Major
>
> In general, for application mode, the Flink cluster lifecycle should be tied 
> to the Flink job lifecycle, which means we should delete the Flink cluster 
> if the job is stopped.
> However, I noticed that Flink clusters are not deleted when I tried to 
> suspend a FlinkDeployment with "job-cancel" enabled. The CR shows the job in 
> the "CANCELED" state, but the underlying Flink cluster is still running.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
