[jira] [Commented] (FLINK-27868) Harden running job check before triggering savepoints or savepoint upgrades

Gyula Fora (Jira) Mon, 06 Jun 2022 03:18:04 -0700


    [ https://issues.apache.org/jira/browse/FLINK-27868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550394#comment-17550394 ]


Gyula Fora commented on FLINK-27868:
------------------------------------

I think the best approach would be to improve the JobStatus observer with the
following logic:

If RUNNING status was observed, we should apply a second check to verify that
all tasks are indeed running.
If yes, keep the job in RUNNING; otherwise, set the state to CREATED. This way
we can leverage the improved running observation throughout the operator code
where we already use it, instead of having to inject custom logic all over the
place.
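A minimal Java sketch of the proposed observer rule, assuming the per-vertex
subtask state counts have already been fetched from the cluster (for example
from the job details REST endpoint). The enums, `allTasksRunning` and
`observeStatus` are hypothetical stand-ins, not the operator's actual classes;
they only illustrate downgrading an observed RUNNING status to CREATED when
some subtasks are not running yet.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names below are illustrative, not the operator's actual API.
public class RunningJobCheck {

    // Local stand-ins mirroring a subset of Flink's JobStatus and ExecutionState enums.
    enum JobStatus { CREATED, RUNNING, FAILING, FINISHED }

    enum ExecutionState { CREATED, SCHEDULED, DEPLOYING, INITIALIZING, RUNNING, FINISHED, FAILED }

    /** Returns true only if every subtask of every vertex is in the RUNNING state. */
    static boolean allTasksRunning(List<Map<ExecutionState, Integer>> vertexTaskCounts) {
        for (Map<ExecutionState, Integer> counts : vertexTaskCounts) {
            for (Map.Entry<ExecutionState, Integer> e : counts.entrySet()) {
                // Any subtask outside RUNNING means the job is not fully up yet.
                if (e.getKey() != ExecutionState.RUNNING && e.getValue() > 0) {
                    return false;
                }
            }
        }
        return true;
    }

    /**
     * Refines the status reported by the cluster: a RUNNING job whose subtasks are
     * not all running is downgraded to CREATED; other statuses pass through unchanged.
     */
    static JobStatus observeStatus(JobStatus reported,
                                   List<Map<ExecutionState, Integer>> vertexTaskCounts) {
        if (reported == JobStatus.RUNNING && !allTasksRunning(vertexTaskCounts)) {
            return JobStatus.CREATED;
        }
        return reported;
    }

    public static void main(String[] args) {
        // One vertex fully running, one vertex with a subtask still deploying.
        List<Map<ExecutionState, Integer>> partial = List.of(
                Map.of(ExecutionState.RUNNING, 2),
                Map.of(ExecutionState.RUNNING, 1, ExecutionState.DEPLOYING, 1));
        // Every subtask of every vertex is running.
        List<Map<ExecutionState, Integer>> full = List.of(
                Map.of(ExecutionState.RUNNING, 2),
                Map.of(ExecutionState.RUNNING, 2));
        System.out.println(observeStatus(JobStatus.RUNNING, partial)); // CREATED
        System.out.println(observeStatus(JobStatus.RUNNING, full));    // RUNNING
    }
}
```

Downgrading to CREATED rather than introducing a new state means that existing
checks which already compare against RUNNING (such as the savepoint trigger and
savepoint upgrade paths) would pick up the stricter semantics without changes.
Note that this sketch treats FINISHED subtasks as "not running"; whether that is
desirable for jobs with bounded sources is left open.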

> Harden running job check before triggering savepoints or savepoint upgrades
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-27868
>                 URL: https://issues.apache.org/jira/browse/FLINK-27868
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>            Reporter: Gyula Fora
>            Assignee: Matyas Orhidi
>            Priority: Major
>             Fix For: kubernetes-operator-1.1.0
>
>
> Even when the job is in RUNNING state, often not all of its subtasks are
> running yet, which leads to savepoint upgrade / savepoint trigger failures. We
> should harden the isRunning check we use to include subtask states as well.
> This suggestion is described in more detail by [~matyas] here:
> https://github.com/apache/flink-kubernetes-operator/pull/237#issuecomment-1137054088



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
