[ 
https://issues.apache.org/jira/browse/FLINK-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15327372#comment-15327372
 ] 

ASF GitHub Bot commented on FLINK-3800:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/2096

    [FLINK-3800] [runtime] Introduce SUSPENDED job status

    The SUSPENDED job status is a new ExecutionGraph state which can be reached from all
    non-terminal states by calling suspend on the ExecutionGraph. Unlike the FAILED,
    FINISHED and CANCELED states, the SUSPENDED state does not trigger the deletion of the
    job from the HA storage. It can therefore be used to handle the loss of
    leadership or the shutdown of a JobManager, so that the ExecutionGraph is stopped but
    can still be recovered. SUSPENDED is also a terminal state, but it is only
    locally terminal, whereas FAILED, CANCELED and FINISHED are globally
    terminal states.
    
    Add test case for suspend signal
    
    Add test case for suspending restarting job
    
    Add test case for HA job recovery when losing leadership
    
    Add online documentation for the job status
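
The distinction between locally and globally terminal states described in the pull request can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the set of non-terminal states shown, the nested TerminalState enum, and the method names (including the hypothetical removesJobFromHaStorage helper) are assumptions made for the example.

```java
/** Illustrative model of the job statuses; names and structure are assumptions. */
enum JobStatus {
    // Non-terminal states (a subset chosen for illustration).
    CREATED(TerminalState.NON_TERMINAL),
    RUNNING(TerminalState.NON_TERMINAL),
    RESTARTING(TerminalState.NON_TERMINAL),

    // Globally terminal: the job is done for good and can be removed from HA storage.
    FAILED(TerminalState.GLOBALLY),
    CANCELED(TerminalState.GLOBALLY),
    FINISHED(TerminalState.GLOBALLY),

    // Locally terminal: stopped on this JobManager only, still recoverable elsewhere.
    SUSPENDED(TerminalState.LOCALLY);

    private enum TerminalState { NON_TERMINAL, LOCALLY, GLOBALLY }

    private final TerminalState terminalState;

    JobStatus(TerminalState terminalState) {
        this.terminalState = terminalState;
    }

    /** True for both locally and globally terminal states. */
    boolean isTerminalState() {
        return terminalState != TerminalState.NON_TERMINAL;
    }

    /** True only for FAILED, CANCELED and FINISHED. */
    boolean isGloballyTerminalState() {
        return terminalState == TerminalState.GLOBALLY;
    }

    /** Hypothetical helper: only globally terminal states trigger HA cleanup. */
    boolean removesJobFromHaStorage() {
        return isGloballyTerminalState();
    }
}

class JobStatusSketch {
    public static void main(String[] args) {
        // Print how each status is classified in this sketch.
        for (JobStatus status : JobStatus.values()) {
            System.out.println(status
                    + " terminal=" + status.isTerminalState()
                    + " globallyTerminal=" + status.isGloballyTerminalState()
                    + " removesJobFromHaStorage=" + status.removesJobFromHaStorage());
        }
    }
}
```

In this sketch SUSPENDED reports itself as terminal but not globally terminal, so a cleanup routine that checks only the global flag would leave the job in the HA storage for a later recovery.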

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixHALifecycle

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2096.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2096
    
----
commit 0d3c738e85fed2e161bc724887a4d8ce06a2798c
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-06-09T09:37:14Z

    [FLINK-3800] [runtime] Introduce SUSPENDED job status
    

----


> ExecutionGraphs can become orphans
> ----------------------------------
>
>                 Key: FLINK-3800
>                 URL: https://issues.apache.org/jira/browse/FLINK-3800
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> The {{JobManager.cancelAndClearEverything}} method fails all jobs currently
> executing on the {{JobManager}} and then clears the list of
> {{currentJobs}} kept in the JobManager. This becomes problematic if the
> user has set a restart strategy for a job, because the {{RestartStrategy}}
> will try to restart the job. This can lead to unwanted re-deployments of the
> job, which consume resources and thus disturb the execution of other
> jobs. If the restart strategy never gives up, the
> {{ExecutionGraph}} is never properly terminated.
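
The problem described in the issue, and how a suspend call avoids it, can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyExecutionGraph and its methods are hypothetical and do not mirror Flink's actual ExecutionGraph or RestartStrategy classes.

```java
/** Toy model of a job's lifecycle; not Flink's ExecutionGraph. */
final class ToyExecutionGraph {
    enum State { RUNNING, RESTARTING, SUSPENDED }

    private State state = State.RUNNING;
    private int deployments = 1; // the initial deployment

    State getState() {
        return state;
    }

    int getDeployments() {
        return deployments;
    }

    /** Failing a running job hands control to the restart strategy. */
    void fail() {
        if (state == State.RUNNING) {
            state = State.RESTARTING;
        }
    }

    /** Callback a restart strategy would invoke; redeploys only while RESTARTING. */
    boolean restart() {
        if (state != State.RESTARTING) {
            return false; // e.g. the graph was suspended in the meantime
        }
        deployments++;
        state = State.RUNNING;
        return true;
    }

    /** Reachable from every non-terminal state of this toy model. */
    void suspend() {
        if (state == State.RUNNING || state == State.RESTARTING) {
            state = State.SUSPENDED;
        }
    }
}

class SuspendSketch {
    public static void main(String[] args) {
        // Failing without suspending: the restart strategy redeploys the job.
        ToyExecutionGraph orphan = new ToyExecutionGraph();
        orphan.fail();
        orphan.restart();
        System.out.println("fail only:    state=" + orphan.getState()
                + " deployments=" + orphan.getDeployments());

        // Suspending first: the pending restart is ignored.
        ToyExecutionGraph suspended = new ToyExecutionGraph();
        suspended.fail();
        suspended.suspend();
        suspended.restart();
        System.out.println("with suspend: state=" + suspended.getState()
                + " deployments=" + suspended.getDeployments());
    }
}
```

The first graph ends up running again with a second deployment, which corresponds to the orphaned job from the issue; the second stays suspended because the restart callback finds the graph no longer in the restarting state.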



