[ https://issues.apache.org/jira/browse/FLINK-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263812#comment-15263812 ]
Ufuk Celebi commented on FLINK-3800:
------------------------------------

We had to revert this in 0708dd0 for release-1.0 after a discussion with Till. The problem is that a JobGraph is lost once its job reaches a final state, because it is then removed from ZooKeeper. If the ExecutionGraphs stay orphans, though, this can lead to races in which the orphan and the re-deployment after leadership compete for the same resources (as reported by a user).

> ExecutionGraphs can become orphans
> ----------------------------------
>
>                 Key: FLINK-3800
>                 URL: https://issues.apache.org/jira/browse/FLINK-3800
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> The {{JobManager.cancelAndClearEverything}} method fails all currently
> executed jobs on the {{JobManager}} and then clears the list of
> {{currentJobs}} kept in the JobManager. This can become problematic if the
> user has set a restart strategy for a job, because the {{RestartStrategy}}
> will try to restart the job. This can lead to unwanted re-deployments of the
> job, which consume resources and thus disturb the execution of other
> jobs. If the restart strategy never stops, then this prevents the
> {{ExecutionGraph}} from ever being properly terminated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)