GitHub user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1153#issuecomment-145798045
  
    What happens if the last JM dies and, with it, the currently running job 
fails permanently? The `JobGraph` stored in ZooKeeper will then be recovered 
when a new Flink cluster is started, right? Does this make sense? Is there a 
way to get rid of terminally failed jobs?
    
    The problem is that otherwise the recovered job won't find the submitting 
`JobClient` and will still occupy cluster resources (slots). So if you start a 
new cluster and submit a job, the submission fails because an old recovered job 
holds the slots. But maybe I overlooked a mechanism that prevents this scenario.
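To make the scenario concrete, here is a toy model under stated assumptions. Everything in it is hypothetical: `Cluster`, `start_cluster`, and the plain dict standing in for the ZooKeeper job graph store are illustrative names, not Flink's API. It only shows the reasoning above: if a fresh cluster reschedules every persisted job graph on start-up, a stale job can hold all slots and a new submission is rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy cluster with a fixed number of task slots (hypothetical, not Flink's API)."""
    total_slots: int
    running: dict = field(default_factory=dict)  # job name -> slots held

    def free_slots(self) -> int:
        return self.total_slots - sum(self.running.values())

    def schedule(self, job: str, slots: int) -> bool:
        # A job is scheduled only if enough slots are free; otherwise it is rejected.
        if slots > self.free_slots():
            return False
        self.running[job] = slots
        return True


def start_cluster(total_slots: int, persisted_jobs: dict) -> Cluster:
    """Start a fresh cluster and reschedule every job graph found in the simulated store."""
    cluster = Cluster(total_slots)
    for job, slots in persisted_jobs.items():
        # Recovered unconditionally; the model does not check whether any
        # submitting client is still waiting for this job's result.
        cluster.schedule(job, slots)
    return cluster


# The old job failed when the last JobManager died, but its graph is still persisted.
store = {"old-job": 4}
cluster = start_cluster(total_slots=4, persisted_jobs=store)
print(cluster.schedule("new-job", 2))  # False: the recovered job holds all 4 slots

# With the terminally failed entry removed from the store, the new job fits.
fresh = start_cluster(total_slots=4, persisted_jobs={})
print(fresh.schedule("new-job", 2))  # True
```

The second cluster only illustrates why some way of discarding terminally failed job graphs would help; it does not describe how the pull request handles this.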

