[ https://issues.apache.org/jira/browse/MESOS-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219182#comment-15219182 ]
Anand Mazumdar commented on MESOS-5067:
---------------------------------------

Are you running Docker Swarm as a Mesos framework? If so, why not set a higher failover timeout for it, e.g. a week? https://github.com/apache/mesos/blob/master/include/mesos/v1/mesos.proto#L223

That way, even if your Swarm framework instance crashes, it can reconnect to Mesos using the same {{FrameworkId}}, and all of its tasks will still be intact as long as it reconnects within the failover timeout. If you do want to kill all the tasks, you can explicitly tear down the framework.

> Killing a framework does not kill framework tasks
> -------------------------------------------------
>
>                 Key: MESOS-5067
>                 URL: https://issues.apache.org/jira/browse/MESOS-5067
>             Project: Mesos
>          Issue Type: Wish
>            Reporter: Guillermo Rodriguez
>
> By default, when a framework is terminated, mesos-master terminates all child
> tasks for that framework.
> There are some cases where I might like to stop a framework but not kill the
> tasks of the framework.
> In my particular case, I have Docker Swarm running. Swarm allows me to send
> number-crunching jobs to the cluster, and they can run for hours.
> The problem is that Swarm is also quite flaky and can crash at any time. If that
> happens, then all jobs are terminated and all the processing time is lost.
> So, I would like to be able to set a flag for a framework that tells the
> Mesos master that the jobs started by the framework should be considered
> separate from the framework itself, so that the framework can be restarted and
> the jobs will keep running.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
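The failover-timeout suggestion in the comment can be illustrated with the request bodies a framework would send. This is a minimal sketch, assuming the framework talks to the master through the v1 scheduler HTTP API (POST to {{/api/v1/scheduler}}); Swarm's actual Mesos integration may set these fields differently. It only builds the JSON bodies and sends nothing. The user name, framework name, and FrameworkID values are hypothetical.

```python
import json
from typing import Optional

# One week in seconds; FrameworkInfo.failover_timeout is a double measured in seconds.
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60.0


def subscribe_call(user: str, name: str,
                   framework_id: Optional[str] = None,
                   failover_timeout: float = ONE_WEEK_SECONDS) -> dict:
    """Build a v1 scheduler API SUBSCRIBE body.

    Passing a previously assigned framework_id makes this a re-subscription,
    letting a restarted scheduler reattach to its still-running tasks,
    provided it reconnects before failover_timeout expires.
    """
    framework_info = {
        "user": user,
        "name": name,
        "failover_timeout": failover_timeout,
    }
    call = {"type": "SUBSCRIBE", "subscribe": {"framework_info": framework_info}}
    if framework_id is not None:
        # On re-subscription the same ID goes in FrameworkInfo and in the Call.
        framework_info["id"] = {"value": framework_id}
        call["framework_id"] = {"value": framework_id}
    return call


def teardown_call(framework_id: str) -> dict:
    """Build a TEARDOWN body, which removes the framework and kills its tasks."""
    return {"type": "TEARDOWN", "framework_id": {"value": framework_id}}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    fid = "12220-3440-12532-2345"
    print(json.dumps(subscribe_call("swarm", "docker-swarm"), indent=2))
    print(json.dumps(subscribe_call("swarm", "docker-swarm", framework_id=fid), indent=2))
    print(json.dumps(teardown_call(fid), indent=2))
```

The first SUBSCRIBE is what a fresh scheduler would send. After a crash, the restarted scheduler sends the second form with the FrameworkID it was originally assigned. TEARDOWN is only needed when the tasks should actually be killed.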