[ 
https://issues.apache.org/jira/browse/MESOS-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219182#comment-15219182
 ] 

Anand Mazumdar commented on MESOS-5067:
---------------------------------------

Are you running Docker Swarm as a Mesos framework? If so, why not set a 
higher failover timeout for it, e.g. a week? 
https://github.com/apache/mesos/blob/master/include/mesos/v1/mesos.proto#L223

With a long failover timeout, even if your Swarm framework instance crashes, it 
can reconnect with Mesos using the same {{FrameworkId}}, as long as it does so 
before the timeout expires. All of its tasks would still be intact. 
If you want to kill all the tasks, you can explicitly tear down the framework.
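
As a rough illustration of the suggestion above, the sketch below builds the 
JSON bodies a scheduler could send to the v1 scheduler HTTP API: a SUBSCRIBE 
call whose {{FrameworkInfo}} carries a one-week {{failover_timeout}} (in 
seconds), the same call reusing an earlier {{FrameworkId}} after a crash, and 
an explicit TEARDOWN. The helper names, framework name, user and id are 
hypothetical, and this assumes the framework talks to the HTTP API directly. 
Whether Swarm exposes this setting is not covered here.

```python
import json

# FrameworkInfo.failover_timeout is a double, expressed in seconds.
ONE_WEEK_SECS = 7 * 24 * 60 * 60.0


def subscribe_call(name, user, framework_id=None,
                   failover_timeout=ONE_WEEK_SECS):
    """Build a SUBSCRIBE call body (hypothetical helper, not a Mesos API)."""
    info = {"user": user, "name": name, "failover_timeout": failover_timeout}
    call = {"type": "SUBSCRIBE", "subscribe": {"framework_info": info}}
    if framework_id is not None:
        # Re-subscribing after a crash: reuse the id the master assigned
        # earlier so the running tasks stay attached to this framework.
        info["id"] = {"value": framework_id}
        call["framework_id"] = {"value": framework_id}
    return call


def teardown_call(framework_id):
    """Build a TEARDOWN call body, which asks the master to kill all tasks."""
    return {"type": "TEARDOWN", "framework_id": {"value": framework_id}}


if __name__ == "__main__":
    # First registration: no FrameworkId yet.
    print(json.dumps(subscribe_call("swarm", "root")))
    # After a crash: same FrameworkId (placeholder value shown here).
    print(json.dumps(subscribe_call("swarm", "root", framework_id="example-id")))
    # Explicitly remove the framework and its tasks.
    print(json.dumps(teardown_call("example-id")))
```

The bodies would be POSTed to the master's scheduler endpoint. The point is 
only that the timeout is set at subscription time and that the teardown is a 
separate, explicit call.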

> Killing a framework does not kill framework tasks
> -------------------------------------------------
>
>                 Key: MESOS-5067
>                 URL: https://issues.apache.org/jira/browse/MESOS-5067
>             Project: Mesos
>          Issue Type: Wish
>            Reporter: Guillermo Rodriguez
>
> By default, when a framework is terminated, mesos-master terminates all child 
> tasks for that framework.
> There are some cases when I might like to stop a framework but not kill the 
> tasks of the framework. 
> In my particular case, I have Docker Swarm running. Swarm allows me to send 
> number crunching jobs to the cluster and they can run for hours.
> The problem is that Swarm is also quite flaky and can crash anytime. If that 
> happens then all jobs are terminated and all the processing time is lost.
> So, I would like to be able to set some flag for a framework where I tell 
> mesos master that the jobs started by the framework should be considered 
> separate from the framework itself so that the framework can be restarted and 
> jobs will keep running. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
