[ https://issues.apache.org/jira/browse/MESOS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler updated MESOS-295:
----------------------------------

    Fix Version/s: 0.17.0 (was: 0.16.0)

> Allow new masters to have better understanding of cluster state
> ---------------------------------------------------------------
>
>                 Key: MESOS-295
>                 URL: https://issues.apache.org/jira/browse/MESOS-295
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Joe Smith
>            Assignee: Benjamin Mahler
>            Priority: Critical
>              Labels: twitter
>             Fix For: 0.17.0
>
>
> If a new master is elected, it only has knowledge of the current state of
> the cluster. This can lead to a situation where tasks become lost but are
> never properly killed. For instance:
> 1) A set of machines (perhaps a datacenter rack) loses network connectivity,
> and their tasks are marked LOST by the master. However, the tasks are still
> running.
> 2) Through a potentially unrelated event, the master fails over to a new
> master.
> 3) The network connection to the machines comes back up.
> 4) These slaves never killed their tasks (and they shouldn't if they can't
> talk to a master).
> 5) The tasks keep running and are never killed, taking up resources and
> running outside the scope of the new master.