[ https://issues.apache.org/jira/browse/MESOS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114745#comment-16114745 ]
Yan Xu commented on MESOS-7215:
-------------------------------

Communicated over Slack, but yeah, it's being worked on and a patch will be ready soon.

> Race condition on re-registration of non-partition-aware frameworks
> -------------------------------------------------------------------
>
>                 Key: MESOS-7215
>                 URL: https://issues.apache.org/jira/browse/MESOS-7215
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Yan Xu
>            Assignee: Megha Sharma
>            Priority: Critical
>
> Prior to the partition-awareness work MESOS-5344, upon agent reregistration
> after it has been removed, the master only sent ShutdownFrameworkMessages to
> the agent for frameworks that it knew had been torn down.
> With the new logic in MESOS-5344, Mesos is now sending
> {{ShutdownFrameworkMessages}} to the agent for all non-partition-aware
> frameworks (including the ones that are still registered).
> This is problematic. The offer from this agent can still go to the same
> framework, which can then launch new tasks. The agent then receives tasks of
> the same framework and ignores them because it thinks the framework is
> shutting down. The framework is not shutting down of course, so from the
> master's and the scheduler's perspective the task is pending in STAGING forever
> until the next agent reregistration, which could happen much later.
> This also makes the semantics of `ShutdownFrameworkMessage` ambiguous: the
> agent assumes the framework is going away (and acts accordingly) when
> it is not.
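
For readers following along, the sequence in the description can be sketched as a toy model. This is not Mesos code: `Framework`, `Agent`, `reregister_agent`, `launch` and the `post_5344` flag are hypothetical names chosen for illustration, and the rule for which frameworks get a shutdown message is simplified to exactly what the description states.

```python
# Toy model of the race described in MESOS-7215.
# All names below are hypothetical; this is not the Mesos implementation.
from dataclasses import dataclass, field


@dataclass
class Framework:
    id: str
    partition_aware: bool
    torn_down: bool = False


@dataclass
class Agent:
    shutting_down: set = field(default_factory=set)

    def shutdown_framework(self, framework_id):
        # The agent assumes the framework is going away.
        self.shutting_down.add(framework_id)

    def run_task(self, framework_id):
        # Tasks for a framework believed to be shutting down are ignored,
        # so no status update is produced for them.
        if framework_id in self.shutting_down:
            return None
        return "TASK_RUNNING"


def reregister_agent(agent, frameworks, post_5344):
    """Send shutdown messages when a previously removed agent reregisters."""
    for fw in frameworks:
        if post_5344:
            # Simplified new behaviour: every non-partition-aware framework,
            # including ones that are still registered.
            should_shutdown = not fw.partition_aware
        else:
            # Old behaviour: only frameworks known to be torn down.
            should_shutdown = fw.torn_down
        if should_shutdown:
            agent.shutdown_framework(fw.id)


def launch(agent, framework_id):
    """Return the task state as the master and scheduler would see it."""
    update = agent.run_task(framework_id)
    return update if update is not None else "TASK_STAGING"


if __name__ == "__main__":
    live = Framework("fw-1", partition_aware=False)  # still registered
    for post_5344 in (False, True):
        agent = Agent()
        reregister_agent(agent, [live], post_5344)
        print(post_5344, launch(agent, live.id))
```

With the old rule the task launched by the still-registered framework reaches `TASK_RUNNING`. With the new rule the agent drops it, and the task stays in `TASK_STAGING` from the master's and scheduler's point of view.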