[ https://issues.apache.org/jira/browse/MESOS-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yong Qiao Wang updated MESOS-3403:
----------------------------------
    Shepherd: Vinod Kone

> Add support for removing no re-registered slaves with timeout(--slave_reregister_timeout) from an external allocator
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-3403
>                 URL: https://issues.apache.org/jira/browse/MESOS-3403
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Yong Qiao Wang
>            Assignee: Yong Qiao Wang
>
> An external Mesos allocator does not run in the same OS process as the
> Mesos master and may even be deployed on a different host. In that case the
> Mesos allocator module should be implemented as a proxy that delegates
> calls to the actual allocator.
>
> The external allocator stores the total and allocated resources itself, so
> after a Mesos master recovery (such as a failover) it needs to sync up with
> the master. Under normal circumstances, all slaves reregister after the
> master recovers, so the total and used resources of each slave can be
> synced in the allocator->addSlave call. In the abnormal case, a slave does
> not reregister after the master recovers. The master then calls
> Master::removeSlave(const Registry::Slave& slave) to remove that slave from
> the Registry once the timeout (slave_reregister_timeout) expires, but this
> function does not call the allocator to remove the related resources. To
> keep the external allocator in sync in this abnormal case,
> Master::removeSlave(const Registry::Slave& slave) needs to be enhanced to
> call allocator->removeSlave, so that the related resources are also removed
> from the external allocator.
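
The proposed behavior can be illustrated with a small, self-contained C++ sketch. It is a toy model, not Mesos code: the Allocator interface, ExternalAllocatorState, ToyMaster, and all of their methods and signatures are hypothetical stand-ins that only borrow the addSlave/removeSlave names from the description, and resources are reduced to a single CPU count. It shows a master that, when the reregistration timeout fires, also tells the allocator to drop every slave that never came back.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified allocator interface (NOT the real Mesos API).
class Allocator {
public:
  virtual ~Allocator() = default;
  virtual void addSlave(const std::string& slaveId, double cpus) = 0;
  virtual void removeSlave(const std::string& slaveId) = 0;
};

// Stands in for the external allocator's own resource bookkeeping, which
// survives a master failover because it lives outside the master process.
class ExternalAllocatorState : public Allocator {
public:
  void addSlave(const std::string& slaveId, double cpus) override {
    total_[slaveId] = cpus;  // Insert or refresh the slave's total resources.
  }

  void removeSlave(const std::string& slaveId) override {
    total_.erase(slaveId);
  }

  bool knows(const std::string& slaveId) const {
    return total_.count(slaveId) > 0;
  }

  double totalCpus() const {
    double sum = 0.0;
    for (const auto& entry : total_) {
      sum += entry.second;
    }
    return sum;
  }

private:
  std::map<std::string, double> total_;
};

// Toy master that models only recovery and the reregistration timeout.
class ToyMaster {
public:
  explicit ToyMaster(Allocator* allocator) : allocator_(allocator) {
    assert(allocator_ != nullptr);
  }

  // After failover, every slave in the registry is awaiting reregistration.
  void recover(const std::vector<std::string>& registry) {
    pending_.insert(registry.begin(), registry.end());
  }

  // Normal case: the slave reregisters and the allocator is synced.
  void reregister(const std::string& slaveId, double cpus) {
    pending_.erase(slaveId);
    allocator_->addSlave(slaveId, cpus);
  }

  // Abnormal case: the timeout fired. Besides dropping the slaves from the
  // master's view, notify the allocator (the step the issue asks to add).
  void onReregisterTimeout() {
    for (const std::string& slaveId : pending_) {
      allocator_->removeSlave(slaveId);
    }
    pending_.clear();
  }

private:
  Allocator* allocator_;
  std::set<std::string> pending_;
};
```

In this model, if the allocator state already holds two slaves before failover and only one of them reregisters, calling onReregisterTimeout() leaves the allocator tracking just the slave that came back. Without the removeSlave call in the timeout path, the allocator would keep counting resources for a slave the master no longer knows about.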