[ https://issues.apache.org/jira/browse/MESOS-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Rukletsov updated MESOS-4315:
---------------------------------------
    Summary: Improve quota failover logic.  (was: Improve Quota Failover Logic)

> Improve quota failover logic.
> -----------------------------
>
>                 Key: MESOS-4315
>                 URL: https://issues.apache.org/jira/browse/MESOS-4315
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Joerg Schad
>
> The quota failover logic introduced with MESOS-3865 changes the master
> failover recovery significantly if at least one quota is set.
> Now, if any previously set quota is detected upon recovery, the
> allocator enters recovery mode, during which it does not issue offers.
> The recovery mode, and therefore offer suspension, ends when either:
> * a certain number of agents reregister (by default 80% of the agents
>   known before the failover), or
> * a timeout expires (by default 10 minutes).
> We could also safely exit the recovery mode once all quotas have been
> satisfied (i.e. all agents participating in satisfying quota have
> reconnected). For small clusters with a large percentage of quota'ed
> resources, this will not make much difference compared to the existing
> rules. But for larger clusters this condition could be fulfilled much
> faster than the 80% condition.
> We should at least consider whether such a condition is worth the added
> complexity.
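
The exit conditions discussed in the issue can be sketched as a single check. This is a minimal illustration, not Mesos code: `RecoveryState`, `should_exit_recovery`, and all field names are hypothetical, and quota is reduced to a single CPU count, whereas real quotas cover several resource types per role. The 80% and 10-minute defaults are the ones quoted in the issue.

```python
from dataclasses import dataclass


@dataclass
class RecoveryState:
    """Hypothetical snapshot of allocator recovery; not a Mesos type."""
    known_agents: int          # agents known before the failover
    reregistered_agents: int   # agents that have reregistered so far
    elapsed_secs: float        # time since recovery started
    quota_cpus: float          # sum of all quota guarantees (CPUs only)
    reregistered_cpus: float   # CPUs on agents that have reregistered


def should_exit_recovery(state, agent_fraction=0.8, timeout_secs=600.0):
    """Return the reason recovery may end, or None to keep offers suspended."""
    # Existing rule 1: enough agents have reregistered (80% by default).
    if state.known_agents > 0 and (
        state.reregistered_agents >= agent_fraction * state.known_agents
    ):
        return "agents"
    # Existing rule 2: the timeout has expired (10 minutes by default).
    if state.elapsed_secs >= timeout_secs:
        return "timeout"
    # Proposed rule (simplified): reregistered capacity covers every quota.
    if state.reregistered_cpus >= state.quota_cpus:
        return "quota"
    return None


# Large cluster: only 10% of agents are back, but quota is already covered.
large = RecoveryState(known_agents=1000, reregistered_agents=100,
                      elapsed_secs=30.0, quota_cpus=50.0,
                      reregistered_cpus=400.0)
print(should_exit_recovery(large))
```

In this simplified model the large cluster leaves recovery through the proposed rule long before 80% of its agents return, which is the case the issue argues for. A real implementation would have to compare per-role, per-resource guarantees rather than one aggregate number.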