[ https://issues.apache.org/jira/browse/MESOS-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov updated MESOS-4315:
---------------------------------------
    Summary: Improve quota failover logic.  (was: Improve Quota Failover Logic)

> Improve quota failover logic.
> -----------------------------
>
>                 Key: MESOS-4315
>                 URL: https://issues.apache.org/jira/browse/MESOS-4315
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Joerg Schad
>
> The quota failover logic introduced with MESOS-3865 changes master
> failover recovery significantly if at least one quota is set.
> Now, if any previously set quota is detected upon recovery, the
> allocator enters recovery mode, during which it does not issue
> offers. Recovery mode, and therefore offer suspension, ends when either:
> * a certain number of agents reregister (by default 80% of the agents known
> before the failover), or
> * a timeout expires (by default 10 minutes).
> We could also safely exit recovery mode once all quotas have been
> satisfied (i.e. all agents participating in satisfying quotas have
> reconnected). For small clusters with a large percentage of quota'ed
> resources, this will not make much difference compared to the existing
> rules. But for larger clusters, this condition could be fulfilled much
> faster than the 80% condition.
> We should at least consider whether such condition is worth the added 
> complexity.
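
The exit conditions described above, the two existing ones and the proposed quota-based one, can be modeled in a few lines. This is a minimal Python sketch for discussion only, not the Mesos allocator's code (which is C++). The class name, the constant names, and especially the assumption that the allocator knows which agents were satisfying quota before the failover are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical constants mirroring the defaults quoted in the issue.
RECOVERY_AGENT_PERCENT = 80      # % of pre-failover agents that must reregister
RECOVERY_TIMEOUT_SECS = 10 * 60  # 10 minutes


@dataclass
class AllocatorRecovery:
    """Hypothetical model of the allocator's post-failover recovery mode."""
    expected_agents: int  # agents known before the failover
    # ASSUMPTION: IDs of agents that were satisfying quota before the failover.
    quota_agents: Set[str] = field(default_factory=set)
    reregistered: Set[str] = field(default_factory=set)  # agents seen since failover
    elapsed_secs: float = 0.0  # time spent in recovery mode so far

    def agent_reregistered(self, agent_id: str) -> None:
        self.reregistered.add(agent_id)

    def existing_rule_met(self) -> bool:
        # Current behavior: exit on timeout, or once enough agents are back.
        if self.elapsed_secs >= RECOVERY_TIMEOUT_SECS:
            return True
        # Integer arithmetic avoids floating-point rounding at the 80% boundary.
        return 100 * len(self.reregistered) >= RECOVERY_AGENT_PERCENT * self.expected_agents

    def quota_rule_met(self) -> bool:
        # Proposed addition: exit once every agent that was satisfying quota
        # has reconnected. Note that an empty set trivially passes this check.
        return self.quota_agents <= self.reregistered

    def may_exit(self) -> bool:
        # Offers stay suspended until at least one exit condition holds.
        return self.existing_rule_met() or self.quota_rule_met()


# Large cluster where only two agents were satisfying quota.
r = AllocatorRecovery(expected_agents=1000, quota_agents={"a1", "a2"})
r.agent_reregistered("a1")
print(r.may_exit())  # False
r.agent_reregistered("a2")
print(r.may_exit())  # True: only 2 of 1000 agents are back, but all quota agents are
```

The usage at the end illustrates the point made in the issue: in a large cluster where only a few agents back quota, the quota-based condition can fire long before 80% of agents reregister.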



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)