[ 
https://issues.apache.org/jira/browse/MESOS-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-3202:
-------------------------------
    Description: 
We currently run into an issue with the DRF allocator in which some frameworks 
never receive offers (see https://github.com/mesosphere/marathon/issues/1931 
for details). 

Imagine that we have 10 frameworks and unallocated resources from a single 
slave.
The allocation interval is 1 sec, and refuse_seconds (i.e. the time for which 
a declined resource is filtered) is 3 sec for all frameworks. 
The allocator offers the resources to framework 1 (chosen according to DRF), 
which declines the offer immediately. 
In the next allocation interval, framework 1 is skipped because of its earlier 
decline, so the resources are offered to the next framework, framework 2, 
which also declines.
The same happens in the following allocation interval with framework 3. 

In the next allocation interval, the refuse_seconds filter for framework 1 has 
expired, and since it still has the lowest DRF share, it is offered the 
resources again and declines them again. The cycle then repeats.

Framework 4 (which is actually waiting for these resources) is never offered 
them.
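
The scenario above can be reproduced with a small simulation. This is only a 
toy model, not the Mesos allocator code: the function simulate() and its 
parameter names are made up for illustration, and it assumes that all 
frameworks hold no resources (so their DRF shares tie at zero), that ties are 
broken by framework number, and that frameworks 1-3 always decline while 
framework 4 would accept.

```python
def simulate(num_frameworks=10, refuse_seconds=3, allocation_interval=1,
             rounds=30, waiting_framework=4):
    """Count the offers each framework receives for one slave's resources.

    Toy model (assumptions, not Mesos code): every framework except
    `waiting_framework` declines immediately, so no one's DRF share changes
    and the allocator always walks the frameworks in the same order.
    """
    frameworks = range(1, num_frameworks + 1)
    filter_expiry = {fw: 0 for fw in frameworks}  # time until which fw is filtered
    offers = {fw: 0 for fw in frameworks}

    for step in range(rounds):
        now = step * allocation_interval
        # Shares are all zero, so "lowest DRF share first" degenerates to
        # framework-number order under the tie-breaking assumption.
        for fw in frameworks:
            if now < filter_expiry[fw]:
                continue  # declined recently; the filter is still active
            offers[fw] += 1
            if fw == waiting_framework:
                return offers  # the offer is accepted; nothing left to allocate
            # The framework declines; filter the resources for refuse_seconds.
            filter_expiry[fw] = now + refuse_seconds
            break  # one offer of this slave's resources per allocation interval

    return offers


if __name__ == "__main__":
    for fw, count in simulate().items():
        print(f"framework {fw}: {count} offers")
```

With the defaults, only frameworks 1 to 3 ever receive offers and framework 4 
receives none, no matter how many rounds are simulated. In this toy model 
framework 4 is reached only once refuse_seconds exceeds 3 sec (for example 
simulate(refuse_seconds=4)); that is a property of the model, not a proposed 
fix.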


 

  was:We currently run into issues with the DRF scheduler that frameworks do 
not receive offers (see https://github.com/mesosphere/marathon/issues/1931 for 
details). 


> Avoid frameworks starving in DRF allocator.
> -------------------------------------------
>
>                 Key: MESOS-3202
>                 URL: https://issues.apache.org/jira/browse/MESOS-3202
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Joerg Schad



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
