[ 
https://issues.apache.org/jira/browse/MESOS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741183#comment-13741183
 ] 

Thomas Marshall commented on MESOS-621:
---------------------------------------

To be a little more explicit about what's going on here, the allocator tracks 
the current allocations in two ways: it tracks the amount of resources 
allocated to each framework (via the sorters), and it tracks the amount of 
resources allocated from each slave (via the 'available' variable in the Slave 
struct).

When a framework is removed, we update what we can at that point (its 
allocation in the sorter), but we currently have no way of knowing which 
slaves those resources came from, so we cannot update the resources allocated 
from those slaves.

Similarly, when a slave is removed, we update what we can at that point (the 
slave's available resources, which effectively become zero because the slave 
is removed from the slaves map), but again we currently have no way of knowing 
which frameworks the resources still allocated on this slave were allocated 
to.

So, Allocator::slaveAdded and Allocator::slaveRemoved are just as complementary 
as Allocator::frameworkAdded and Allocator::frameworkRemoved. The calls to 
Allocator::resourcesRecovered, such as those in Master::offer, are necessary to 
give the allocator enough info to update the things we didn't know about 
before: if the framework has been removed, resourcesRecovered updates the 
slave's resources, and if the slave has been removed, it updates the 
framework's resources.
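
A minimal sketch of that behavior follows. The RecoveringAllocator type and 
its members are hypothetical, and resources are reduced to an int; the real 
allocator is more involved. It only shows resourcesRecovered updating 
whichever side the allocator still knows about.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model: a resource is just an integer CPU count.
struct RecoveringAllocator {
  std::map<std::string, int> frameworkAllocated;  // stand-in for the sorters
  std::map<std::string, int> slaveAvailable;      // stand-in for 'available'

  // Each removal only updates the view it knows about.
  void frameworkRemoved(const std::string& frameworkId) {
    frameworkAllocated.erase(frameworkId);
  }

  void slaveRemoved(const std::string& slaveId) {
    slaveAvailable.erase(slaveId);
  }

  // The caller says which framework and slave the resources belong to, so
  // the allocator can fix up whichever side still exists.
  void resourcesRecovered(const std::string& frameworkId,
                          const std::string& slaveId,
                          int amount) {
    assert(amount >= 0);

    auto framework = frameworkAllocated.find(frameworkId);
    if (framework != frameworkAllocated.end()) {
      framework->second -= amount;  // slave may already be gone
    }

    auto slave = slaveAvailable.find(slaveId);
    if (slave != slaveAvailable.end()) {
      slave->second += amount;  // framework may already be gone
    }
  }
};
```

In this model, a removed framework's resources only return to the slave once 
someone calls resourcesRecovered with the right slave id, which is why the 
master has to make those calls.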

Now, we could do all of the resource updating during 
Allocator::frameworkRemoved and Allocator::slaveRemoved if we kept around more 
state (say, a map of frameworks to resources in the slave struct representing 
allocations on that slave, and a map of slaves to resources in the framework 
struct representing allocations to that framework).
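
To make the idea concrete, here is one way the extra state could look. This is 
only a sketch under assumptions: the StatefulAllocator type, the allocatedTo 
and allocatedFrom maps, and the int resource model are invented for 
illustration and are not taken from the Mesos source or from any patch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model: a resource is just an integer CPU count.
struct StatefulAllocator {
  struct Slave {
    int available = 0;
    std::map<std::string, int> allocatedTo;    // framework id -> amount
  };

  struct Framework {
    int allocated = 0;
    std::map<std::string, int> allocatedFrom;  // slave id -> amount
  };

  std::map<std::string, Slave> slaves;
  std::map<std::string, Framework> frameworks;

  void allocate(const std::string& frameworkId,
                const std::string& slaveId,
                int amount) {
    Slave& slave = slaves.at(slaveId);
    Framework& framework = frameworks.at(frameworkId);
    assert(slave.available >= amount);
    slave.available -= amount;
    slave.allocatedTo[frameworkId] += amount;
    framework.allocated += amount;
    framework.allocatedFrom[slaveId] += amount;
  }

  // Return the framework's resources to every slave they came from.
  void frameworkRemoved(const std::string& frameworkId) {
    auto it = frameworks.find(frameworkId);
    if (it == frameworks.end()) return;
    for (const auto& entry : it->second.allocatedFrom) {
      Slave& slave = slaves.at(entry.first);
      slave.available += entry.second;
      slave.allocatedTo.erase(frameworkId);
    }
    frameworks.erase(it);
  }

  // Unallocate the slave's resources from every framework holding them.
  void slaveRemoved(const std::string& slaveId) {
    auto it = slaves.find(slaveId);
    if (it == slaves.end()) return;
    for (const auto& entry : it->second.allocatedTo) {
      Framework& framework = frameworks.at(entry.first);
      framework.allocated -= entry.second;
      framework.allocatedFrom.erase(slaveId);
    }
    slaves.erase(it);
  }
};
```

With both maps kept in sync, each removal can finish its own bookkeeping 
without waiting for a later resourcesRecovered call.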

Keeping this extra state won't prevent the spurious allocations that the calls 
to Allocator::resourcesRecovered in Master::offer are addressing, but it would 
save a few calls to Allocator::resourcesRecovered. The cost is more state for 
the allocator to maintain, plus a slightly more complicated master: relying on 
Master::removeTask when shutting down slaves would defeat the point of all of 
this, since it dispatches Allocator::resourcesRecovered.

The other possible advantage is making all of this easier to reason about, 
which I think these changes would accomplish. The fact that this post 
explaining the way it currently works is so long tells me that it's overly 
complicated. Plus, in the course of figuring all of this out, I discovered 
there is actually a bug: when a slave is removed, the resources for executors 
currently running on the slave are never returned via 
Allocator::resourcesRecovered. That shows the "any resources that get 
allocated must be returned by the master" rule isn't as easy to follow as it 
sounds. So I'll write up a patch to add the extra state, as discussed above, 
unless anyone feels differently.

Sorry this was such a long post.
                
> HierarchicalAllocator::slaveRemoved doesn't properly handle framework 
> allocations/resources
> -------------------------------------------------------------------------------------------
>
>                 Key: MESOS-621
>                 URL: https://issues.apache.org/jira/browse/MESOS-621
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Assignee: Thomas Marshall
>             Fix For: 0.14.0
>
>
> Currently a slaveRemoved() simply removes the slave from 'slaves' map and 
> slave's resources from 'roleSorter'. Looking at resourcesRecovered(), more 
> things need to be done when a slave is removed (e.g., framework 
> unallocations).
> It would be nice to fix this and have a test for this.
