[
https://issues.apache.org/jira/browse/MESOS-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956346#comment-16956346
]
Andrei Sekretenko commented on MESOS-10015:
-------------------------------------------
https://issues.apache.org/jira/browse/MESOS-9942 and related work will fix the
`total number of frameworks` part.
To fix the quadratic growth in the reservation count, we can avoid using
`Resources::operator+=`, `Resources::operator-=` and `Resources::contains()`
when re-adding a slave to a framework sorter.
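
To illustrate where the quadratic term comes from, here is a minimal toy model.
It is a sketch, not Mesos code: `ToyResource`, `ToyResources`, `mergeCost()`,
`copyCost()` and `readdWithoutMerging()` are hypothetical names, and the
free function `addable()` only mimics the role of `Resource::addable()` by
counting how often it is called. The copy-based alternative is an assumption
about what "avoiding `operator+=`" could look like, not the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a single resource object (NOT the Mesos protobuf).
struct ToyResource {
  std::string name;
  std::string reservation;
  double amount;
};

// Global counter: the cost unit used in the issue description.
static std::size_t addableCalls = 0;

// Two toy resources can be merged only if name and reservation match.
bool addable(const ToyResource& a, const ToyResource& b) {
  ++addableCalls;
  return a.name == b.name && a.reservation == b.reservation;
}

// Toy stand-in for a resource collection kept as a flat list.
struct ToyResources {
  std::vector<ToyResource> items;

  // Scans all existing items for one the new object can be merged into.
  ToyResources& operator+=(const ToyResource& r) {
    for (ToyResource& item : items) {
      if (addable(item, r)) {
        item.amount += r.amount;
        return *this;
      }
    }
    items.push_back(r);
    return *this;
  }

  // Adds objects one by one, so n unique objects cost 0 + 1 + ... + (n - 1)
  // addable() checks in total.
  ToyResources& operator+=(const ToyResources& that) {
    for (const ToyResource& r : that.items) {
      *this += r;
    }
    return *this;
  }
};

// Builds n objects with distinct reservations, so no two are addable.
ToyResources uniqueReservations(std::size_t n) {
  ToyResources result;
  for (std::size_t i = 0; i < n; ++i) {
    result.items.push_back({"cpus", "role-" + std::to_string(i), 1.0});
  }
  return result;
}

// Assumed alternative: if the source is already normalized and the
// destination starts empty, the objects can be copied without any checks.
ToyResources readdWithoutMerging(const ToyResources& normalized) {
  ToyResources result;
  result.items = normalized.items;
  return result;
}

// Number of addable() calls needed to merge n unique objects via operator+=.
std::size_t mergeCost(std::size_t n) {
  const ToyResources source = uniqueReservations(n);
  addableCalls = 0;
  ToyResources total;
  total += source;
  return addableCalls;
}

// Number of addable() calls needed by the copy-based alternative.
std::size_t copyCost(std::size_t n) {
  const ToyResources source = uniqueReservations(n);
  addableCalls = 0;
  const ToyResources total = readdWithoutMerging(source);
  assert(total.items.size() == n);
  return addableCalls;
}
```

In this model `mergeCost(100)` is 4950 and `mergeCost(200)` is 19900, so
doubling the number of unique objects roughly quadruples the number of checks,
while `copyCost()` stays at 0 for any size.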
> HierarchicalAllocatorProcess::updateAvailable() can stall the allocator with
> a huge number of reservations on an agent.
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-10015
> URL: https://issues.apache.org/jira/browse/MESOS-10015
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.5.3, 1.6.2, 1.7.2, 1.8.1, 1.9.0
> Reporter: Andrei Sekretenko
> Assignee: Andrei Sekretenko
> Priority: Critical
> Labels: resource-management
>
> Currently, a call to updateAvailable() with a single-object Resources for a
> single framework on a single slave requires `(total number of frameworks) *
> (number of resource objects on this slave)^2` calls to `Resource::addable()`.
> In a cluster with a large number of frameworks, this results in severe
> degradation of allocator performance when a bunch of RESERVE/UNRESERVE
> operations occurs for an agent with hundreds of unique resources.
> On our testing cluster, we observed task scheduling delays of up to 30
> minutes due to the allocator being occupied with processing UNRESERVE
> operations.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)