[ https://issues.apache.org/jira/browse/MESOS-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956346#comment-16956346 ]
Andrei Sekretenko commented on MESOS-10015:
-------------------------------------------

https://issues.apache.org/jira/browse/MESOS-9942 and related work will fix the `total number of frameworks` part.

To fix the quadratic growth in the number of reservations, we can avoid using `Resources::operator+=`, `Resources::operator-=` and `Resources::contains()` when re-adding a slave to a framework sorter.

> HierarchicalAllocatorProcess::updateAvailable() can stall the allocator with
> a huge number of reservations on an agent.
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-10015
>                 URL: https://issues.apache.org/jira/browse/MESOS-10015
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.5.3, 1.6.2, 1.7.2, 1.8.1, 1.9.0
>            Reporter: Andrei Sekretenko
>            Assignee: Andrei Sekretenko
>            Priority: Critical
>              Labels: resource-management
>
> Currently, updateAvailable() called for a single-object Resources for a
> single framework on a single slave requires `(total number of frameworks) *
> (number of resource objects per this slave)^2` calls of `Resource::addable()`.
> In a cluster with a large number of frameworks, this results in severe
> degradation of allocator performance when a burst of RESERVE/UNRESERVE
> operations occurs for an agent with hundreds of unique resources.
> On our testing cluster we observed task scheduling delays of up to 30
> minutes because the allocator was occupied with processing UNRESERVE operations.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
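
The `(number of resource objects per this slave)^2` factor described in the issue can be illustrated with a toy model. The sketch below is not Mesos code: `ToyResource`, `ToyResources`, `toyAddable` and `countChecksForRebuild` are hypothetical names, and the only assumption carried over is that appending one object to a container means scanning the existing objects for an addable match. Under that assumption, re-adding n mutually non-addable objects one at a time costs n(n-1)/2 checks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one resource object; NOT the Mesos `Resource` type.
struct ToyResource {
  std::string reservationLabel;  // unique per reservation in this toy model
  double amount;
};

// Number of pairwise compatibility checks performed so far.
static std::size_t addableCalls = 0;

// Toy analogue of an addable check: entries merge only if their labels match.
bool toyAddable(const ToyResource& a, const ToyResource& b) {
  ++addableCalls;
  return a.reservationLabel == b.reservationLabel;
}

// Toy container whose += scans linearly for an entry to merge into.
struct ToyResources {
  std::vector<ToyResource> items;

  ToyResources& operator+=(const ToyResource& r) {
    for (ToyResource& existing : items) {
      if (toyAddable(existing, r)) {
        existing.amount += r.amount;
        return *this;
      }
    }
    items.push_back(r);  // no match found: keep it as a separate object
    return *this;
  }
};

// Rebuilds a total from n unique resources and returns the number of checks.
std::size_t countChecksForRebuild(std::size_t n) {
  addableCalls = 0;
  ToyResources total;
  for (std::size_t i = 0; i < n; ++i) {
    total += ToyResource{"reservation-" + std::to_string(i), 1.0};
  }
  assert(total.items.size() == n);  // all labels are unique, so nothing merged
  return addableCalls;
}
```

In this model, `countChecksForRebuild(100)` performs 4950 checks and `countChecksForRebuild(200)` performs 19900, so doubling the number of unique resources roughly quadruples the work. Repeating such a rebuild once per framework gives the product quoted in the issue description.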