Andrei Sekretenko created MESOS-10015:
-----------------------------------------
Summary: HierarchicalAllocatorProcess::updateAvailable() can stall
the allocator with a huge number of reservations on an agent.
Key: MESOS-10015
URL: https://issues.apache.org/jira/browse/MESOS-10015
Project: Mesos
Issue Type: Bug
Affects Versions: 1.9.0, 1.8.1, 1.7.2, 1.6.2, 1.5.3
Reporter: Andrei Sekretenko
Assignee: Andrei Sekretenko
Currently, a single call to updateAvailable() with a Resources containing one
object, for one framework on one agent, requires `(total number of frameworks) *
(number of resource objects on this agent)^2` calls to `Resource::addable()`.
In a cluster with a large number of frameworks, this severely degrades
allocator performance when a burst of RESERVE/UNRESERVE operations occurs
for an agent with hundreds of unique resources.
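To get a feel for the scale, the cost formula quoted above can be evaluated
directly. The sketch below is only an illustration: `estimateAddableCalls` is a
hypothetical helper, not part of Mesos, and the figures of 1000 frameworks and
300 resource objects are assumed numbers, not measurements from this report.

```cpp
#include <cstdint>

// Hypothetical helper (not a Mesos API): evaluates the formula from this
// ticket, frameworks * (resource objects on the agent)^2, for a single
// updateAvailable() call.
constexpr std::uint64_t estimateAddableCalls(
    std::uint64_t frameworks,
    std::uint64_t resourceObjectsOnAgent)
{
  return frameworks * resourceObjectsOnAgent * resourceObjectsOnAgent;
}

// Assumed, illustrative inputs: 1000 frameworks, 300 resource objects.
// 1000 * 300 * 300 = 90,000,000 calls for one updateAvailable().
static_assert(
    estimateAddableCalls(1000, 300) == 90000000ULL,
    "formula check for the assumed example inputs");

// Doubling the number of resource objects quadruples the cost.
static_assert(
    estimateAddableCalls(1000, 600) == 4 * estimateAddableCalls(1000, 300),
    "cost is quadratic in the number of resource objects");
```

Under these assumed inputs, a burst of 100 RESERVE/UNRESERVE operations on the
same agent would add up to roughly 9 billion `Resource::addable()` calls.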
On our testing cluster we observed task scheduling delays of up to 30 minutes
due to the allocator being occupied with processing UNRESERVE operations.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)