[ https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler updated MESOS-3157:
-----------------------------------
    Shepherd: Benjamin Mahler

Having mulled over this patch and the related threads, it seems the issue here is that we perform more full allocations than necessary (1 allocation : 1 event), whereas ideally we would batch them (M allocations : N events, where M <= N).

For example, when N calls to reviveOffers are enqueued behind an allocation, we'll do the following:

{noformat}
allocate
reviveOffers -> allocate
reviveOffers -> allocate
reviveOffers -> allocate
reviveOffers -> allocate
{noformat}

When ideally we could do the following:

{noformat}
allocate
reviveOffers
reviveOffers
reviveOffers
reviveOffers
allocate
{noformat}

The idea here is to ensure that allocation work that arrives while we are doing an allocation (in this case the 4 reviveOffers calls) is "batched" into a single allocation round. This technique is used in the registrar (registrar.cpp) to avoid the performance issues from excessive queueing that occur when operations are done serially without "batching".

Here's how I would suggest proceeding:

(1) Add an allocator benchmark for a large number of reviveOffers requests when there are many slaves and frameworks, which includes the time taken for the implied allocations to occur.

(2) Implement batching of allocations. This will entail keeping a running set of SlaveIDs which require an allocation. Also, rather than allocating immediately during an event, we defer the allocation so that it occurs *after* all currently enqueued events. When the deferred allocation occurs, we clear the running set of SlaveIDs. Note that if an interval-based allocation occurs before the deferred allocation, it will also clear the running set, which is correct.

(3) This should avoid the need to eliminate the event-driven allocation code as per the original intent of this patch, since we've bounded the number of allocations that can be queued.
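To make (2) concrete, here is a minimal sketch of the deferred, batched allocation. It is written in Python rather than Mesos C++, and it uses a plain FIFO queue as a stand-in for the allocator process's event queue. The names BatchingAllocator, revive_offers, interval_tick, and run are hypothetical and are not taken from the Mesos source; the "allocation" itself is only recorded, not performed.

```python
from collections import deque


class BatchingAllocator:
    """Toy model of an allocator whose events are handled one at a time.

    Assumption: a single FIFO queue approximates the allocator's event
    queue; real Mesos dispatches onto a libprocess actor instead.
    """

    def __init__(self, slaves):
        self.slaves = set(slaves)   # every slave known to the allocator
        self.queue = deque()        # pending events, handled in FIFO order
        self.pending = set()        # running set of slave IDs needing allocation
        self.deferred = False       # True while a deferred allocation is enqueued
        self.rounds = []            # (kind, slave IDs) for each allocation round

    def revive_offers(self, slave_ids):
        """Enqueue a reviveOffers-style event touching the given slaves."""
        self.queue.append(("revive", set(slave_ids)))

    def interval_tick(self):
        """Enqueue the periodic, interval-based allocation."""
        self.queue.append(("interval", None))

    def run(self):
        """Drain the queue, batching allocation work as events are handled."""
        while self.queue:
            kind, slave_ids = self.queue.popleft()
            if kind == "revive":
                # Record the work instead of allocating right away.
                self.pending |= slave_ids
                if not self.deferred:
                    # Defer one allocation to the back of the queue, so it
                    # runs after all currently enqueued events.
                    self.deferred = True
                    self.queue.append(("allocate", None))
            elif kind == "interval":
                # An interval allocation covers every slave, so it also
                # clears the running set.
                self.rounds.append(("interval", sorted(self.slaves)))
                self.pending.clear()
            elif kind == "allocate":
                self.deferred = False
                if self.pending:
                    self.rounds.append(("deferred", sorted(self.pending)))
                    self.pending.clear()


# Four reviveOffers events enqueued back to back yield one allocation round.
allocator = BatchingAllocator(slaves=["s0", "s1", "s2", "s3"])
for slave in ["s0", "s1", "s2", "s3"]:
    allocator.revive_offers([slave])
allocator.run()
```

With this structure, at most one deferred allocation is ever in the queue, so N enqueued events produce at most one extra allocation round rather than N.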
[~jamespeach] sorry for the runaround! From what I've gathered from the emails and this ticket, this should be sufficient to keep event-driven allocation without backing up the allocator when allocation is expensive. At the same time, we should invest effort in improving the performance of the allocation loop.

> only perform batch resource allocations
> ---------------------------------------
>
>                 Key: MESOS-3157
>                 URL: https://issues.apache.org/jira/browse/MESOS-3157
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: James Peach
>            Assignee: James Peach
>
> Our deployment environments have a lot of churn, with many short-lived
> frameworks that often revive offers. Running the allocator takes a long time
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the
> allocator process to get very long, and the allocator effectively becomes
> unresponsive (e.g. a revive offers message takes too long to come to the head
> of the queue).
> We have been running a patch to remove all the event-triggered allocations
> and only allocate from the batch task
> {{HierarchicalAllocatorProcess::batch}}. This works great and really improves
> responsiveness.