[jira] [Commented] (MESOS-4694) DRFAllocator takes very long to allocate resources with a large number of frameworks

Benjamin Mahler (JIRA) Thu, 22 Sep 2016 19:49:45 -0700

    [ https://issues.apache.org/jira/browse/MESOS-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515183#comment-15515183 ]


Benjamin Mahler commented on MESOS-4694:
----------------------------------------

{noformat}
commit fba3108123442c78d4cee6047e2b1d64aab5a37a
Author: Dario Rexin <dre...@apple.com>
Date:   Thu Sep 22 16:00:03 2016 -0700

    Improve DRF sorter performance by bypassing `Resources`.

    Currently in `DRFSorter::calculateShare()` (which is called very
    frequently), the use of `Resources::get<Scalar>()` is expensive
    as it needs to loop over the `Resource` objects and do string
    comparison on the `Resource::name` strings.

    This patch avoids using `Resources::get<Scalar>()` in
    `DRFSorter::calculateShare()` in favor of maintaining
    a map of resource names to scalars.

    Note that we had to maintain both this new map and the
    previously added `strippedScalarQuantities` since the
    latter stores resource roles.

    Review: https://reviews.apache.org/r/43665/
{noformat}

> DRFAllocator takes very long to allocate resources with a large number of frameworks
> ------------------------------------------------------------------------------------
>
>                 Key: MESOS-4694
>                 URL: https://issues.apache.org/jira/browse/MESOS-4694
>             Project: Mesos
>          Issue Type: Improvement
>          Components: allocation
>    Affects Versions: 0.26.0, 0.27.0, 0.27.1, 0.27.2, 0.28.0, 0.28.1
>            Reporter: Dario Rexin
>            Assignee: Dario Rexin
>
> As the number of connected frameworks grows, allocation time grows
> substantially. The addition of quota in 0.27 increased these times further.
> Running `mesos-tests.sh --benchmark
> --gtest_filter=HierarchicalAllocator_BENCHMARK_Test.DeclineOffers` gives
> the following numbers:
> {noformat}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.921202secs to make 200 offers
> round 1 allocate took 2.85045secs to make 200 offers
> round 2 allocate took 2.823768secs to make 200 offers
> {noformat}
> Increasing the number of frameworks to 2000:
> {noformat}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 28.209454secs to make 2000 offers
> round 1 allocate took 28.469419secs to make 2000 offers
> round 2 allocate took 28.138086secs to make 2000 offers
> {noformat}
> I was able to reduce this time substantially. After applying the patches,
> with 200 frameworks:
> {noformat}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 1.016226secs to make 2000 offers
> round 1 allocate took 1.102729secs to make 2000 offers
> round 2 allocate took 1.102624secs to make 2000 offers
> {noformat}
> And with 2000 frameworks:
> {noformat}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 12.563203secs to make 2000 offers
> round 1 allocate took 12.437517secs to make 2000 offers
> round 2 allocate took 12.470708secs to make 2000 offers
> {noformat}
> The patches make three changes to improve allocator performance:
> 1) The total values in the DRFSorter are precalculated per resource type.
> 2) In the allocate method, when no resources are left to allocate, we break
> out of the innermost loop instead of iterating over a large number of
> frameworks when there is nothing to offer them.
> 3) When a framework suppresses offers, we remove it from the sorter instead
> of just calling `continue` in the allocation loop. This greatly improves
> sorter performance and avoids looping over frameworks that don't need
> resources.
> Assuming that most frameworks behave well and suppress offers when they have
> nothing to schedule, it is reasonable to expect point 3) to have the biggest
> impact on performance. If we suppress offers for 90% of the frameworks in the
> benchmark test, we see the following numbers:
> {noformat}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 200 slaves and 2000 frameworks
> round 0 allocate took 11626us to make 200 offers
> round 1 allocate took 22890us to make 200 offers
> round 2 allocate took 21346us to make 200 offers
> {noformat}
> And for 200 frameworks:
> {noformat}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN      ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 1.11178secs to make 2000 offers
> round 1 allocate took 1.062649secs to make 2000 offers
> round 2 allocate took 1.080181secs to make 2000 offers
> {noformat}
> Review requests:
> https://reviews.apache.org/r/43665/
> https://reviews.apache.org/r/43666/
> https://reviews.apache.org/r/43668/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
