Re: Review Request 51929: Scheduling multiple tasks per round.

Stephan Erb Tue, 20 Sep 2016 12:04:27 -0700


> On Sept. 16, 2016, 11:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line 
> > 197
> > <https://reviews.apache.org/r/51929/diff/1/?file=1499323#file1499323line197>
> >
> >     Side show: Isn't that `if` unnecessary here and we can adjust the 
> > penality in any case? We will remove the group if `hasMore()` returns 
> > false, so any penality should be fine.
> 
> Maxim Khutornenko wrote:
>     Not sure I follow. This is the place that applies penalty accrued inside 
> the `startGroup()` or removes the group if it's empty.


Let me print some more code to make clearer what I meant.

This is the code where we compute penaltyMs depending on `group.hasMore`
```
            scheduledTaskPenalties.accumulate(group.getPenaltyMs());
            group.remove(scheduled);
            if (group.hasMore()) {
              penaltyMs = firstScheduleDelay;
            }
          }
        }

        group.setPenaltyMs(penaltyMs);
        evaluateGroupLater(this, group);
```


Later on we then drop empty groups in `evaluateGroupLater`:
```
  private synchronized void evaluateGroupLater(Runnable evaluate, TaskGroup 
group) {
    // Avoid check-then-act by holding the intrinsic lock.  If not done 
atomically, we could
    // remove a group while a task is being added to it.
    if (group.hasMore()) {
      executor.execute(evaluate, Amount.of(group.getPenaltyMs(), 
Time.MILLISECONDS));
    } else {
      groups.remove(group.getKey());
    }
  }
```

What I tried to say is that we could unconditionally write `penaltyMs = 
firstScheduleDelay;` without the `if (group.hasMore())` in the first snippet. 
If `group.hasMore()` returns false we will remove the group anyway, so it does 
not matter if we set a new penality or not.

This is completely unreleated to the change in this RB so feel free to ignore 
it. It was more about checking my own understanding of the code.


> On Sept. 16, 2016, 11:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java, 
> > lines 208-213
> > <https://reviews.apache.org/r/51929/diff/1/?file=1499324#file1499324line208>
> >
> >     I would have expected that we only request preemption if we failed to 
> > schedule all tasks in the current group.  If I remember correctly, 
> > preemption slot search only happens on a per-group basis anyway.
> >     
> >     You have probably thought about this, so I would like to understand 
> > your reasoning.
> 
> Maxim Khutornenko wrote:
>     We still want to attempt a preemption if _some_ but not _all_ tasks are 
> scheduled within a given round, right? Otherwise, preemption becomes an 
> all-or-nothing feature and we have to wait for another scheduling cycle to 
> request a reservation.

Ok I see.


> On Sept. 16, 2016, 11:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, line 174
> > <https://reviews.apache.org/r/51929/diff/1/?file=1499325#file1499325line174>
> >
> >     Shouldn't that only happen if `launchTask` has succeeded?
> 
> Maxim Khutornenko wrote:
>     I've debated that as well and decided it's more logical to finish 
> accessing offer details before it's being launched with and removed from the 
> `OfferManager`.
>     
>     BTW, I just realized I was missing a `break` statement to bail out from 
> the scheduling round if a `LaunchException` is thrown. While theoretically we 
> _could_ continue matching even after the `LaunchException`, it's hard to 
> reason about the state of the storage and the last assigned item and as such 
> it's safer to terminate than continue. Fixed and added a test case. This 
> should now be resolved. Thanks for asking :)

Thanks for the explanation. Makes sense to me.


- Stephan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149179
-----------------------------------------------------------


On Sept. 16, 2016, 11:53 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> -----------------------------------------------------------
> 
> (Updated Sept. 16, 2016, 11:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark                                                                    
> Mode  Cnt      Score     Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 Â± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark                                                                    
> Mode  Cnt      Score     Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 Â± 187.890  ops/s
> ```
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  a4e87d2216401f344dca64d69b945de7bcf8159a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> b4d27f69ad5d4cce03da9f04424dc35d30e8af29 
> 
> Diff: https://reviews.apache.org/r/51929/diff/
> 
> 
> Testing
> -------
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>

Re: Review Request 51929: Scheduling multiple tasks per round.

Reply via email to