Re: Review Request 51929: Scheduling multiple tasks per round.

Maxim Khutornenko Tue, 20 Sep 2016 15:02:03 -0700


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line 197
> > <https://reviews.apache.org/r/51929/diff/1/?file=1499323#file1499323line197>
> >
> >     Side note: isn't that `if` unnecessary here, so that we can adjust the 
> > penalty in any case? We will remove the group if `hasMore()` returns 
> > false, so any penalty should be fine.
> 
> Maxim Khutornenko wrote:
>     Not sure I follow. This is the place that applies penalty accrued inside 
> the `startGroup()` or removes the group if it's empty.
> 
> Stephan Erb wrote:
>     Let me print some more code to make clearer what I meant.
>     
>     This is the code where we compute penaltyMs depending on `group.hasMore`
>     ```
>                 scheduledTaskPenalties.accumulate(group.getPenaltyMs());
>                 group.remove(scheduled);
>                 if (group.hasMore()) {
>                   penaltyMs = firstScheduleDelay;
>                 }
>               }
>             }
>     
>             group.setPenaltyMs(penaltyMs);
>             evaluateGroupLater(this, group);
>     ```
>     
>     
>     Later on we then drop empty groups in `evaluateGroupLater`:
>     ```
>       private synchronized void evaluateGroupLater(Runnable evaluate, TaskGroup group) {
>         // Avoid check-then-act by holding the intrinsic lock.  If not done atomically, we could
>         // remove a group while a task is being added to it.
>         if (group.hasMore()) {
>           executor.execute(evaluate, Amount.of(group.getPenaltyMs(), Time.MILLISECONDS));
>         } else {
>           groups.remove(group.getKey());
>         }
>       }
>     ```
>     
>     What I tried to say is that we could unconditionally write `penaltyMs = 
> firstScheduleDelay;` without the `if (group.hasMore())` in the first snippet. 
> If `group.hasMore()` returns false we will remove the group anyway, so it 
> does not matter whether we set a new penalty or not.
>     
>     This is completely unrelated to the change in this RB so feel free to 
> ignore it. It was more about checking my own understanding of the code.


I see what you meant now. The slight problem with the simplification you 
propose is that it _may_ result in a penalty where none would be applied 
today. Consider this sequence: `startGroup()` sets the penalty even though the 
group has no tasks left, the call to `evaluateGroupLater()` is delayed for 
some reason, AND a new task is added to the group in the meantime, thus 
recharging it. The penalty would then be applied the moment 
`evaluateGroupLater()` is finally reached. It's certainly an edge case, but 
I'd prefer not to change the current behavior in this patch.
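
To make the interleaving concrete, here is a minimal single-threaded sketch 
that replays the three steps in order. It is not the actual `TaskGroups` code: 
the `Group` class, the `replay()` helper, the 1000 ms delay and the initial 
penalty of 0 are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified, hypothetical model of the penalty logic; not Aurora's actual classes. */
final class PenaltyRaceSketch {
  // Assumed value; the real firstScheduleDelay is configured elsewhere.
  static final long FIRST_SCHEDULE_DELAY_MS = 1_000L;
  // Returned when the group is empty at evaluation time and would be removed.
  static final long GROUP_REMOVED = -1L;

  /** Minimal stand-in for a task group. */
  static final class Group {
    private final Deque<String> tasks = new ArrayDeque<>();
    private long penaltyMs;

    void add(String taskId) { tasks.add(taskId); }
    void remove(String taskId) { tasks.remove(taskId); }
    boolean hasMore() { return !tasks.isEmpty(); }
    long getPenaltyMs() { return penaltyMs; }
    void setPenaltyMs(long penaltyMs) { this.penaltyMs = penaltyMs; }
  }

  /**
   * Replays the interleaving and returns the delay the evaluation step would
   * use, or GROUP_REMOVED if the group would be dropped instead.
   */
  static long replay(boolean keepGuard, boolean taskAddedBeforeEvaluation) {
    Group group = new Group();
    group.add("task-1");

    // Step 1: the round schedules the last task and computes the penalty.
    long penaltyMs = 0L; // assumed starting value
    group.remove("task-1");
    if (!keepGuard || group.hasMore()) {
      penaltyMs = FIRST_SCHEDULE_DELAY_MS;
    }
    group.setPenaltyMs(penaltyMs);

    // Step 2: the evaluation is delayed and a new task recharges the group.
    if (taskAddedBeforeEvaluation) {
      group.add("task-2");
    }

    // Step 3: the evaluation finally runs and either reschedules or removes.
    return group.hasMore() ? group.getPenaltyMs() : GROUP_REMOVED;
  }

  public static void main(String[] args) {
    System.out.println("with guard:    " + replay(true, true));
    System.out.println("without guard: " + replay(false, true));
  }
}
```

With the guard in place the recharged group is re-evaluated with no extra 
delay; without it the same group waits for the full first-schedule delay. When 
no task arrives in between, both variants simply drop the group.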


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149179
-----------------------------------------------------------


On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> -----------------------------------------------------------
> 
> (Updated Sept. 16, 2016, 9:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is phase 2 of the scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take a configurable number of task IDs from a given 
> `TaskGroup` per scheduling round. The idea is to go deeper through the offer 
> queue and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggests `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have a large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is far less than the savings from avoided additional 
> scheduling rounds.
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark                                                                    Mode  Cnt      Score     Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark                                                                    Mode  Cnt      Score     Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  a4e87d2216401f344dca64d69b945de7bcf8159a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> b4d27f69ad5d4cce03da9f04424dc35d30e8af29 
> 
> Diff: https://reviews.apache.org/r/51929/diff/
> 
> 
> Testing
> -------
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>
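
The worst-case comparison and the jmh figures in the description quoted above 
can be checked with a few lines of arithmetic. This is only a sketch under the 
description's own assumptions (matching N tasks costs the same as matching 1, 
launch latency is at most 1% of an attempt); the class and method names are 
hypothetical, and `roundsAvoided()` further assumes that all N tasks are 
assigned in a single round.

```java
/** Back-of-the-envelope model of the per-round cost; all names are hypothetical. */
final class RoundCostSketch {

  /** Worst-case cost of one round that launches N tasks, in units of attemptCost. */
  static double worstCaseRoundCost(double attemptCost, int tasksPerRound, double launchFraction) {
    // One pass over the offers plus N x single_task_launch_latency.
    return attemptCost * (1.0 + tasksPerRound * launchFraction);
  }

  /** Rounds saved if all N tasks land in one round instead of N separate rounds. */
  static int roundsAvoided(int tasksPerRound) {
    return tasksPerRound - 1;
  }

  /** Relative throughput drop between two benchmark scores. */
  static double relativeDrop(double before, double after) {
    return (before - after) / before;
  }

  public static void main(String[] args) {
    int n = 5;                    // tasks per round, as in the 4x+ observation
    double launchFraction = 0.01; // upper bound quoted in the description

    System.out.printf("worst-case round cost: %.2f attempts%n",
        worstCaseRoundCost(1.0, n, launchFraction));
    System.out.printf("rounds avoided: %d%n", roundsAvoided(n));
    System.out.printf("jmh throughput drop: %.1f%%%n",
        100.0 * relativeDrop(17126.183, 15838.051));
  }
}
```

For 5 tasks per round the added launch overhead is about 5% of a single 
attempt, while up to 4 full rounds are avoided. The two jmh scores differ by 
roughly 7.5%, in line with the "~8%" quoted in the description.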
