> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line > > 197 > > <https://reviews.apache.org/r/51929/diff/1/?file=1499323#file1499323line197> > > > > Side show: Isn't that `if` unnecessary here and we can adjust the > > penality in any case? We will remove the group if `hasMore()` returns > > false, so any penality should be fine. > > Maxim Khutornenko wrote: > Not sure I follow. This is the place that applies penalty accrued inside > the `startGroup()` or removes the group if it's empty. > > Stephan Erb wrote: > Let me print some more code to make clearer what I meant. > > This is the code where we compute penaltyMs depending on `group.hasMore` > ``` > scheduledTaskPenalties.accumulate(group.getPenaltyMs()); > group.remove(scheduled); > if (group.hasMore()) { > penaltyMs = firstScheduleDelay; > } > } > } > > group.setPenaltyMs(penaltyMs); > evaluateGroupLater(this, group); > ``` > > > Later on we then drop empty groups in `evaluateGroupLater`: > ``` > private synchronized void evaluateGroupLater(Runnable evaluate, > TaskGroup group) { > // Avoid check-then-act by holding the intrinsic lock. If not done > atomically, we could > // remove a group while a task is being added to it. > if (group.hasMore()) { > executor.execute(evaluate, Amount.of(group.getPenaltyMs(), > Time.MILLISECONDS)); > } else { > groups.remove(group.getKey()); > } > } > ``` > > What I tried to say is that we could unconditionally write `penaltyMs = > firstScheduleDelay;` without the `if (group.hasMore())` in the first snippet. > If `group.hasMore()` returns false we will remove the group anyway, so it > does not matter if we set a new penality or not. > > This is completely unreleated to the change in this RB so feel free to > ignore it. It was more about checking my own understanding of the code.
I see what you meant now. The slight problem with the simplification you propose is that it _may_ result in a penalty where it would not happen today: `startGroup()` calculates the penalty even though there are no more groups left but the call to `evaluateGroupLater()` is delayed for some reason AND a new task is added into the group thus recharging it. The penalty would be applied the moment `evaluateGroupLater()` is finally reached. It's certainly an edge case but I'd prefer not changing the current behavior in the this patch. - Maxim ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/#review149179 ----------------------------------------------------------- On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51929/ > ----------------------------------------------------------- > > (Updated Sept. 16, 2016, 9:53 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > ------- > > This is phase 2 of scheduling perf improvement effort started in > https://reviews.apache.org/r/51759/. > > We can now take multiple (configurable) number of task IDs from a given > `TaskGroup` per scheduling. The idea is to go deeper through the offer queue > and assign more than one task if possible. This approach delivers > substantially better MTTA and still ensures fairness across multiple > `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 > tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be > set even higher if the majority of cluster jobs have large number of > instances and/or update batch sizes. > > As far as a single round perf goes, we can consider the following 2 > worst-case scenarios: > - master: single task scheduling fails after trying all offers in the queue > - this patch: N tasks launched with the very last N offers in the queue + `(N > x single_task_launch_latency)` > > Assuming that matching N tasks against M offers takes exactly the same time > as 1 task against M offers (as they all share the same `TaskGroup`), the only > measurable difference comes from the additional `N x > single_task_launch_latency` overhead. Based on real cluster observations, the > `single_task_launch_latency` is less than 1% of a single task scheduling > attempt, which is << than the savings from avoided additional scheduling > rounds. > > As far as jmh results go, the new approach (batching + multiple tasks per > round) is only slightly more demanding (~8%). Both results though are MUCH > higher than the real cluster perf, which just confirms we are not bound by > CPU time here: > > Master: > ``` > Benchmark > Mode Cnt Score Error Units > SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark > thrpt 10 17126.183 ± 488.425 ops/s > ``` > > This patch: > ``` > Benchmark > Mode Cnt Score Error Units > SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark > thrpt 10 15838.051 ± 187.890 ops/s > ``` > > > Diffs > ----- > > src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java > 6f1cbfbc4510a037cffc95fee54f62f463d2b534 > src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java > 87b9e1928ab2d44668df1123f32ffdc4197c0c70 > src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java > 664bc6cf964ede2473a4463e58bcdbcb65bc7413 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java > 5d319557057e27fd5fc6d3e553e9ca9139399c50 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java > d390c07522d22e43d79ce4370985f3643ef021ca > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java > 207d38d1ddfd373892602218a98c1daaf4a1325f > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java > 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 > > src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java > ece476b918e6f2c128039e561eea23a94d8ed396 > > src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java > 209f9298a1d55207b9b41159f2ab366f92c1eb70 > > src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java > 0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 > src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java > c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 > > src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java > ee5c6528af89cc62a35fdb314358c489556d8131 > src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java > 98048fabc00f233925b6cca015c2525980556e2b > > src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java > 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 > src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java > 88729626de5fa87b45472792c59cc0ff1ade3e93 > > src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java > a4e87d2216401f344dca64d69b945de7bcf8159a > src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java > b4d27f69ad5d4cce03da9f04424dc35d30e8af29 > > Diff: https://reviews.apache.org/r/51929/diff/ > > > Testing > ------- > > All types of testing including deploying to test and production clusters. > > > Thanks, > > Maxim Khutornenko > >