> On Jan. 19, 2018, 8:27 p.m., Stephan Erb wrote: > > The patch itself looks fine! However, I have difficulties assesing it > > properly as I am missing a bit of context. What is the main customization > > that you aim to implement using the new interface? > > > > I am also curious what is your take on the removed offer sorting logic. > > Does it still work for Twitter as intended originally? We are currently > > using it as well, so I would need to re-implement if it gets removed. (Main > > issue we have noticed so far was that the number of tasks per agent can > > shift drastically, sometimes even up to a point where the Thermos observer > > is completely unusable). > > Jordan Ly wrote: > Some context for the new interface: > > For performance, it is helpful to control the data structure we are > holding offers in and how we can take advantage of it to narrow down the list > of candidates to schedule on. For example, in the case of CPU ordering, if we > call `getOrdered(TaskGroupKey, ResourceRequest)` -- since we know the size of > the ResourceRequest and since we have access to the `offers` NavigableSet, we > might use `subMap` to return only candidates that we know will fit thus > reducing the number of offers we have to evaluate right away. > > Stephan Erb wrote: > Yeah that makes sense, but would have also been possible without the set > of recent scheduling refactorings (either by using the `NavigableSet` as-is > or via something like > https://github.com/wfarner/aurora/commit/f77d79a2d01c3b5b34f11b812d5dcff2789e0766 > ). On top, this could even be done by default. > > If I am reading correctly between the lines of this RB here and > https://reviews.apache.org/r/64954/ it sounds like you are moving towards a > custom, heavy-weight scheduling logic. My question essentially boils down to: > Do you go down the custom route because your usecase is so specific? Or is it > just more involved doing it in the open (lack of reviews, increases > turnaround times, ...)? > > I don't intend to block your patch, as it looks good. I just want to > understand where this is heading and why. > > David McLaughlin wrote: > Twitter has a 85/15 split between production (preferred) and preemtible > tier tasks. The idea we're pursuing is to try and split the cluster logically > into hosts that have preferred tasks and hosts that have preemtible tasks. > Why is this useful? Well, the machines with preemptible tasks are effectively > "empty" with preemption enabled, in terms of available slots that other jobs > can grow into. It means if we commit 15% of our budget to the preemptible > tier, we can also guarantee that we can support that % of growth in the > cluster. This will enable larger instances, and temporary bursts, etc. > > Bin-packing alone doesn't solve this, because if you end up with every > host 85% prod and 15% non-prod, and you are low on capacity (forcing > preemption) only slots 15% the size of a host are available. This excludes > many jobs from growing, and it forces us to put unnecessary "core limits" on > jobs. > > I think this type of scheduling optimization is very specific to Twitter. > As is the scale we're working at, which meant that score-based scheduling > using just OfferSelector didn't work for us. But there are some neat > optimizations we can make for just the condition we want, but it requires > access to the underlying data structures (basically to support reverse > iteration in a performant way). So all of this motivates this patch. > > Jordan Ly wrote: > @StephanErb I now realize that you meant you were still utilizing the > `offer_order` option and not the custom `offer_order_modules` one. > > I will re-add this in a follow-up patch since it can still co-exist with > the default `OfferSetImpl`.
@David Thanks a lot for sharing! That is indeed a problem I have not encountered or considered before. @Jordan Thanks! - Stephan ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65233/#review195844 ----------------------------------------------------------- On Jan. 20, 2018, 2:31 a.m., Jordan Ly wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65233/ > ----------------------------------------------------------- > > (Updated Jan. 20, 2018, 2:31 a.m.) > > > Review request for Aurora, David McLaughlin, Stephan Erb, and Bill Farner. > > > Repository: aurora > > > Description > ------- > > The goal of this patch is to provide a more reasonable abstraction for custom > scheduling. > > Currently, we have `OfferSelector`, `OfferOrder`, and recently I proposed > `FilterableOfferCollection` (https://reviews.apache.org/r/65225/). These were > all created in order to provide more customization within the scheduling > loop. However, they suffer from being leaky and too disparate. This patch > hopes to combine all of those components into a single `OfferSet` which can > be injected and used by HostOffers. This interface allows for custom > scheduling logic to be co-located with custom data structures for `offers` as > opposed to being passed around as different components. > > The following options will be removed from the last 0.19 to now: > ``` > -offer_order_modules > ``` > > > Diffs > ----- > > RELEASE-NOTES.md 3053e543793669601ec9132adae376bcdce64c32 > docs/reference/scheduler-configuration.md > f697b6f1f5332ac9a9feade48317ecf5818e0c92 > src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java > f44cb61f5d1805aeb0f98135453dbc82f3bc9ad4 > src/main/java/org/apache/aurora/scheduler/config/CliOptions.java > e4e535824adb86c98c6173a20f97a9a90b08483a > src/main/java/org/apache/aurora/scheduler/offers/HostOffers.java > 2ea7a01085b87c8ed6765537a8005e2349784ab0 > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java > 8f9e33d81be9087821784a8a08079a1736d9cb63 > src/main/java/org/apache/aurora/scheduler/offers/OfferManagerModule.java > de16c14c887d4225e0629b1580eb5e740798f1f7 > src/main/java/org/apache/aurora/scheduler/offers/OfferOrderBuilder.java > 1260ef19506acb8e8a937d4fd7b7152361bd3c40 > src/main/java/org/apache/aurora/scheduler/offers/OfferSet.java PRE-CREATION > src/main/java/org/apache/aurora/scheduler/offers/OfferSetImpl.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java > 1e36b2c3094e99f05aa4a4f098a48df2293b4320 > > src/main/java/org/apache/aurora/scheduler/scheduling/FirstFitOfferSelector.java > 5f4f452cb133e2be73f881700a54ba4c8e60a35a > > src/main/java/org/apache/aurora/scheduler/scheduling/FirstFitOfferSelectorModule.java > 4d36487f430483444314a8cefd133e7c766075de > src/main/java/org/apache/aurora/scheduler/scheduling/OfferSelector.java > 6f5d55c5a586ac923b5f375d1935ec2049673ec0 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImpl.java > ec416cc12b2fb4a11a273c77dbe72b79f96fa7e1 > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImplModule.java > d101a497896ba3b8ca7984a63f861494cbddbb35 > src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java > c84fad3b615c49ca7527ae3f7fa116aa8f90ff0c > src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java > 28224a5c587b235ccc86a286e796eeca66929232 > > src/test/java/org/apache/aurora/scheduler/scheduling/FirstFitOfferSelectorTest.java > e6b2b7465fb4fd432cda50d1e714dfc587764689 > > src/test/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImplTest.java > f84941edd96d974a6c2d7e087324bc8fbbac3b0a > > > Diff: https://reviews.apache.org/r/65233/diff/6/ > > > Testing > ------- > > `./gradlew test` > `./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh` > > Testing injection of a custom `OfferSet` onto a test cluster. > > > Thanks, > > Jordan Ly > >