-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59480/#review176540
-----------------------------------------------------------

Ship it!

Master (d7425aa) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On May 31, 2017, 9:41 p.m., David McLaughlin wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59480/
> -----------------------------------------------------------
>
> (Updated May 31, 2017, 9:41 p.m.)
>
>
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> This patch enables scalable, high-performance Scheduler bin-packing using the
> existing first-fit task assigner, and it can be controlled with a simple
> command-line argument.
>
> The bin-packing is only an approximation, but can lead to pretty significant
> improvements in resource utilization per agent. For example, on a CPU-bound
> cluster with 30k+ hosts and 135k tasks (across 1k+ jobs), we were able to
> reduce the share of hosts with tasks scheduled on them to just 90%, down
> from 99.7% (as one would expect from randomization). So if you are running
> Aurora on elastic computing and paying for machines by the minute/hour, then
> utilizing this patch _could_ allow you to reduce your server footprint by as
> much as 10%.
>
> The approximation is based on the simple idea that you have the best chance
> of having perfect bin-packing if you put tasks in the smallest slot
> available. So if you have a task needing 8 cores and you have an 8-core and
> a 12-core offer available, you'd always want to put the task in the 8-core
> offer*. If the offers in OfferManager are sorted during iteration, a
> first-fit algorithm is guaranteed to match the smallest offer that fits your
> task, which achieves this.
>
> * - The correct decision of course depends on the other pending tasks and the
> other resources available, and more satisfactory results may also need
> preemption, etc.
>
>
> Diffs
> -----
>
>   RELEASE-NOTES.md 75b3ddb856dc5d889a9006490f57cc58ee7d82fc
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java b933a5bbc2d1d5edb5e473135fb523a1fe02db35
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 78255e6dfa31c4920afc0221ee60ec4f8c2a12c4
>   src/main/java/org/apache/aurora/scheduler/offers/OfferOrder.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/offers/OfferOrderBuilder.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java adf7f33e4a72d87c3624f84dfe4998e20dc75fdc
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 317a2d26d8bfa27988c60a7706b9fb3aa9b4e2a2
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java d7addc0effb60c196cf339081ad81de541d05385
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java 676d305d257585e53f0a05b359ba7eb11f1b23be
>
>
> Diff: https://reviews.apache.org/r/59480/diff/2/
>
>
> Testing
> -------
>
> This has been scale-tested with production-like workloads and performs well,
> adding only a few extra seconds total in TaskAssigner when applied to
> thousands of tasks per minute.
>
> There is an overhead when scheduling tasks that have large resource
> requirements, as the task assigner will first need to skip over all the
> offers with low resources. In a packed cluster, this is where the extra
> seconds are spent. This could be reduced by just jumping over all the offers
> we know to be too small, but that decision has to map to the OfferOrder
> (which adds complexity). That can be addressed in a follow-up review if
> needed.
>
>
> Thanks,
>
> David McLaughlin
>
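
For readers following the description above, the "sorted offers plus first fit" idea can be illustrated with a minimal sketch. This is not the code from the patch: the class SmallestFitSketch, its Offer type, and the methods sortBySize and firstFit are hypothetical names invented for illustration, and the sketch considers CPU only, whereas real offers carry several resource types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: hypothetical names, not Aurora's OfferManager/OfferOrder API.
final class SmallestFitSketch {

  /** Hypothetical CPU-only stand-in for a resource offer. */
  static final class Offer {
    final String host;
    final double cpus;

    Offer(String host, double cpus) {
      this.host = host;
      this.cpus = cpus;
    }
  }

  /** Returns a copy of the offers ordered from smallest to largest CPU. */
  static List<Offer> sortBySize(List<Offer> offers) {
    List<Offer> sorted = new ArrayList<>(offers);
    sorted.sort(Comparator.comparingDouble((Offer o) -> o.cpus));
    return sorted;
  }

  /** First fit: the first offer in iteration order that can hold the task. */
  static Optional<Offer> firstFit(List<Offer> offers, double cpusNeeded) {
    for (Offer offer : offers) {
      if (offer.cpus >= cpusNeeded) {
        return Optional.of(offer);
      }
    }
    return Optional.empty();
  }
}
```

With offers of 12, 8 and 4 cores (in that arrival order) and a task needing 8 cores, firstFit over the unsorted list picks the 12-core offer, while firstFit over sortBySize(...) picks the 8-core one. The linear scan also shows where the overhead mentioned under Testing comes from: a large task has to walk past every smaller offer before it reaches one that fits.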