> On March 7, 2018, 10:48 a.m., David McLaughlin wrote:
> > So what happens if there are two bad hosts? :)
>
> Jordan Ly wrote:
>     This does not scale past n=1
>
>     We can make this more generic by getting the list of hosts the task has previously failed on and looking through offers for a host the task did not fail on for some operator defined value (something like `-failure_avoidance_factor`)

Note that making this more generic still depends on how much task history the scheduler retains.


- Santhosh Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65941/#review198803
-----------------------------------------------------------


On March 6, 2018, 9:50 p.m., Jordan Ly wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65941/
> -----------------------------------------------------------
>
> (Updated March 6, 2018, 9:50 p.m.)
>
>
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and Stephan Erb.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> If a task fails on a host, we should try to avoid rescheduling the task on the same host if possible. This is done in order to avoid a potentially bad host. This issue generally comes up when you are bin-packing hosts (i.e. using the `-offer_order` option).
>
> If there are no other offers to schedule the task on, we will still use the offer.
>
>
> Diffs
> -----
>
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImpl.java fcafecf63040f9c410458dedfd3d87b0d669d205
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImplTest.java 864538b6730d7318385494818276ba370124b8e9
>
>
> Diff: https://reviews.apache.org/r/65941/diff/1/
>
>
> Testing
> -------
>
> `./gradlew test`
>
> Benchmarks and live-cluster testing coming soon.
>
>
> Thanks,
>
> Jordan Ly
>
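
For reference, the generalization discussed in this thread (avoid every host the task has previously failed on, bounded by an operator-defined value, and fall back to the first offer when nothing better turns up) could look roughly like the sketch below. This is not the code in the diff and does not use Aurora's `TaskAssignerImpl` API: the `Offer` class, `pickOffer`, and `maxSkips` are hypothetical names, and reading `-failure_avoidance_factor` as "the maximum number of offers on previously failed hosts to skip" is only one possible interpretation of the proposal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class FailureAvoidanceSketch {

    /** Hypothetical stand-in for a resource offer; only the host name matters here. */
    static final class Offer {
        final String host;

        Offer(String host) {
            this.host = host;
        }
    }

    /**
     * Returns the first offer on a host the task has not failed on, skipping at
     * most maxSkips offers on previously failed hosts (assumed meaning of the
     * proposed flag). If no such offer is found within that budget, falls back
     * to the first offer so the task can still be scheduled.
     */
    static Optional<Offer> pickOffer(List<Offer> offers, Set<String> failedHosts, int maxSkips) {
        int skipped = 0;
        for (Offer offer : offers) {
            if (!failedHosts.contains(offer.host)) {
                return Optional.of(offer);
            }
            if (skipped >= maxSkips) {
                break; // avoidance budget exhausted, stop searching
            }
            skipped++;
        }
        // Fallback: no acceptable offer found, use the first one if any exists.
        return offers.stream().findFirst();
    }

    public static void main(String[] args) {
        List<Offer> offers = Arrays.asList(new Offer("host-a"), new Offer("host-b"), new Offer("host-c"));
        // In a real scheduler this set would be derived from retained task history.
        Set<String> failedHosts = new HashSet<>(Arrays.asList("host-a", "host-b"));
        pickOffer(offers, failedHosts, 2).ifPresent(o -> System.out.println(o.host));
    }
}
```

The set of failed hosts can only be as complete as the task history the scheduler still holds, which is the limitation raised in the reply above.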