----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65338/#review196248 -----------------------------------------------------------
src/main/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImpl.java Line 135 (original), 135 (patched) <https://reviews.apache.org/r/65338/#comment275796> To prevent future mistakes of the same kind, should we remove the `casState` state here and just transition to `LOST` unconditionally? I cannot think of a scenario right now where launching would fail, but we would still like the task to live on. - Stephan Erb On Jan. 25, 2018, 9:43 a.m., David McLaughlin wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65338/ > ----------------------------------------------------------- > > (Updated Jan. 25, 2018, 9:43 a.m.) > > > Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham. > > > Repository: aurora > > > Description > ------- > > Discovered while debugging https://issues.apache.org/jira/browse/AURORA-1966. > Before we attempt to launch a task, we move the task to ASSIGNED state. > However, the code to deal with launch failures expects the task to be in > PENDING state. So the ASSIGNED -> LOST state change fails, and instead we > rely on the transient task timeout for correctness. This means errors that > can be recovered from in seconds instead take at least five minutes (by > default). > > > Diffs > ----- > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImpl.java > 916908bbf635a261c01777cd3a357ca457dd9726 > > src/test/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImplTest.java > 533bb44953163e2148fa18c394a4338938dae205 > > > Diff: https://reviews.apache.org/r/65338/diff/1/ > > > Testing > ------- > > ./gradlew test > > Also tested in Vagrant. > > > Thanks, > > David McLaughlin > >