> On April 21, 2015, 11:25 p.m., Jie Yu wrote:
> > src/slave/slave.cpp, lines 3065-3078
> > <https://reviews.apache.org/r/33249/diff/3/?file=938221#file938221line3065>
> >
> > Instead of doing that in your way, can we just try to make sure
> > `containerizer->wait` here will return a failure (or a Termination with
> > some reason) when `containerizer->launch` fails. In that way, the
> > `executorTerminated` will properly send status updates to the slave
> > (TASK_LOST/TASK_FAILED).
> >
> > Or am I missing something?

OK, I think I got confused by the ticket. There are actually two problems here.

The problem I am referring to is that, when the containerizer launch fails, we don't send a status update to the scheduler until the executor reregistration timeout expires. Since someone using the docker containerizer might set a very large timeout value, the slave should ideally send a status update to the scheduler right after the containerizer launch fails.

After chatting with Jay, I understand that the problem you guys are referring to is that the scheduler cannot distinguish between a task that has failed and a task whose configuration is incorrect, because in both cases the scheduler receives a TASK_FAILED/TASK_LOST.

- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33249/#review81090
-----------------------------------------------------------


On April 21, 2015, 5:14 p.m., Jay Buffington wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33249/
> -----------------------------------------------------------
>
> (Updated April 21, 2015, 5:14 p.m.)
>
>
> Review request for mesos, Ben Mahler, Timothy Chen, and Vinod Kone.
>
>
> Bugs: MESOS-2020
>     https://issues.apache.org/jira/browse/MESOS-2020
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When mesos is unable to launch the containerizer the scheduler should
> get a TASK_FAILED with a status message that includes the error the
> containerizer encountered when trying to launch.
>
> Introduces a new TaskStatus reason: REASON_CONTAINERIZER_LAUNCH_FAILED
>
> Fixes MESOS-2020
>
>
> Diffs
> -----
>
>   include/mesos/mesos.proto 3a8e8bf303e0576c212951f6028af77e54d93537
>   src/slave/slave.cpp 8ec80ed26f338690e0a1e712065750ab77a724cd
>   src/tests/slave_tests.cpp b826000e0a4221690f956ea51f49ad4c99d5e188
>
> Diff: https://reviews.apache.org/r/33249/diff/
>
>
> Testing
> -------
>
> I added a test case to slave_tests.cpp. I also tried this with Aurora, supplied
> a bogus docker image url and saw the "docker pull" failure stderr message in
> Aurora's web UI.
>
>
> Thanks,
>
> Jay Buffington
>
>
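
For illustration only, here is a minimal, self-contained C++ sketch of the second problem discussed above: how a scheduler might tell a launch failure apart from an ordinary task failure once the status update carries a reason. This is not Mesos code. The `StatusUpdate` struct, the `REASON_UNSPECIFIED` value, and the `isLaunchFailure`/`describe`/`example` helpers are hypothetical stand-ins; only the names TASK_FAILED, TASK_LOST and REASON_CONTAINERIZER_LAUNCH_FAILED come from this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the protobuf enums; not the real Mesos types.
enum class TaskState { TASK_FAILED, TASK_LOST };
enum class Reason {
  REASON_UNSPECIFIED,                  // assumption: placeholder for "no reason given"
  REASON_CONTAINERIZER_LAUNCH_FAILED   // name proposed in the review request
};

// Hypothetical, simplified view of what the scheduler receives.
struct StatusUpdate {
  TaskState state;
  Reason reason;
  std::string message;  // e.g. the error reported by the containerizer
};

// True when the update says the containerizer could not be launched,
// which usually points at the task configuration rather than the task.
bool isLaunchFailure(const StatusUpdate& update) {
  return update.reason == Reason::REASON_CONTAINERIZER_LAUNCH_FAILED;
}

// Builds a message a framework could surface to its users.
std::string describe(const StatusUpdate& update) {
  const std::string prefix = isLaunchFailure(update)
      ? "launch failed (check task configuration): "
      : "task failed or lost: ";
  return prefix + update.message;
}

// Usage example: both updates arrive as TASK_FAILED, but only the first
// one is classified as a launch failure.
void example() {
  const StatusUpdate badImage{TaskState::TASK_FAILED,
                              Reason::REASON_CONTAINERIZER_LAUNCH_FAILED,
                              "docker pull failed"};
  const StatusUpdate crashed{TaskState::TASK_FAILED,
                             Reason::REASON_UNSPECIFIED,
                             "exited with status 1"};
  assert(isLaunchFailure(badImage));
  assert(!isLaunchFailure(crashed));
}
```

Without a distinct reason, both updates in `example` would look identical to the scheduler, which is the ambiguity described in the reply above.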