> On April 16, 2015, 5:32 p.m., Timothy Chen wrote:
> > src/slave/slave.cpp, line 3085
> > <https://reviews.apache.org/r/33249/diff/1/?file=931231#file931231line3085>
> >
> >     We're already sending back a status update when the registration
> >     times out, and if we send another one here the scheduler will actually
> >     get two TASK_FAILED statuses instead.
> >
> >     I think either we populate the reason when we send back the final
> >     status update, saying that the containerizer launch failed, or we make
> >     sure we just send one here.
> >
> >     The nice thing about having it be handled in the timeout is that there
> >     are fewer places in the slave where we do status updates, but with the
> >     caveat that you wait for the timeout to occur, which is something I
> >     never really liked.
> >
> >     I think if we can make the code clean and make sure there is just one
> >     status update propagated back, I'd rather see it happen here.

Sending a terminal update (TASK_FAILED) removes the task from 'executor->queuedTasks', so the scheduler won't get two status updates. See https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2112

I admit this is super confusing; in fact, when I ran the code the first time I was expecting to see two status updates. I pinged Vinod about it, he was confused too, and it took us a while to work through what was going on.

I am concerned that we are changing state for the callbacks that clean things up, so I'm open to moving it. When you say "timeout", are you referring to the Slave::sendExecutorTerminatedStatusUpdate method?


- Jay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33249/#review80347
-----------------------------------------------------------


On April 16, 2015, 3:16 p.m., Jay Buffington wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33249/
> -----------------------------------------------------------
>
> (Updated April 16, 2015, 3:16 p.m.)
>
>
> Review request for mesos, Ben Mahler, Timothy Chen, and Vinod Kone.
>
>
> Bugs: MESOS-2020
>     https://issues.apache.org/jira/browse/MESOS-2020
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When mesos is unable to launch the containerizer the scheduler should
> get a TASK_FAILED with a status message that includes the error the
> containerizer encountered when trying to launch.
>
> Introduces a new TaskStatus reason: REASON_CONTAINERIZER_LAUNCH_FAILED
>
> Fixes MESOS-2020
>
>
> Diffs
> -----
>
>   include/mesos/mesos.proto 3a8e8bf303e0576c212951f6028af77e54d93537
>   src/slave/slave.cpp a0595f93ce4720f5b9926326d01210460ccb0667
>   src/tests/containerizer.cpp 26b87ac6b16dfeaf84888e80296ef540697bd775
>   src/tests/slave_tests.cpp b826000e0a4221690f956ea51f49ad4c99d5e188
>
> Diff: https://reviews.apache.org/r/33249/diff/
>
>
> Testing
> -------
>
> I added a test case to slave_tests.cpp. I also tried this with Aurora: I
> supplied a bogus docker image url and saw the "docker pull" failure stderr
> message in Aurora's web UI.
>
>
> Thanks,
>
> Jay Buffington
>
>