Re: Review Request 33249: Send statusUpdate to scheduler on containerizer launch failure

Timothy Chen Thu, 16 Apr 2015 21:14:57 -0700


> On April 16, 2015, 5:32 p.m., Timothy Chen wrote:
> > src/slave/slave.cpp, line 3085
> > <https://reviews.apache.org/r/33249/diff/1/?file=931231#file931231line3085>
> >
> >     We're already sending back a status update when the registration
> > times out, and if we send another one here the scheduler will actually get
> > two TASK_FAILED statuses instead.
> >     
> >     I think either we populate the reason in the final status update we
> > send back, saying that the containerizer launch failed, or we make sure
> > we only send one here.
> >     
> >     The nice thing about handling it in the timeout is that there are
> > fewer places in the slave where we send status updates, but with the
> > caveat that you have to wait for the timeout to occur, which is something
> > I never really liked.
> >     
> >     I think if we can make the code clean and make sure there is just one
> > status update propagated back, I'd rather see it happen here.
> 
> Jay Buffington wrote:
>     Sending a terminal update (TASK_FAILED) removes the task from 
> 'executor->queuedTasks', so the scheduler won't get two status updates.  See 
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2112
>     
>     I admit this is super confusing; in fact, when I ran the code the first
> time I was expecting to see two status updates. I pinged Vinod about it, he
> was confused too, and it took us a while to work out what was going on.
>     
>     I am concerned that we are changing state that the cleanup callbacks
> rely on, so I'm open to moving it. When you say "timeout", are you
> referring to the Slave::sendExecutorTerminatedStatusUpdate method?


Ah, I forgot about that too. I think adding comments would be great, so we avoid confusion!


- Timothy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33249/#review80347
-----------------------------------------------------------


On April 16, 2015, 3:16 p.m., Jay Buffington wrote:
> 
> (Updated April 16, 2015, 3:16 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Timothy Chen, and Vinod Kone.
> 
> 
> Bugs: MESOS-2020
>     https://issues.apache.org/jira/browse/MESOS-2020
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When Mesos is unable to launch the containerizer, the scheduler should
> get a TASK_FAILED with a status message that includes the error the
> containerizer encountered when trying to launch.
> 
> Introduces a new TaskStatus reason: REASON_CONTAINERIZER_LAUNCH_FAILED
> 
> Fixes MESOS-2020
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 3a8e8bf303e0576c212951f6028af77e54d93537 
>   src/slave/slave.cpp a0595f93ce4720f5b9926326d01210460ccb0667 
>   src/tests/containerizer.cpp 26b87ac6b16dfeaf84888e80296ef540697bd775 
>   src/tests/slave_tests.cpp b826000e0a4221690f956ea51f49ad4c99d5e188 
> 
> Diff: https://reviews.apache.org/r/33249/diff/
> 
> 
> Testing
> -------
> 
> I added a test case to slave_tests.cpp. I also tried this with Aurora: I
> supplied a bogus Docker image URL and saw the "docker pull" failure stderr
> message in Aurora's web UI.
> 
> 
> Thanks,
> 
> Jay Buffington
> 
>
