> On Oct. 8, 2014, 10:42 a.m., Vinod Kone wrote:
> > src/tests/slave_tests.cpp, lines 1071-1075
> > <https://reviews.apache.org/r/23912/diff/6-7/?file=714593#file714593line1071>
> >
> >     This has not been moved !?

I have been searching through the previous review comments, but have not been 
able to determine which one you are referring to. However, I strongly suspect 
that you want this code to be moved downwards, after driver.launchTasks(), just 
before driver.killTask. Correct?


> On Oct. 8, 2014, 10:42 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1355
> > <https://reviews.apache.org/r/23912/diff/6-7/?file=714590#file714590line1355>
> >
> >     I missed this before, but you also should erase the executorid if this 
> > is the last task on this executor. additionally, you want to remove the 
> > framework is this is the last executor. see _runTask() for details.

Yes, thanks! I was wondering about this, but not enough, not aware how eagerly 
the slave "wants" to remove its framework info :-) The next patch will support 
that. The fix will actually be located in _runTask().


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23912/#review55834
-----------------------------------------------------------


On Oct. 8, 2014, 3:57 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23912/
> -----------------------------------------------------------
> 
> (Updated Oct. 8, 2014, 3:57 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Bugs: MESOS-947
>     https://issues.apache.org/jira/browse/MESOS-947
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Fixes MESOS-947 "Slave should properly handle a killTask() that arrives 
> between runTask() and _runTask()".
> 
> Slave::killTask() did not check for task in question combination to be 
> "pending" (i.e. Slave::runTask had happened, but Slave::_runTask had not yet) 
> and then erroneously assumed that Slave::runTask() had not been executed. The 
> task was then marked "LOST" instead of "KILLED". But Slave::runTask had 
> already scheduled Slave::_runTask to follow. Now the entry for being 
> "pending" is removed, and the task is marked "KILLED", and _runTask gets 
> informed about this. It checks whether the task in question is currently 
> "pending" and if it is not, then it infers that the task has been killed and 
> does not erroneously try to complete launching it.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 28697102047b972ecb3b6b627ee089b430549fc0 
>   src/slave/slave.cpp e56dcbd80114730949a0d4b553470802a4d38281 
>   src/tests/mesos.hpp 957e2233cc11c438fd80d3b6d1907a1223093104 
>   src/tests/mesos.cpp 3dcb2acd5ad4ab5e3a7b4fe524ee077558112773 
>   src/tests/slave_tests.cpp 69be28f6e82b99e23424bd2be8294f715d8040d4 
> 
> Diff: https://reviews.apache.org/r/23912/diff/
> 
> 
> Testing
> -------
> 
> Wrote a unit test that reliably created the situation described in the 
> ticket. Observed that TASK_LOST and the listed log output occurred. This 
> pointed directly to the lines in killTask() where the problem is rooted. Ran 
> the test after fixing, it succeeded. Checked the log. It looks like a "clean 
> kill" now :-)
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>

Reply via email to