> On Jan. 27, 2014, 10:44 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 725-728
> > <https://reviews.apache.org/r/16724/diff/3/?file=425219#file425219line725>
> >
> >     I think you've brought this up before but did you figure out why a 
> > completed executor has terminated tasks?
> 
> Adam B wrote:
>     Not exactly, not yet. I'll look into this as I'm writing and running 
> tests.

Reproduced the flakiness in a unit test; smells like a race condition. I'll dig 
into it further as part of MESOS-906.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review32961
-----------------------------------------------------------


On Feb. 18, 2014, 6:07 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 6:07 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check. Also manually failed over a master, watched the slave re-register 
> its completed frameworks, and confirmed that the web UI shows the completed 
> tasks along with their stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>
