Re: Review Request 16724: Added completed frameworks/tasks to slave re-registration.

Adam B Tue, 18 Feb 2014 17:30:06 -0800


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 712
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line712>
> >
> >     Why do you need to AWAIT on this?


Leftover from copy/paste coding and sanity-checking overkill while I was trying
to get this stuff figured out. Removed.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/mesos.hpp, lines 239-252
> > <https://reviews.apache.org/r/16724/diff/6/?file=492073#file492073line239>
> >
> >     Why not augment createTask() above?
> >     
> >     Also, it seems a bit weird to use DEFAULT_EXECUTOR_INFO as 
> > Task.ExecutorInfo.

If I use createTask() as is, I get "Failed to launch executor <foo> of 
framework <bar> because it is unknown to the isolator".
If I add "task.mutable_executor()->CopyFrom(DEFAULT_EXECUTOR_INFO);" after 
calling createTask, I get TASK_LOST ("TaskInfo must have either an 'executor' 
or a 'command'").
From the declaration of TaskInfo in mesos.proto, "Either ExecutorInfo or 
CommandInfo should be set" (not both), so I think it makes sense to have 
createTaskWithCommand() and createTaskWithExecutor(). Since I'm just using the 
DEFAULT_EXECUTOR_INFO, I have renamed my function 
createTaskWithDefaultExecutor(), but we could generalize this to take in an 
(optional) ExecutorInfo.
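
A minimal sketch of the proposed split, to illustrate the "exactly one of
executor or command" rule. The structs below are simplified stand-ins, not the
protobuf classes generated from mesos.proto, and isValid() plus the function
signatures are assumptions made for this illustration only:

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for the generated protobuf messages (assumption:
// the real ExecutorInfo/CommandInfo/TaskInfo carry many more fields).
struct ExecutorInfo { std::string id; };
struct CommandInfo { std::string value; };

struct TaskInfo
{
  std::string name;
  bool hasExecutor = false;
  bool hasCommand = false;
  ExecutorInfo executor;
  CommandInfo command;
};

// Hypothetical check mirroring the rule quoted from mesos.proto:
// a task is valid only if exactly one of executor/command is set.
bool isValid(const TaskInfo& task)
{
  return task.hasExecutor != task.hasCommand;
}

// Builds a task that is run via a command (no executor set).
TaskInfo createTaskWithCommand(
    const std::string& name,
    const std::string& command)
{
  TaskInfo task;
  task.name = name;
  task.command.value = command;
  task.hasCommand = true;
  return task;
}

// Builds a task that is run by the given executor (no command set).
TaskInfo createTaskWithExecutor(
    const std::string& name,
    const ExecutorInfo& executor)
{
  TaskInfo task;
  task.name = name;
  task.executor = executor;
  task.hasExecutor = true;
  return task;
}
```

Setting an executor on a task that already has a command makes isValid()
return false here, which corresponds to the TASK_LOST case described above.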


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, lines 788-789
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line788>
> >
> >     ditto.

Turns out I don't need either of these. Removed.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, lines 786-787
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line786>
> >
> >     Do you need to wait on both?
> >     
> >     If no, kill one of them.

Turns out I need both of these.
Executor::shutdown() is called, and ought to be expected, else we get 
"Uninteresting mock function call".
Slave::executorTerminated is the call I need to wait/settle on, since it is 
what actually calls removeFramework to put the framework in completedFrameworks.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 666
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line666>
> >
> >     const string&

Changed to const string (no ref), since std::string::substr() returns a new 
string, and we shouldn't return a reference to a local or temporary object.
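
For illustration, a small sketch of the pattern. prefixBeforeDot() is a
hypothetical helper made up for this example, not the function under review:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the part of `s` before the first '.',
// or all of `s` if it contains no '.'.
//
// substr() builds a new std::string, so the result is returned by value.
// Declaring the return type as `const std::string&` would return a
// reference to a temporary that is destroyed when the function returns.
const std::string prefixBeforeDot(const std::string& s)
{
  // find() returns std::string::npos when there is no '.', and
  // substr(0, npos) then yields the whole string.
  return s.substr(0, s.find('.'));
}

// Small usage check for the helper above.
inline void prefixBeforeDotExample()
{
  assert(prefixBeforeDot("slave.0") == "slave");
  assert(prefixBeforeDot("slave") == "slave");
}
```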


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review34767
-----------------------------------------------------------


On Feb. 17, 2014, 4:23 p.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
> 
> (Updated Feb. 17, 2014, 4:23 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
> 
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
> terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
> and these may overflow in different orders on different slaves. 
> BenH proposes an archive to replace these circular buffers.
> 
> 
> Diffs
> -----
> 
>   include/mesos/scheduler.hpp 2e4707e 
>   src/master/master.hpp 7649737 
>   src/master/master.cpp 77872ec 
>   src/messages/messages.proto 922a8c4 
>   src/slave/slave.cpp 2d21e16 
>   src/tests/fault_tolerance_tests.cpp 60e06cc 
>   src/tests/mesos.hpp d7bdaee 
> 
> Diff: https://reviews.apache.org/r/16724/diff/
> 
> 
> Testing
> -------
> 
> make check; manually failed-over a master, watched the slave reregister its 
> completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
> 
> 
> Thanks,
> 
> Adam B
> 
>
