> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 712
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line712>
> >
> >     Why do you need to AWAIT on this?

That was copy/paste coding and sanity-checking overkill from when I was still
trying to get this stuff figured out. Removed.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/mesos.hpp, lines 239-252
> > <https://reviews.apache.org/r/16724/diff/6/?file=492073#file492073line239>
> >
> >     Why not augment createTask() above?
> >
> >     Also, it seems a bit weird to use DEFAULT_EXECUTOR_INFO as
> >     Task.ExecutorInfo.

If I use createTask() as is, I get "Failed to launch executor <foo> of
framework <bar> because it is unknown to the isolator". If I add
"task.mutable_executor()->CopyFrom(DEFAULT_EXECUTOR_INFO);" after calling
createTask(), I get TASK_LOST ("TaskInfo must have either an 'executor' or a
'command'").

From the declaration of TaskInfo in mesos.proto, "Either ExecutorInfo or
CommandInfo should be set" (not both), so I think it makes sense to have
createTaskWithCommand() and createTaskWithExecutor(). Since I'm just using the
DEFAULT_EXECUTOR_INFO, I have renamed my function
createTaskWithDefaultExecutor(), but we could generalize this to take in an
(optional) ExecutorInfo.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, lines 788-789
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line788>
> >
> >     ditto.

Turns out I don't need either of these. Removed.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, lines 786-787
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line786>
> >
> >     Do you need to wait on both?
> >
> >     If no, kill one of them.

Turns out I need both of these. Executor::shutdown() is called, so it has to
be expected, else we get "Uninteresting mock function call".
Slave::executorTerminated is the call I need to wait/settle on, since it is
what actually calls removeFramework to put the framework in
completedFrameworks.


> On Feb. 18, 2014, 2:07 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 666
> > <https://reviews.apache.org/r/16724/diff/6/?file=492072#file492072line666>
> >
> >     const string&

Changed to const string (no ref), since std::string::substr() returns a new
string, and we shouldn't return a reference to a local variable.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/#review34767
-----------------------------------------------------------


On Feb. 17, 2014, 4:23 p.m., Adam B wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16724/
> -----------------------------------------------------------
>
> (Updated Feb. 17, 2014, 4:23 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and
> Vinod Kone.
>
>
> Bugs: MESOS-767
>     https://issues.apache.org/jira/browse/MESOS-767
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> Added completed frameworks/tasks to slave re-registration.
> Fixes MESOS-767.
>
> Additional issues discovered during investigation:
> - MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
> - MESOS-906: Last task in Completed Framework never graduates from
>   terminatedTasks to completedTasks.
> - Completed frameworks/executors/tasks are stored in circular buffers,
>   and these may overflow in different orders on different slaves.
>   BenH proposes an archive to replace these circular buffers.
>
>
> Diffs
> -----
>
>   include/mesos/scheduler.hpp 2e4707e
>   src/master/master.hpp 7649737
>   src/master/master.cpp 77872ec
>   src/messages/messages.proto 922a8c4
>   src/slave/slave.cpp 2d21e16
>   src/tests/fault_tolerance_tests.cpp 60e06cc
>   src/tests/mesos.hpp d7bdaee
>
> Diff: https://reviews.apache.org/r/16724/diff/
>
>
> Testing
> -------
>
> make check; manually failed-over a master, watched the slave reregister its
> completed frameworks, web UI shows completed tasks and stdout/stderr.
> Added a new unit/integration test to verify the expected behavior.
>
>
> Thanks,
>
> Adam B
>
>
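
For reference, on the "const string&" comment (fault_tolerance_tests.cpp, line
666) answered in the reply: a minimal sketch of why the return type has to be a
value rather than a reference. The helper name prefixBeforeAt and its input
format are hypothetical, made up for illustration; this is not the function in
the diff.

```cpp
#include <string>

// Hypothetical helper (not the function under review): returns the text
// before the first '@' in a PID-like string, or the whole string if there
// is no '@'.
//
// std::string::substr() returns a new std::string object. Returning it by
// value hands that object to the caller. If the return type were
// `const std::string&`, the reference would bind to a temporary that is
// destroyed when the function returns, leaving the caller with a dangling
// reference (undefined behavior on use).
std::string prefixBeforeAt(const std::string& pid)
{
  // find() yields std::string::npos when '@' is absent, and
  // substr(0, npos) then copies the entire string.
  return pid.substr(0, pid.find('@'));
}
```

A caller can still bind the result to a const reference, e.g.
`const std::string& id = prefixBeforeAt(pid);`, because binding a returned
temporary to a const reference at the call site extends its lifetime; the
problem only arises when the function itself returns a reference.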