-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16724/
-----------------------------------------------------------
(Updated Feb. 17, 2014, 4:23 p.m.)
Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and
Vinod Kone.
Changes
-------
Added a unit test that starts a task/framework, then kills the task and shuts
down the framework, leaving a completedFramework on the slave. After restarting
the master, the slave reregisters with the new master and the completed
framework is added to the new master's state.
The test reads master and slave state by running find/substr over the
state.json endpoint of each. A better approach would use a real JSON parser to
reach the nested elements (to verify the executor/task status). Better still,
the test could be made a friend of Master/Slave so it can read the
frameworks/completedFrameworks collections directly.
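For reviewers who want a feel for the find/substr approach and its limits, here is a minimal standalone sketch. It is not the code in the diff: the helper name arrayMentions, the compact JSON layout, and the field names in the usage note are assumptions made for illustration only.

```cpp
#include <string>

// Hypothetical helper (not part of the patch): returns true if the quoted
// string `id` appears anywhere inside the JSON array stored under `key`.
// Assumes compact JSON (no whitespace after ':') and no '[' or ']' characters
// inside string values; a real JSON parser would not need either assumption.
bool arrayMentions(
    const std::string& json,
    const std::string& key,
    const std::string& id)
{
  // Look for `"key":[`; the leading quote keeps "frameworks" from matching
  // inside "completed_frameworks".
  const std::string marker = "\"" + key + "\":[";
  const std::string::size_type start = json.find(marker);
  if (start == std::string::npos) {
    return false;
  }

  // Position of the '[' that opens the array.
  const std::string::size_type open = start + marker.size() - 1;

  // Walk forward, tracking bracket depth, until the array closes.
  int depth = 0;
  for (std::string::size_type i = open; i < json.size(); ++i) {
    if (json[i] == '[') {
      ++depth;
    } else if (json[i] == ']') {
      --depth;
      if (depth == 0) {
        const std::string body = json.substr(open, i - open + 1);
        return body.find("\"" + id + "\"") != std::string::npos;
      }
    }
  }

  return false; // Unterminated array.
}
```

Given an illustrative payload such as {"frameworks":[],"completed_frameworks":[{"id":"fw-1"}]}, arrayMentions(state, "completed_frameworks", "fw-1") is true while arrayMentions(state, "frameworks", "fw-1") is false. The check cannot tell which nested field held the match, which is why a parser or direct access to the collections would be more robust.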
Bugs: MESOS-767
https://issues.apache.org/jira/browse/MESOS-767
Repository: mesos-git
Description
-------
Added completed frameworks/tasks to slave re-registration.
Fixes MESOS-767.
Additional issues discovered during investigation:
- MESOS-905: Remove Framework.id in favor of FrameworkInfo.id
- MESOS-906: Last task in Completed Framework never graduates from
terminatedTasks to completedTasks.
- Completed frameworks/executors/tasks are stored in circular buffers,
and these may overflow in different orders on different slaves.
BenH proposes an archive to replace these circular buffers.
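To illustrate the circular-buffer concern in the last item, here is a small standalone sketch. It is an assumption-laden model, not Mesos code: BoundedHistory is a hypothetical class built on std::deque, and the capacity of 2 in the usage note is chosen only to keep the example short.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a fixed-capacity history of completed items.
// Once full, each new entry evicts the oldest one, as a circular buffer does.
class BoundedHistory
{
public:
  explicit BoundedHistory(std::size_t capacity) : capacity_(capacity) {}

  // Records a completed item, dropping the oldest entry when at capacity.
  void push(const std::string& id)
  {
    if (capacity_ == 0) {
      return;
    }
    if (entries_.size() == capacity_) {
      entries_.pop_front();
    }
    entries_.push_back(id);
  }

  // Returns true if `id` is still retained in the history.
  bool contains(const std::string& id) const
  {
    return std::find(entries_.begin(), entries_.end(), id) != entries_.end();
  }

private:
  std::size_t capacity_;
  std::deque<std::string> entries_;
};
```

With a capacity of 2, a slave that records fw-1, fw-2, fw-3 retains fw-2 and fw-3, while a slave that records fw-3, fw-2, fw-1 retains fw-2 and fw-1. After a failover the master would therefore hear about fw-3 from one slave and fw-1 from the other, so what it rebuilds depends on each slave's eviction order.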
Diffs (updated)
-----
include/mesos/scheduler.hpp 2e4707e
src/master/master.hpp 7649737
src/master/master.cpp 77872ec
src/messages/messages.proto 922a8c4
src/slave/slave.cpp 2d21e16
src/tests/fault_tolerance_tests.cpp 60e06cc
src/tests/mesos.hpp d7bdaee
Diff: https://reviews.apache.org/r/16724/diff/
Testing (updated)
-------
make check. Also manually failed over a master and watched the slave reregister
its completed frameworks; the web UI shows the completed tasks and their
stdout/stderr.
Added a new unit/integration test to verify the expected behavior.
Thanks,
Adam B