Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/#review93147 ---

Ship it!

Thanks Jan! I've made the updates from the feedback and will get this committed shortly.

docs/reconciliation.md (lines 43 - 47)
https://reviews.apache.org/r/36617/#comment147407
Let's clarify what we mean by "current" here. That is, non-terminal.

docs/reconciliation.md (line 83)
https://reviews.apache.org/r/36617/#comment147408
Hm... what kind of failure? Let's clarify that this is relevant to master failover.

docs/reconciliation.md (lines 83 - 91)
https://reviews.apache.org/r/36617/#comment147413
Thanks Jan! It might be a bit clearer if we move this below where we say that the algorithm uses retries, given this is why the retries are needed. Also, it might be helpful to point out that this time is bounded by the --slave_reregister_timeout flag.

docs/reconciliation.md (lines 93 - 95)
https://reviews.apache.org/r/36617/#comment147417
It seems a bit odd to prescribe that frameworks have to persist task information here; they are free not to as well. Perhaps we need a document which describes some recommendations on framework implementation? That document could point to reconciliation as one aspect of implementation, and could also talk about persistence as its own topic (e.g. write-ahead storage, how to achieve high throughput, what are the implications of no persistence in the scheduler? what are the implications of non-replicated storage? etc.).

Write-ahead storage means that with a strict registry the scheduler only needs to perform explicit reconciliation (although implicit would be prudent as a defense). I'm inclined not to mention storage here though, because the registry is non-strict by default (so everyone should be doing implicit reconciliation).

docs/reconciliation.md (line 97)
https://reviews.apache.org/r/36617/#comment147415
Why move this above?

docs/reconciliation.md (lines 98 - 99)
https://reviews.apache.org/r/36617/#comment147414
This should already be captured by non-terminal, right?

docs/reconciliation.md (line 100)
https://reviews.apache.org/r/36617/#comment147416
Well, that's certainly one way to handle it, but it seems a bit odd to prescribe that here. For example, a scheduler could recover the task as well.

docs/reconciliation.md (lines 119 - 123)
https://reviews.apache.org/r/36617/#comment147410
Hm... I'm a bit confused by this addition; what time period are you referring to? One of the critical reasons for periodic reconciliation is that by default we don't use a strict registry. With a non-strict registry, the master does not enforce slave removal across master failovers; as a result, tasks may resurrect from a lost state (hence the need to discover them). I'll add a note about this. We should probably also move this up out of the notes, since it's required by default.

- Ben Mahler

On July 23, 2015, 12:01 p.m., Jan Schlicht wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 23, 2015, 12:01 p.m.)
Review request for mesos and Joerg Schad.
Bugs: MESOS-3127 https://issues.apache.org/jira/browse/MESOS-3127
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs: docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
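Ben's comments tie the retries to slaves that have not yet reregistered after a master failover (a window bounded by --slave_reregister_timeout) and note that only non-terminal tasks need to be reconciled explicitly. The retry loop could be sketched roughly as below. This is an illustration under assumptions, not Mesos code: `reconcile` stands in for `SchedulerDriver::reconcileTasks`, `wait_for_updates` is a hypothetical hook that blocks for up to the given timeout and returns the IDs of tasks whose status updates arrived, and the terminal-state set and backoff constants are assumed values.

```python
import time
from typing import Callable, Dict, Set

# Assumed set of terminal task states; every other state is non-terminal.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED",
                   "TASK_LOST", "TASK_ERROR"}


def explicit_reconcile(
    tasks: Dict[str, str],                          # task_id -> last known state
    reconcile: Callable[[Set[str]], None],          # stand-in for reconcileTasks
    wait_for_updates: Callable[[float], Set[str]],  # hypothetical hook
    initial_backoff: float = 1.0,
    max_backoff: float = 60.0,
    deadline: float = 600.0,
) -> Set[str]:
    """Repeat explicit reconciliation until every non-terminal task has
    received a status update, or `deadline` seconds have passed.

    Returns the IDs of tasks that never received an update.
    """
    # Only non-terminal tasks need to be reconciled.
    remaining = {t for t, s in tasks.items() if s not in TERMINAL_STATES}
    backoff = initial_backoff
    start = time.monotonic()
    while remaining and time.monotonic() - start < deadline:
        # Ask the master about every task that is still unconfirmed.
        reconcile(set(remaining))
        # Tasks on slaves that have not reregistered yet get no reply,
        # so they stay in `remaining` and are asked about again.
        remaining -= wait_for_updates(backoff)
        # Truncated exponential backoff between rounds.
        backoff = min(backoff * 2, max_backoff)
    return remaining
```

The overall `deadline` is an added safety net rather than something the review or the documentation prescribes; since the reregistration window is bounded by --slave_reregister_timeout, choosing a deadline somewhat longer than that flag's value seems a reasonable starting point under these assumptions.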
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/#review92742 ---

Patch looks great!

Reviews applied: [36617]

All tests passed.

- Mesos ReviewBot

On July 23, 2015, 12:01 p.m., Jan Schlicht wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 23, 2015, 12:01 p.m.)
Review request for mesos and Joerg Schad.
Bugs: MESOS-3127 https://issues.apache.org/jira/browse/MESOS-3127
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs: docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/#review92741 ---

Ship it!

docs/reconciliation.md (line 94)
https://reviews.apache.org/r/36617/#comment146989
When the framework receives task status updates from the master, the framework should:

- Joerg Schad

On July 23, 2015, 11:48 a.m., Jan Schlicht wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 23, 2015, 11:48 a.m.)
Review request for mesos and Joerg Schad.
Bugs: MESOS-3127 https://issues.apache.org/jira/browse/MESOS-3127
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs: docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 23, 2015, 2:01 p.m.)
Review request for mesos and Joerg Schad.
Bugs: MESOS-3127 https://issues.apache.org/jira/browse/MESOS-3127
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs (updated): docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/#review92739 ---

docs/reconciliation.md (line 43)
https://reviews.apache.org/r/36617/#comment146987
Could you add one sentence beforehand on what the goal of task reconciliation is, i.e. which problem we are solving? (You mention that partially in the algorithm ...)

docs/reconciliation.md (line 83)
https://reviews.apache.org/r/36617/#comment146982
After a failure, there is a reconciliation period in which the slaves are reregistering with the master. Because the master only knows about the state of the slaves that have already reregistered (and the other slaves could have potentially failed), it can only reconcile ...

docs/reconciliation.md (line 88)
https://reviews.apache.org/r/36617/#comment146983
s/in transitioning state/not yet reregistered/

docs/reconciliation.md (line 89)
https://reviews.apache.org/r/36617/#comment146984
Hence, during reconciliation we need several iterations of task status updates until the master and slave status are in sync again.

docs/reconciliation.md (line 92)
https://reviews.apache.org/r/36617/#comment146985
s/task update/task status update/ (also below)

docs/reconciliation.md (line 97)
https://reviews.apache.org/r/36617/#comment146986
(e.g. TASK_FINISHED, ...

docs/reconciliation.md (line 118)
https://reviews.apache.org/r/36617/#comment146988
s/As/as/

- Joerg Schad

On July 22, 2015, 9:41 a.m., Jan Schlicht wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 22, 2015, 9:41 a.m.)
Review request for mesos and Joerg Schad.
Bugs: MESOS-3127 https://issues.apache.org/jira/browse/MESOS-3127
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs: docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/#review92554 ---

Thank you!! No need to update; I will get this committed for you soon with the adjustments from the feedback (I'll also give it a pass).

- Ben Mahler

On July 21, 2015, 9:29 a.m., Jan Schlicht wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 21, 2015, 9:29 a.m.)
Review request for mesos and Joerg Schad.
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs: docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
Re: Review Request 36617: Improved task reconciliation documentation.
On July 21, 2015, 7:28 p.m., Vinod Kone wrote:
docs/reconciliation.md, line 96
https://reviews.apache.org/r/36617/diff/2/?file=1016806#file1016806line96
> what about other terminal states?

Of course! I had only the TASK_LOST sent during explicit reconciliation in mind. Other terminal states are now mentioned as well.

On July 21, 2015, 7:28 p.m., Vinod Kone wrote:
docs/reconciliation.md, line 98
https://reviews.apache.org/r/36617/diff/2/?file=1016806#file1016806line98
> what does this mean? s/Let/Ask/ ?

It means to kill the task using `SchedulerDriver::killTask`. Another option might be to try to add this task to the scheduler's internal state. But this will probably only work for trivial schedulers, as more information than the task state reported by the master is necessary to represent a task in the scheduler.

- Jan

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/#review92436 ---

On July 22, 2015, 11:14 a.m., Jan Schlicht wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 22, 2015, 11:14 a.m.)
Review request for mesos and Joerg Schad.
Repository: mesos
Description: Improved task reconciliation documentation.
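The update handling discussed in this exchange (drop tasks that reached any terminal state, not only TASK_LOST; decide what to do with tasks the scheduler does not know about) might look roughly like the following. This is a hypothetical sketch, not Mesos code: `kill_task` stands in for `SchedulerDriver::killTask`, the terminal-state set is an assumption, and the `recover_unknown` flag models the alternative raised in the review of adopting the task instead of killing it.

```python
from typing import Callable, Dict

# Assumed set of terminal task states.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED",
                   "TASK_LOST", "TASK_ERROR"}


def handle_status_update(
    tasks: Dict[str, str],             # scheduler's view: task_id -> state
    task_id: str,
    state: str,
    kill_task: Callable[[str], None],  # stand-in for SchedulerDriver::killTask
    recover_unknown: bool = False,
) -> None:
    """Apply one task status update from the master to the scheduler's view."""
    if state in TERMINAL_STATES:
        # Any terminal state: stop tracking the task (no-op if unknown).
        tasks.pop(task_id, None)
    elif task_id in tasks:
        # Known, non-terminal task: remember the latest state.
        tasks[task_id] = state
    elif recover_unknown:
        # Unknown, non-terminal task (e.g. one that resurfaced after a master
        # failover with a non-strict registry): adopt it. A real scheduler
        # would likely need more than the state to represent the task.
        tasks[task_id] = state
    else:
        # Unknown, non-terminal task the scheduler cannot use: kill it.
        kill_task(task_id)
```

Keeping the unknown-task policy as a parameter mirrors the review discussion: killing is one way to handle such tasks, recovering them is another, and the documentation arguably should not prescribe either.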
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 22, 2015, 11:14 a.m.)
Review request for mesos and Joerg Schad.
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs (updated):
3rdparty/libprocess/3rdparty/CMakeLists.txt fc6125edb66142d26758ebf3feeb6eb68f65620f
3rdparty/libprocess/CMakeLists.txt 6bc5a687fbf78327a16d34ee0bddac40d0020f70
3rdparty/libprocess/cmake/ProcessConfigure.cmake cb5fd1d2cdbffaa65a7951fe6b74c035dc30ca71
3rdparty/libprocess/cmake/ProcessTestsConfigure.cmake d349d2e1432cd4317bed17eb9580b416b7e1c766
3rdparty/libprocess/cmake/macros/External.cmake e3901b67048f1c028216ae8323ee1c318a46f3cc
3rdparty/libprocess/cmake/macros/PatchCommand.cmake 12ee3f177a490c9b2319ed865aa3bb73019dad0d
3rdparty/libprocess/include/process/io.hpp 975923f40f82357f31b89428f24d01df6a8ac9fc
3rdparty/libprocess/src/CMakeLists.txt 9d1b1f54d63b3c4740550c2b7934200c54b48021
3rdparty/libprocess/src/tests/CMakeLists.txt 56b1861d6623b720ed603c6de64562f9a787d5d8
CMakeLists.txt 3b6f4af337466d33cb915959a5995e4307b27be3
cmake/MesosConfigure.cmake 16b72f1aa135666174cf10d73d52443d51529c6c
docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
include/mesos/hook.hpp bb5a635dcf189e1023f1eec66fc06955f816fc0b
include/mesos/mesos.proto bcb38d934c7f223b9a23746b273e581e0e8da886
include/mesos/slave/isolator.hpp 85e38f5e4aa66527f1756fa259b93389f45028b3
src/Makefile.am 489ddb424b342635c3dbc4d14ff5d69ce76a237b
src/common/http.cpp a74c51d9392d0b0a67d51a0143eb314cfca11245
src/common/protobuf_utils.hpp 2e827a0923de83d5cf853a12435b451cc7c55891
src/common/protobuf_utils.cpp e0f82b53f5e106bbf4e21d6ac946df0fae821882
src/docker/docker.hpp 38e5299ad38b9e20501387f2193b0fa448e49e3e
src/docker/docker.cpp 1367de8a7bbbda6348a30e4ef4c616378e450250
src/docker/executor.cpp 256d53d59d5cda63bbeb8c987ce0019e24b9fb77
src/examples/test_hook_module.cpp c664b565bcf18dd2153205990119cc679e4ad6cf
src/hook/manager.hpp 8153ce4826f94d5771c93d37c59fdc4991352e66
src/hook/manager.cpp 11e6b0a2c0df1d0d7039aaad94e1c6f0e5cc6bc2
src/linux/cgroups.hpp a651f3434b908b54d217117933740d52dbe50adf
src/linux/cgroups.cpp e062fcbd56315f11882fe0ccb615c490dd719934
src/slave/containerizer/isolators/filesystem/posix.hpp 16ba26f4f5b515acbeb3c4d514d4eecf2f277df8
src/slave/containerizer/isolators/filesystem/posix.cpp 1904279c92ef00ef931c909b4bb15bef89a4fc59
src/slave/containerizer/mesos/containerizer.hpp f6c580d1b629ee799977cc8824f337764d893c5f
src/slave/containerizer/mesos/containerizer.cpp 609620c4322e41562597ee682b311cd320bca6d2
src/slave/containerizer/provisioner.hpp f7fb068ca5b0a8da1fb756411d59536ed7a1aec8
src/slave/containerizer/provisioner.cpp df52e36b23ad3cd28f50e96865d0b163cc245cb2
src/slave/flags.hpp 881d494c06fea5c382d27b357d65c1baf201ae46
src/slave/flags.cpp 82b6cf47af26f0533ff603a67240777e9a9b986e
src/slave/paths.hpp c7f85f188d9bd4c8d3dc194adff0cf9065fc400a
src/slave/paths.cpp 404c2143e70771747d356b15eac9137495fd9a75
src/slave/slave.cpp dc12c45516ab39d74a5c29b657f22f74d0acf24e
src/slave/state.hpp 4e00468a777145e3c61b8dee7dfe496f8d65b0e4
src/slave/state.cpp b9f2d8a0d6395b92bd50f1e0927f3ae9fd04b81c
src/tests/cgroups_tests.cpp b63d956b9dafb2c485080ff5e016e2a05f03db15
src/tests/containerizer_tests.cpp 29114e7322b9239fb3e5f4921f542bd991fd426e
src/tests/docker_containerizer_tests.cpp 5086af376e3f22726328fdb9618307fa9e84d6f8
src/tests/hook_tests.cpp 86e53d8d4609c483b676cef471512dc53f595dd0
src/tests/master_tests.cpp 8b8d3865ee1baf03d013a1357bfdd3088828e799
src/tests/mesos.hpp 69134e1c2664ca24a1ecd80a662c841311104a6a
src/tests/mesos.cpp f09ef0f99573716de8905f49dcc0c9df20e31ea9
src/tests/slave_tests.cpp e1390ad84b0003052681600deb9ca518defc0970
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/#review92582 ---

Patch looks great!

Reviews applied: [36617]

All tests passed.

- Mesos ReviewBot

On July 22, 2015, 9:41 a.m., Jan Schlicht wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 22, 2015, 9:41 a.m.)
Review request for mesos and Joerg Schad.
Bugs: MESOS-3127 https://issues.apache.org/jira/browse/MESOS-3127
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs: docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/#review92436 ---

Can you add @bmahler to the review?

docs/reconciliation.md (line 45)
https://reviews.apache.org/r/36617/#comment146601
The master doesn't check whether the task states match; it just responds with the latest state of the tasks.

docs/reconciliation.md (line 83)
https://reviews.apache.org/r/36617/#comment146602
s/just restarted/restarts/

docs/reconciliation.md (line 87)
https://reviews.apache.org/r/36617/#comment146603
Also add a blurb about explicit reconciliation of tasks that belong to transitionary slaves.

docs/reconciliation.md (line 96)
https://reviews.apache.org/r/36617/#comment146604
what about other terminal states?

docs/reconciliation.md (line 98)
https://reviews.apache.org/r/36617/#comment146605
what does this mean? s/Let/Ask/ ?

- Vinod Kone

On July 21, 2015, 9:29 a.m., Jan Schlicht wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 21, 2015, 9:29 a.m.)
Review request for mesos and Joerg Schad.
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs: docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
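Vinod's points (the master does not compare states but simply replies with the latest state it knows, and tasks on transitionary slaves need separate treatment) can be made concrete with a simplified model of how the master answers a reconciliation request. This is our reading of the master's behavior around the time of this review, not its actual code: the function and data structures are hypothetical, the model assumes the framework always includes the slave ID, and tasks that are still pending launch are ignored.

```python
from typing import Dict, List, Set, Tuple


def answer_reconcile(
    requested: List[Tuple[str, str]],  # (task_id, slave_id) pairs from the framework
    known_tasks: Dict[str, str],       # master's view of this framework: task_id -> latest state
    transitioning_slaves: Set[str],    # slaves that have not reregistered yet
) -> Dict[str, str]:
    """Return the status updates (task_id -> state) the master would send."""
    if not requested:
        # Implicit reconciliation: report every task the master knows about.
        return dict(known_tasks)
    replies: Dict[str, str] = {}
    for task_id, slave_id in requested:
        if task_id in known_tasks:
            # Known task: send the latest state; the state the framework
            # believes the task is in is never compared against it.
            replies[task_id] = known_tasks[task_id]
        elif slave_id in transitioning_slaves:
            # Unknown task on a slave that may still reregister: no reply.
            continue
        else:
            # Unknown task on a registered or unknown slave: TASK_LOST.
            replies[task_id] = "TASK_LOST"
    return replies
```

In this model a framework that asks about a task on a slave that is still reregistering simply gets no answer, which is why a scheduler has to retry explicit reconciliation instead of waiting for a single round of replies.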
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/#review92395 ---

Patch looks great!

Reviews applied: [36617]

All tests passed.

- Mesos ReviewBot

On July 21, 2015, 9:29 a.m., Jan Schlicht wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 21, 2015, 9:29 a.m.)
Review request for mesos and Joerg Schad.
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs: docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 21, 2015, 11:29 a.m.)
Review request for mesos and Joerg Schad.
Repository: mesos
Description (updated): Improved task reconciliation documentation.
Diffs: docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht
Re: Review Request 36617: Improved task reconciliation documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36617/ ---
(Updated July 20, 2015, 4:21 p.m.)
Review request for mesos and Joerg Schad.
Repository: mesos
Description: Improved task reconciliation documentation.
Diffs (updated): docs/reconciliation.md 17537ba8420c95d833e64ccf82ff9bb4530497f0
Diff: https://reviews.apache.org/r/36617/diff/
Testing: https://gist.github.com/nfnt/73532d62fe39d27ff33d
Thanks, Jan Schlicht