Re: Review Request 33372: Added decorator documentation and described the semantic change in Mesos 0.23.0
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33372/#review81133 --- This doc looks great now, so let's update upgrades.md too. docs/modules.md https://reviews.apache.org/r/33372/#comment131393 I still think we need a note in upgrades.md since this is a hook API change when upgrading. - Adam B On April 21, 2015, 1:13 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33372/ --- (Updated April 21, 2015, 1:13 p.m.) Review request for mesos and Adam B. Bugs: MESOS-2622 https://issues.apache.org/jira/browse/MESOS-2622 Repository: mesos Description --- See summary. Diffs - docs/modules.md a8b471541cdfa584eeb89fbe96643f93c712cfd4 Diff: https://reviews.apache.org/r/33372/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 33372: Added decorator documentation and described the semantic change in Mesos 0.23.0
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33372/#review80858 --- I like the further explanation of decorators (but wanted to suggest different wording); it not only sets up the context for the API change, but also further explains how hooks can be used. Can you also add a brief note (probably referencing the modules doc) in docs/upgrades.md for 0.23? docs/modules.md https://reviews.apache.org/r/33372/#comment130982 s/let's/lets/ docs/modules.md https://reviews.apache.org/r/33372/#comment130984 s/masterLaunchTaskHooks/masterLaunchTaskHook/ ? docs/modules.md https://reviews.apache.org/r/33372/#comment130985 Some hooks take in an object (e.g. TaskInfo) and return all or part of that object (e.g. task labels), so that the hook can modify or replace the contents in-flight. These hooks are referred to as _decorators_. docs/modules.md https://reviews.apache.org/r/33372/#comment130990 You've got room, so let's use explicit version numbers in the table headers. Before (0.22.x) // we didn't have these in 0.21 right? After (0.23.0+) - Adam B On April 20, 2015, 2:27 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33372/ --- (Updated April 20, 2015, 2:27 p.m.) Review request for mesos and Adam B. Bugs: MESOS-2622 https://issues.apache.org/jira/browse/MESOS-2622 Repository: mesos Description --- See summary. Diffs - docs/modules.md a8b471541cdfa584eeb89fbe96643f93c712cfd4 Diff: https://reviews.apache.org/r/33372/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 31016: Added slave run task decorator.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31016/#review80820 --- Ship it! Ship It! - Adam B On April 20, 2015, 1:44 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31016/ --- (Updated April 20, 2015, 1:44 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- Added decorator which gets invoked on start of runTask() sequence in the slave. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/hook/manager.hpp da813492108974a7e26b366845368517589da876 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/slave/slave.hpp 9495c704ca4bde4ab283d12efa3ea9b2f1158a4c src/slave/slave.cpp 8ec80ed26f338690e0a1e712065750ab77a724cd src/tests/mesos.hpp 7744df55a2a31446327da7bd2b16457e90711d22 src/tests/mesos.cpp fc534e9febed1e293076e00e0f5c3879a78df90f Diff: https://reviews.apache.org/r/31016/diff/ Testing --- make check Thanks, Niklas Nielsen
Re: Review Request 31028: Added slave run task hook tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31028/#review80822 --- Ship it! Looks great! src/tests/hook_tests.cpp https://reviews.apache.org/r/31028/#comment130932 Would be great to see a diagram of these labels coming and going at different points in the runTask lifecycle. It's a little confusing keeping track of what happens when. New documentation JIRA? - Adam B On April 20, 2015, 1:17 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31028/ --- (Updated April 20, 2015, 1:17 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary Diffs - src/examples/test_hook_module.cpp 2f2da1c5ef85af06c7f366d38ce5b64f39d0076f src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/31028/diff/ Testing --- make check (with newly added VerifySlaveRunTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 31017: Fixed comment for remove executor hook.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31017/#review80829 --- Ship it! Ship It! - Adam B On April 20, 2015, 1:17 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31017/ --- (Updated April 20, 2015, 1:17 p.m.) Review request for mesos and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d Diff: https://reviews.apache.org/r/31017/diff/ Testing --- make check Thanks, Niklas Nielsen
Re: Review Request 32948: Refactored VerifyMasterLaunchTaskHook to _not_ use command executor.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32948/#review80826 --- Ship it! Looks good. Just a couple of minor reorderings and a WillRepeatedly to fix. Fix it, then Ship It! src/tests/hook_tests.cpp https://reviews.apache.org/r/32948/#comment130933 registered() should be called before resourceOffers(), so let's set up the expectations in that order. src/tests/hook_tests.cpp https://reviews.apache.org/r/32948/#comment130935 .WillOnce(FutureArg1(status)) .WillRepeatedly(Return()); // ignore future updates src/tests/hook_tests.cpp https://reviews.apache.org/r/32948/#comment130937 This should happen before the exec.registered, exec.launchTask, and sched.statusUpdate, so let's set up the expectation/future before them. - Adam B On April 20, 2015, 1:17 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32948/ --- (Updated April 20, 2015, 1:17 p.m.) Review request for mesos, Adam B and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary Diffs - src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/32948/diff/ Testing --- make check (test broke previously with an assert as the TestContainerizer cannot be used with a command executor) Thanks, Niklas Nielsen
Re: Review Request 32975: MESOS-1790 Adds chown option to CommandInfo.URI
On April 13, 2015, 10:20 a.m., Vinod Kone wrote: src/tests/fetcher_tests.cpp, line 708 https://reviews.apache.org/r/32975/diff/3/?file=925449#file925449line708 s/archive/archived/ ? Jim Klucar wrote: fixed as suggested Jim, you claim that you've fixed these issues, but I don't see an updated diff with those changes. Please update. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32975/#review79898 --- On April 13, 2015, 9:55 a.m., Jim Klucar wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32975/ --- (Updated April 13, 2015, 9:55 a.m.) Review request for mesos. Bugs: MESOS-1790 https://issues.apache.org/jira/browse/MESOS-1790 Repository: mesos Description --- Added chown to CommandInfo.URI protocol buffer as an optional boolean that defaults to true, the current chown behavior. The fetcher was updated to skip the os::chown operation if the chown boolean is set to false. No documentation was updated. Diffs - include/mesos/mesos.proto 3c592d5ab3092ecbeddfaff95e0c1addc3ac58f8 src/common/type_utils.cpp e92f6f36de0955784619029a016667b46bbe221b src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e Diff: https://reviews.apache.org/r/32975/diff/ Testing --- Unit testing this functionality is difficult because it would require that the user running the test to have permission to chown a file to someone other than themselves. I didn't want to add that as a requirement to build. I added the new field to the existing test cases just to see that they populate. Thanks, Jim Klucar
Re: Review Request 33109: Allow setting environment variables in mesos-execute
On April 19, 2015, 11:27 p.m., Adam B wrote: src/cli/execute.cpp, line 77 https://reviews.apache.org/r/33109/diff/2/?file=924729#file924729line77 Any reason not to just name this 'env' or 'environment'? haosdent huang wrote: I afraid environment/env maybe have other usage in the future. But if you think environment is ok, I would change it here. Should be fine. 'mesos-execute' is very specific to launching a single command. On April 19, 2015, 11:27 p.m., Adam B wrote: src/cli/execute.cpp, line 396 https://reviews.apache.org/r/33109/diff/2/?file=924729#file924729line396 How is this going to work with environment variables like PATH that expect :'s inside their values? Probably need to choose another delimiter (even `;` and `=` could be tricky), or pass in a newline-delimited file, or go all the way to json lists. haosdent huang wrote: How about pass a json list here? SGTM. You want want to enable reading the json in from a file too, in case it gets long. See how the rest of Mesos reads in json parameters in master/slave flags. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33109/#review80660 --- On April 13, 2015, 9:42 a.m., haosdent huang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33109/ --- (Updated April 13, 2015, 9:42 a.m.) Review request for mesos and Adam B. Bugs: MESOS-2023 https://issues.apache.org/jira/browse/MESOS-2023 Repository: mesos Description --- Allow setting environment variables in mesos-execute Diffs - src/cli/execute.cpp 84f70dccbc2c5dd43f68105d967f4488c82f582b Diff: https://reviews.apache.org/r/33109/diff/ Testing --- make check Thanks, haosdent huang
Re: Review Request 32850: Moved cram-md5 authenticatee process definition into implementation file.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32850/#review80659 --- Ship it! LGTM, assuming we reach a consensus on the style issues. (Also assuming you didn't lie about changing any logic/functionality. ;) - Adam B On April 14, 2015, 5:44 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32850/ --- (Updated April 14, 2015, 5:44 p.m.) Review request for mesos, Adam B, Joris Van Remoortere, and switched to 'mcypark'. Bugs: MESOS-2584 https://issues.apache.org/jira/browse/MESOS-2584 Repository: mesos-incubating Description --- Removing the process from the header is much cleaner and also fixes the linked clang 3.4.2 JIRA. Apart from that moving, no code is changed. Diffs - src/Makefile.am d15a373 src/authentication/cram_md5/authenticatee.hpp 55fac68 src/authentication/cram_md5/authenticatee.cpp PRE-CREATION Diff: https://reviews.apache.org/r/32850/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 33109: Allow setting environment variables in mesos-execute
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33109/#review80660 --- Thanks for jumping on this. Great start, but we need to rethink the format/delimiters to account for environment variables like PATH that might contain `:` or `;`. I assume you manually tested this with mesos-execute --env? Please describe your manual test in the testing section as well. src/cli/execute.cpp https://reviews.apache.org/r/33109/#comment130789 Any reason not to just name this 'env' or 'environment'? src/cli/execute.cpp https://reviews.apache.org/r/33109/#comment130792 Please `#include stout/hashmap.hpp` src/cli/execute.cpp https://reviews.apache.org/r/33109/#comment130790 Style nit: Please bring the opening brace up to the previous line, just like the rest of the the 'if' blocks in this file. src/cli/execute.cpp https://reviews.apache.org/r/33109/#comment130791 Instead of using an index, since you're just adding new variables as you iterate through commandEnv, try `Environment_Variable* envVar = env-mutable_variables()-add_variables();` which will append a new `variable` to the list and return a pointer to you. src/cli/execute.cpp https://reviews.apache.org/r/33109/#comment130793 How is this going to work with environment variables like PATH that expect :'s inside their values? Probably need to choose another delimiter (even `;` and `=` could be tricky), or pass in a newline-delimited file, or go all the way to json lists. - Adam B On April 13, 2015, 9:42 a.m., haosdent huang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33109/ --- (Updated April 13, 2015, 9:42 a.m.) Review request for mesos and Adam B. Bugs: MESOS-2023 https://issues.apache.org/jira/browse/MESOS-2023 Repository: mesos Description --- Allow setting environment variables in mesos-execute Diffs - src/cli/execute.cpp 84f70dccbc2c5dd43f68105d967f4488c82f582b Diff: https://reviews.apache.org/r/33109/diff/ Testing --- make check Thanks, haosdent huang
Re: Review Request 32693: Change Http Request log level to VLOG(1)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32693/#review80665 --- Ship it! Thanks! I'll try to get this committed tomorrow (assuming nobody objects in the meantime). - Adam B On April 19, 2015, 11:53 p.m., haosdent huang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32693/ --- (Updated April 19, 2015, 11:53 p.m.) Review request for mesos. Bugs: MESOS-2498 https://issues.apache.org/jira/browse/MESOS-2498 Repository: mesos Description --- Change Http Request log level to VLOG(1) Diffs - src/master/http.cpp 00c22c43bd1f6cef7963b2ffa9c095c6cbd01cd3 src/slave/http.cpp 914e7e5d0d6f5f628ba3c02e0dc23629d0a23512 Diff: https://reviews.apache.org/r/32693/diff/ Testing --- make check Thanks, haosdent huang
Re: Review Request 31028: Added slave run task hook tests.
On April 11, 2015, 3:54 a.m., Adam B wrote: src/examples/test_hook_module.cpp, lines 80-85 https://reviews.apache.org/r/31028/diff/4/?file=920382#file920382line80 Create variables like testLabelKey, etc. above so it's easier to track all these label k/v strings. Niklas Nielsen wrote: I would prefer if we could defer this to a subsequent review; we use foo, bar, baz, qux quite a few places for label testing. Is that OK with you? I can create a JIRA for it now if you'd like. Separate review/JIRA is fine by me. I just have an aversion to hardcoded strings showing up in multiple places. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31028/#review79804 --- On April 11, 2015, 3:03 a.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31028/ --- (Updated April 11, 2015, 3:03 a.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary Diffs - src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/31028/diff/ Testing --- make check (with newly added VerifySlaveRunTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 31016: Added slave run task decorator.
On April 11, 2015, 3:26 a.m., Adam B wrote: src/slave/slave.cpp, lines 1186-1188 https://reviews.apache.org/r/31016/diff/4/?file=920381#file920381line1186 What makes this the ideal place to do the label decoration? Looks like this is wedged between setting up different unschedule calls. Seems like we should either decorate the labels as early as possible, right after (or before?) the pid/id/state checks; or, if we need to delay it, we could do it after the unschedules. Niklas Nielsen wrote: Great point - I will probably move it up before the unschedule code. Does that sound OK to you? Yeah, I'm thinking earlier is better, so we don't early-exit without updating labels. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31016/#review79803 --- On April 11, 2015, 3:03 a.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31016/ --- (Updated April 11, 2015, 3:03 a.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- Added decorator which gets invoked on start of runTask() sequence in the slave. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/hook/manager.hpp da813492108974a7e26b366845368517589da876 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/slave/slave.hpp 19e6b44bc344c0ca509674803f401cbb4e1f47ae src/slave/slave.cpp 9fec023b643d410f4d511fa6f80e9835bab95b7e Diff: https://reviews.apache.org/r/31016/diff/ Testing --- make check Thanks, Niklas Nielsen
Re: Review Request 33208: Delete detector in MesosSchedulerDriver::stop
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33208/#review80655 --- LGTM, barring a question about ordering/synchronization. I'll let another committer take a look before we commit it. src/sched/sched.cpp https://reviews.apache.org/r/33208/#comment130787 I wonder if we need to wait for SchedulerProcess::stop to complete before deleting the detector, or if we could even delete the detector first. - Adam B On April 14, 2015, 10:23 p.m., Robert Lacroix wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33208/ --- (Updated April 14, 2015, 10:23 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-1634 https://issues.apache.org/jira/browse/MESOS-1634 Repository: mesos Description --- When the Mesos Scheduler Driver stops, it does not delete the detector process before the object is garbage collected, which leaves ZooKeeper connections hanging around unnecessarily. This deletes the process on stop as well, not only on destruction. Diffs - src/sched/sched.cpp 66fd2b3146568900356cc48e0f17c6720665ae80 Diff: https://reviews.apache.org/r/33208/diff/ Testing --- make check, manual testing Thanks, Robert Lacroix
Re: Review Request 30961: Enabled label decorator to override.
On April 8, 2015, 5:31 p.m., Adam B wrote: src/examples/test_hook_module.cpp, line 36 https://reviews.apache.org/r/30961/diff/7/?file=920371#file920371line36 Unused? Or should you check that the value being removed is what you expect? Niklas Nielsen wrote: It was just to keep the module and test code in sync. Would you prefer I remove it? Yeah, when it comes to variables and constants, I say use it or lose it. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/#review79454 --- On April 7, 2015, 5:57 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/ --- (Updated April 7, 2015, 5:57 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/master/master.cpp 618db68ee4163b06e479cf3413eda4b63c9c5a4b src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/30961/diff/ Testing --- make check (with modified VerifyMasterLaunchTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 33208: Delete detector in MesosSchedulerDriver::stop
On April 19, 2015, 10:55 p.m., Adam B wrote: LGTM, barring a question about ordering/synchronization. I'll let another committer take a look before we commit it. Would also like to see a successful ReviewBot pass. That MasterFailover segfault seems like it could be related to your detector changes. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33208/#review80655 --- On April 14, 2015, 10:23 p.m., Robert Lacroix wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33208/ --- (Updated April 14, 2015, 10:23 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-1634 https://issues.apache.org/jira/browse/MESOS-1634 Repository: mesos Description --- When the Mesos Scheduler Driver stops, it does not delete the detector process before the object is garbage collected, which leaves ZooKeeper connections hanging around unnecessarily. This deletes the process on stop as well, not only on destruction. Diffs - src/sched/sched.cpp 66fd2b3146568900356cc48e0f17c6720665ae80 Diff: https://reviews.apache.org/r/33208/diff/ Testing --- make check, manual testing Thanks, Robert Lacroix
Re: Review Request 33109: Allow setting environment variables in mesos-execute
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33109/#review79925 --- Thanks for the patch! I'm a little busy at ApacheCon this week, but will try to review asap. Maybe others can make a pass sooner. - Adam B On April 13, 2015, 9:42 a.m., haosdent huang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33109/ --- (Updated April 13, 2015, 9:42 a.m.) Review request for mesos and Adam B. Bugs: MESOS-2023 https://issues.apache.org/jira/browse/MESOS-2023 Repository: mesos Description --- Allow setting environment variables in mesos-execute Diffs - src/cli/execute.cpp 84f70dccbc2c5dd43f68105d967f4488c82f582b Diff: https://reviews.apache.org/r/33109/diff/ Testing --- make check Thanks, haosdent huang
Re: Review Request 32586: Removed FrameworkID argument from Slave::_runTask.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32586/#review79800 --- Ship it! Ship It! - Adam B On April 7, 2015, 9:59 a.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32586/ --- (Updated April 7, 2015, 9:59 a.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2557 https://issues.apache.org/jira/browse/MESOS-2557 Repository: mesos Description --- The incoming FrameworkInfo has a valid FrameworkID. Diffs - src/slave/slave.hpp 19e6b44bc344c0ca509674803f401cbb4e1f47ae src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 src/tests/mesos.cpp 11e88330dd50913ab00da79054225d113541e83e src/tests/slave_tests.cpp fd09d65bf34136c0959419b451e54105300740c4 Diff: https://reviews.apache.org/r/32586/diff/ Testing --- make check TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 32583: Marked RunTaskMessage::framework_id as optional.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32583/#review79798 --- Ship it! Did you ever (manually?) Test for upgrade path? - Adam B On April 7, 2015, 9:59 a.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32583/ --- (Updated April 7, 2015, 9:59 a.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2558 https://issues.apache.org/jira/browse/MESOS-2558 Repository: mesos Description --- The new preferred location for FrameworkID is FrameworkInfo.id. This patchset achieves this goal by incrementally deprecating other locations for FrameworkID. Here is a plan to deal with the upgrade path: For this release (N), we still keep setting RunTaskMessage::framework_id - this would handle older Slaves with newer Master. - added code to handle it being unset in the Slave (handles older Master with newer Slaves). In the following release (N+1), stop reading/setting RunTaskMessage::framework_id - the previous version would handle the unset case. In release N+2, remove the field altogether: - the previous release is not setting/reading it. Diffs - src/messages/messages.proto 97c45c01dfcea38b1ae555c036d61e10c152c2c8 src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 Diff: https://reviews.apache.org/r/32583/diff/ Testing --- make check. TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 32585: Replaced Framework.id with Framework.id() in Master/Slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32585/#review79799 --- Ship it! Needs a quick rebase before committing. - Adam B On April 7, 2015, 9:59 a.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32585/ --- (Updated April 7, 2015, 9:59 a.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2557 https://issues.apache.org/jira/browse/MESOS-2557 Repository: mesos Description --- Framework.id() extracts and returns FrameworkID from Framework.info. Diffs - src/master/http.cpp e1a87d646e9690e39a9e84ae383622018ce80401 src/master/master.hpp 3c957abcb54a0c23b8549c1d21d2d9277791938d src/master/master.cpp dccd7c635da4b7031cd109bd84e7f17b31777ef1 src/master/validation.cpp 2f2e4ea8ea123c5a0d01446cdec8b308ea60932e src/slave/http.cpp 5f0c39afd2fe9a89eb1df0052af8ab422306f30e src/slave/slave.hpp 19e6b44bc344c0ca509674803f401cbb4e1f47ae src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 Diff: https://reviews.apache.org/r/32585/diff/ Testing --- make check TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 30961: Enabled label decorator to override.
On April 7, 2015, 1:41 a.m., Adam B wrote: src/hook/manager.cpp, line 104 https://reviews.apache.org/r/30961/diff/6/?file=914058#file914058line104 Would it make sense to make taskInfo a pass-by-value param, forcing the copy at the call? Niklas Nielsen wrote: That unfortunately changes the module API and needs to be updated quite a few spaces (compared to this single copy). SGTM. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/#review79150 --- On April 7, 2015, 5:57 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/ --- (Updated April 7, 2015, 5:57 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/master/master.cpp 618db68ee4163b06e479cf3413eda4b63c9c5a4b src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/30961/diff/ Testing --- make check (with modified VerifyMasterLaunchTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 30962: Enabled environment decorator to override.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30962/#review79802 --- Ship it! Ship It! - Adam B On April 7, 2015, 5:57 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30962/ --- (Updated April 7, 2015, 5:57 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary Diffs - src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f Diff: https://reviews.apache.org/r/30962/diff/ Testing --- make check Thanks, Niklas Nielsen
Re: Review Request 31016: Added slave run task decorator.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31016/#review79803 --- Looks good. Just a question about when we should actually do the decoration. src/slave/slave.cpp https://reviews.apache.org/r/31016/#comment129386 What makes this the ideal place to do the label decoration? Looks like this is wedged between setting up different unschedule calls. Seems like we should either decorate the labels as early as possible, right after (or before?) the pid/id/state checks; or, if we need to delay it, we could do it after the unschedules. - Adam B On April 11, 2015, 3:03 a.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31016/ --- (Updated April 11, 2015, 3:03 a.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- Added decorator which gets invoked on start of runTask() sequence in the slave. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/hook/manager.hpp da813492108974a7e26b366845368517589da876 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/slave/slave.hpp 19e6b44bc344c0ca509674803f401cbb4e1f47ae src/slave/slave.cpp 9fec023b643d410f4d511fa6f80e9835bab95b7e Diff: https://reviews.apache.org/r/31016/diff/ Testing --- make check Thanks, Niklas Nielsen
Re: Review Request 31028: Added slave run task hook tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31028/#review79804 --- src/examples/test_hook_module.cpp https://reviews.apache.org/r/31028/#comment129387 Create variables like testLabelKey, etc. above so it's easier to track all these label k/v strings. src/tests/hook_tests.cpp https://reviews.apache.org/r/31028/#comment129388 Why this change? What's wrong with the TestContainerizer? - Adam B On April 11, 2015, 3:03 a.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31028/ --- (Updated April 11, 2015, 3:03 a.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary Diffs - src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/31028/diff/ Testing --- make check (with newly added VerifySlaveRunTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 32998: Split committer's guide into code reviewing and committing documents.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32998/#review79457 --- Thanks for doing this! I know most of this was just splitting the file, but I thought here/now would be a good place/time to add additional comments/feedback on the docs. docs/committing.md https://reviews.apache.org/r/32998/#comment128850 Also make sure that the patch author gets credited as such in the commit. Automatically handled by apply-review.sh, but worth mentioning in case you have to make any minor changes or rebase. docs/committing.md https://reviews.apache.org/r/32998/#comment128849 Would like to formalize what kinds of changes you don't worry about going through a review cycle. I'd propose that typo/comment/doc changes under 5 lines, or obvious build fixes are immune. Anything more complex than that deserves at least a cursory review. docs/effective-code-reviewing.md https://reviews.apache.org/r/32998/#comment128852 Keep in mind that the review Summary + Description gets used as the commit message, so don't put unnecessary fluff in there. The testing message doesn't go into the commit section, so it can be used for notes to the reviewers. Also, please provide details about the testing done and new tests added; hopefully more than just `make check`. docs/effective-code-reviewing.md https://reviews.apache.org/r/32998/#comment128854 s/scope of the work be reduced/scope of the work should be reduced/ - Adam B On April 8, 2015, 5:30 p.m., Ben Mahler wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32998/ --- (Updated April 8, 2015, 5:30 p.m.) Review request for mesos, Adam B, Benjamin Hindman, Jie Yu, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2581 https://issues.apache.org/jira/browse/MESOS-2581 Repository: mesos Description --- Committer's Guide was too generic. This names the documents after what the reader is looking for: doing effective reviews, and how to commit changes (for committers only). Diffs - docs/committers-guide.md c016ee9cb3290d7788ed258904547b59bbea4f11 docs/committing.md PRE-CREATION docs/effective-code-reviewing.md PRE-CREATION docs/home.md 6ab61f85aa7d62e0f19267b886dffb4e0aa826ea Diff: https://reviews.apache.org/r/32998/diff/ Testing --- N/A Thanks, Ben Mahler
Re: Review Request 32999: Added a document for engineering principles and practices.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32999/#review79461 --- Ship it! Love it! There are probably more values we can add, but this is a great start. docs/engineering-principles-and-practices.md https://reviews.apache.org/r/32999/#comment128857 s/allows us identify/allows us to identify/ s/allows to iterate/allows us to iterate/ - Adam B On April 8, 2015, 5:30 p.m., Ben Mahler wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32999/ --- (Updated April 8, 2015, 5:30 p.m.) Review request for mesos, Adam B, Benjamin Hindman, Jie Yu, Niklas Nielsen, and Vinod Kone. Repository: mesos Description --- Added a document for engineering principles and practices. Diffs - docs/engineering-principles-and-practices.md PRE-CREATION docs/home.md 6ab61f85aa7d62e0f19267b886dffb4e0aa826ea Diff: https://reviews.apache.org/r/32999/diff/ Testing --- N/A Thanks, Ben Mahler
Re: Review Request 30961: Enabled label decorator to override.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/#review79454 --- Ship it! LGTM. Where's the JIRA/patch for updating the upgrades doc to explain the module API change? src/examples/test_hook_module.cpp https://reviews.apache.org/r/30961/#comment128844 Unused? Or should you check that the value being removed is what you expect? src/examples/test_hook_module.cpp https://reviews.apache.org/r/30961/#comment128845 label shadows label? Maybe oldLabel/newLabel? Then you can also reuse the `Label*` in line 52 and 59 (`label_`) - Adam B On April 7, 2015, 5:57 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/ --- (Updated April 7, 2015, 5:57 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/master/master.cpp 618db68ee4163b06e479cf3413eda4b63c9c5a4b src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/30961/diff/ Testing --- make check (with modified VerifyMasterLaunchTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 30961: Enabled label decorator to override.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/#review79150 --- Minor suggestions. src/hook/manager.cpp https://reviews.apache.org/r/30961/#comment128285 Would it make sense to make taskInfo a pass-by-value param, forcing the copy at the call? src/master/master.cpp https://reviews.apache.org/r/30961/#comment128286 Update comment src/tests/hook_tests.cpp https://reviews.apache.org/r/30961/#comment128287 s/remove/removed/ s/hook/the hook/ src/tests/hook_tests.cpp https://reviews.apache.org/r/30961/#comment128288 s/Value/Key/? src/tests/hook_tests.cpp https://reviews.apache.org/r/30961/#comment128289 Should also verify `EXPECT_EQ(labels_.labels().Get(0).value(), testLabelValue);` - Adam B On April 2, 2015, 2:39 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/ --- (Updated April 2, 2015, 2:39 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/master/master.cpp 618db68ee4163b06e479cf3413eda4b63c9c5a4b src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/30961/diff/ Testing --- make check (with modified VerifyMasterLaunchTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 32585: Replaced Framework.id with Framework.id() in Master/Slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32585/#review79146 --- LGTM, assuming no significant rebases. src/master/master.hpp https://reviews.apache.org/r/32585/#comment128281 `const FrameworkID`? - Adam B On April 1, 2015, 12:34 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32585/ --- (Updated April 1, 2015, 12:34 p.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2557 https://issues.apache.org/jira/browse/MESOS-2557 Repository: mesos Description --- Framework.id() extracts and returns FrameworkID from Framework.info. Diffs - src/master/http.cpp e1a87d646e9690e39a9e84ae383622018ce80401 src/master/master.hpp 3c957abcb54a0c23b8549c1d21d2d9277791938d src/master/master.cpp dccd7c635da4b7031cd109bd84e7f17b31777ef1 src/master/validation.cpp 2f2e4ea8ea123c5a0d01446cdec8b308ea60932e src/slave/http.cpp 5f0c39afd2fe9a89eb1df0052af8ab422306f30e src/slave/slave.hpp 19e6b44bc344c0ca509674803f401cbb4e1f47ae src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 Diff: https://reviews.apache.org/r/32585/diff/ Testing --- make check TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 30961: Enabled label decorator to override.
On March 30, 2015, 3:23 a.m., Adam B wrote: Minor cleanup/suggestions, but otherwise good. We'll definitely need to document te module(-manager) API change of overriding the label set instead of merging. This should probably go in the upgrades doc? Niklas Nielsen wrote: Thanks for the review Adam! Great point. I'll create a JIRA for it. JIRA link for the upgrades doc? - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/#review78189 --- On April 2, 2015, 2:39 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/ --- (Updated April 2, 2015, 2:39 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/master/master.cpp 618db68ee4163b06e479cf3413eda4b63c9c5a4b src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/30961/diff/ Testing --- make check (with modified VerifyMasterLaunchTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 32583: Marked RunTaskMessage::framework_id as optional.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32583/#review79142 --- src/slave/slave.cpp https://reviews.apache.org/r/32583/#comment128278 If we're always making the copy, should we just pass `frameworkInfo_` by value and force the copy at the call-site? Then there's no need for the `_` name. src/slave/slave.cpp https://reviews.apache.org/r/32583/#comment128280 Isn't this what MergeFrom does? src/slave/slave.cpp https://reviews.apache.org/r/32583/#comment128276 Is this another instance where we're taking a const reference of a temporary? https://gist.github.com/jmlvanre/8a3de53ae88c2d19b375 - Adam B On April 3, 2015, 7:05 a.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32583/ --- (Updated April 3, 2015, 7:05 a.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2558 https://issues.apache.org/jira/browse/MESOS-2558 Repository: mesos Description --- The new preferred location for FrameworkID is FrameworkInfo.id. This patchset achieves this goal by incrementally deprecating other locations for FrameworkID. Here is a plan to deal with the upgrade path: For this release (N), we still keep setting RunTaskMessage::framework_id - this would handle older Slaves with newer Master. - added code to handle it being unset in the Slave (handles older Master with newer Slaves). In the following release (N+1), stop reading/setting RunTaskMessage::framework_id - the previous version would handle the unset case. In release N+2, remove the field altogether: - the previous release is not setting/reading it. Diffs - src/messages/messages.proto 97c45c01dfcea38b1ae555c036d61e10c152c2c8 src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 Diff: https://reviews.apache.org/r/32583/diff/ Testing --- make check. TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 32586: Removed FrameworkID argument from Slave::_runTask.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32586/#review79147 --- src/slave/slave.cpp https://reviews.apache.org/r/32586/#comment128282 Const ref of a temporary? src/tests/mesos.cpp https://reviews.apache.org/r/32586/#comment128283 Why this change? - Adam B On April 1, 2015, 12:35 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32586/ --- (Updated April 1, 2015, 12:35 p.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2557 https://issues.apache.org/jira/browse/MESOS-2557 Repository: mesos Description --- The incoming FrameworkInfo has a valid FrameworkID. Diffs - src/slave/slave.hpp 19e6b44bc344c0ca509674803f401cbb4e1f47ae src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 src/tests/mesos.cpp 11e88330dd50913ab00da79054225d113541e83e src/tests/slave_tests.cpp fd09d65bf34136c0959419b451e54105300740c4 Diff: https://reviews.apache.org/r/32586/diff/ Testing --- make check TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 31028: Added slave run task hook tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31028/#review79152 --- src/examples/test_hook_module.cpp https://reviews.apache.org/r/31028/#comment128291 The foreach `label` shadows the `label` from line 79. And `label_` from line 86 doesn't need to be a different variable from the other `Label*` on line 79. Maybe oldLabel/newLabel are appropriate? src/tests/hook_tests.cpp https://reviews.apache.org/r/31028/#comment128292 No reason to explicitly CreateMasterFlags if you're not reusing `masterFlags` elsewhere or trying to force a new tmpDir. src/tests/hook_tests.cpp https://reviews.apache.org/r/31028/#comment128293 Unused. src/tests/hook_tests.cpp https://reviews.apache.org/r/31028/#comment128295 CopyFrom src/tests/hook_tests.cpp https://reviews.apache.org/r/31028/#comment128296 s/remove/removed/ src/tests/hook_tests.cpp https://reviews.apache.org/r/31028/#comment128297 One which will... and the other...? Unclear what the second label is for (testing add) - Adam B On March 13, 2015, 4:04 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31028/ --- (Updated March 13, 2015, 4:04 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Repository: mesos Description --- See summary Diffs - src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/31028/diff/ Testing --- make check (with newly added VerifySlaveRunTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 32834: Modifiy gdb scripts error message to check gdb is installed.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32834/#review78900 --- Ship it! Trivial but helpful error message clarification. I suggested an alternate wording, but am fine with this either way. bin/gdb-mesos-local.sh.in https://reviews.apache.org/r/32834/#comment128001 Alternate wording: Generated libtool doesn't appear to support gdb, or gdb is not installed. - Adam B On April 3, 2015, 3:01 p.m., Timothy Chen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32834/ --- (Updated April 3, 2015, 3:01 p.m.) Review request for mesos, Adam B, Cody Maloney, and Niklas Nielsen. Repository: mesos Description --- Ran into a problem where gdb isn't installed and I see a error message saying generated libtool doesn't support gdb. Changed the error message to ask user to make sure gdb is also installed. Diffs - bin/gdb-mesos-local.sh.in 72cfb68b4ff2ac796aa381cf6c49f6a4b83eb28b bin/gdb-mesos-master.sh.in f00af078bb9b8a6c3689d1ddd0db6efe38614d87 bin/gdb-mesos-slave.sh.in e01325c59ed62eb2e0d6bdf24808fc3f0cd206ab bin/gdb-mesos-tests.sh.in 626fefe7d953bf226e6d5fb84c87a6f3d66f4da9 Diff: https://reviews.apache.org/r/32834/diff/ Testing --- make Thanks, Timothy Chen
Re: Review Request 32832: Added CHANGELOG for 0.22.1
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32832/#review78851 --- Ship it! Ship It! - Adam B On April 3, 2015, 3:58 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32832/ --- (Updated April 3, 2015, 3:58 p.m.) Review request for mesos, Ben Mahler and Timothy Chen. Repository: mesos Description --- Added changelog section for Mesos 0.22.1 Diffs - CHANGELOG efcadfa0f896a50f21f34b84bdcaa61046d8cd4b Diff: https://reviews.apache.org/r/32832/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 29507: Added Configurable Slave Ping Timeouts
On Feb. 18, 2015, 11:38 a.m., Niklas Nielsen wrote: Let's get tests wired up before committing this :) Adam B wrote: Sure thing. Adding tests in my subsequent patch where we will pass the master's timeout values on to the slave. Will post that very soon. Ben Mahler wrote: Can you do it in one patch? This patch in isolation looks a bit dangerous per our conversation above. Also, please carefully consider whether your approach will be safe to do in a single version. i.e. What happens when there are old slaves running against a new master? And vice versa. Adam B wrote: Easy enough. Added the second patch to this one. Most of the new changes are in messages.proto, slave.hpp/cpp, and partition_tests.cpp, with a few lines in master.cpp to set the new message fields. Certainly safe to upgrade if the new flags stay at their default value. Also, with new slaves talking to an old master, the master will still use the defaults, hence so will the slaves. But old slaves running against a new master with a longer timeout will try to reregister unnecessarily early, so you may want to guarantee that all/most of the slaves have been upgraded before setting these flags, otherwise a large cluster suddenly waking up from a network split would see a lot of unnecessary reregistration attempts. The old behavior in this scenario was that the slaves would reregister after the same default timeout after the network split, and the master would consider them shutdown after the same timeout. If that last scenario is a problem, maybe the better approach is for each slave to specify its own ping timeout, and then the master can use these values with each slave's SlaveObserver. Then we can just require the master(s) to be upgraded first, as is typical. Adam B wrote: I'm inclined to think that we shouldn't worry too much about the unlikely scenario of a network split in the middle of a rolling upgrade to 0.23 where longer ping timeouts are simultaneously being applied. Procedure should be: upgrade (most of) the cluster, then restart the master(s) with the new longer ping timeout value. I can add a Note to upgrades.md if you like. How does that sound to you @bmahler ? If there are no objections, I'll rebase the patch and update the configuration and upgrades docs. Ben Mahler wrote: Yeah that sounds fine to me (might be prudent to call it out in the upgrade doc or the changelog). Mind reaching out to me directly when this is ready for more reviewing? :) Absolutely. Thanks. :) - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/#review72992 --- On Feb. 19, 2015, 12:10 a.m., Adam B wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/ --- (Updated Feb. 19, 2015, 12:10 a.m.) Review request for mesos, Ben Mahler and Niklas Nielsen. Bugs: MESOS-2110 https://issues.apache.org/jira/browse/MESOS-2110 Repository: mesos Description --- Added new --slave_ping_timeout and --max_slave_ping_timeouts flags to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs) and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5). These can be extended if slaves are expected/allowed to be down for longer than a minute or two. Slave will receive master's ping timeout in SlaveRe[re]gisteredMessage. Beware that this affects recovery from network timeouts as well as actual slave node/process failover. Diffs - src/master/constants.hpp ad3fe81 src/master/constants.cpp d3d0f71 src/master/flags.hpp 51a6059 src/master/master.cpp f10a3cf src/messages/messages.proto 58484ae src/slave/constants.hpp 12d6e92 src/slave/constants.cpp 7868bef src/slave/slave.hpp 91dae10 src/slave/slave.cpp aec9525 src/tests/fault_tolerance_tests.cpp efa5c57 src/tests/partition_tests.cpp eb16a58 src/tests/slave_recovery_tests.cpp 8210c52 src/tests/slave_tests.cpp 153d9d6 Diff: https://reviews.apache.org/r/29507/diff/ Testing --- Manually tested slave failover/shutdown with master using different --slave_ping_timeout and --max_slave_ping_timeouts. Ran unit tests with shorter non-default values for ping timeouts. `make check` with new unit tests: ShortPingTimeoutUnreachableMaster and ShortPingTimeoutUnreachableSlave Thanks, Adam B
Re: Review Request 32583: Marked RunTaskMessage::framework_id as optional.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32583/#review78494 --- Should mention in the Description (commit message) that the new preferred location for the FrameworkID is or will be in FrameworkInfo.id. Would also like for you to replace ambiguous instances of 'it' in the description with the actual field/message to which 'it' refers. src/slave/slave.cpp https://reviews.apache.org/r/32583/#comment127312 CopyFrom? - Adam B On March 31, 2015, 1:28 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32583/ --- (Updated March 31, 2015, 1:28 p.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2558 https://issues.apache.org/jira/browse/MESOS-2558 Repository: mesos Description --- For this release (N), we still keep setting it (handles older Slaves with newer Master). - Added code to handle it being unset in the Slave (handles older Master with newer Slaves). In the following release (N+1), stop reading/setting it (the previous version would handle the unset case). In release N+2, remove the field altogether (the previous release is not setting/reading it). Diffs - src/messages/messages.proto 97c45c01dfcea38b1ae555c036d61e10c152c2c8 src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 Diff: https://reviews.apache.org/r/32583/diff/ Testing --- make check. TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 32585: Replaced Framework.id with Framework.id() in Master/Slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32585/#review78496 --- src/slave/slave.cpp https://reviews.apache.org/r/32585/#comment127315 Hmm... I was actually thinking that this line of code can be removed now that you're always filling in the id in the FrameworkInfo by the time it gets to the slave. However, upon further inspection I see that there is a Slave::Framework::id field that we'll also need to deprecate, ensuring that Slave::Framework::info.id is set, and thus Archive::Framework::framework_info has the frameworkId. - Adam B On March 31, 2015, 1:29 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32585/ --- (Updated March 31, 2015, 1:29 p.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2557 https://issues.apache.org/jira/browse/MESOS-2557 Repository: mesos Description --- Framework.id() extracts and returns FrameworkID from Framework.info. Diffs - src/master/http.cpp e1a87d646e9690e39a9e84ae383622018ce80401 src/master/master.hpp 3c957abcb54a0c23b8549c1d21d2d9277791938d src/master/master.cpp dccd7c635da4b7031cd109bd84e7f17b31777ef1 src/master/validation.cpp 2f2e4ea8ea123c5a0d01446cdec8b308ea60932e src/slave/http.cpp 5f0c39afd2fe9a89eb1df0052af8ab422306f30e src/slave/slave.hpp 19e6b44bc344c0ca509674803f401cbb4e1f47ae src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 Diff: https://reviews.apache.org/r/32585/diff/ Testing --- make check TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 32543: Documented problem and solution with slave recovery and systemd settings.
On March 27, 2015, 2:17 a.m., Adam B wrote: docs/upgrades.md, line 15 https://reviews.apache.org/r/32543/diff/2/?file=907124#file907124line15 Is there some new behavior in Mesos 0.22 that just caused this issue to start occurring? Or could this have happened with Mesos 0.21 or earlier with the same systemd default setting? If so, this is not an upgrade issue and this note doesn't belong in the upgrades doc. Joerg Schad wrote: It it not related to a release (that is why the issue is described in slave-recovery.md). The idea for including it here as well is to warn people explicitly of this issue when upgrading (where this behavior will likely occur). I can drop it, but personally feel it is quite helpful here. If it's not an issue with upgrading Mesos, then it doesn't belong in the upgrades doc, especially in a section for upgrading to a specific version of Mesos. Please remove. The slave recovery doc seems like the right place for this note. On March 27, 2015, 2:17 a.m., Adam B wrote: docs/slave-recovery.md, line 71 https://reviews.apache.org/r/32543/diff/2/?file=907123#file907123line71 (If the slave does not come back, each executorDriver shuts itself down after $MESOS_RECOVERY_TIMEOUT.) Important question: If an executor is killed, does this systemd mode affect whether its tasks would get killed? Alexander Rukletsov wrote: Adam, could you please explain what use case do you have in mind and how it is related to slave recovery? Adam B wrote: It's not related to slave recovery necessarily, but to how this KillMode impacts other processes like a custom executor. Some frameworks (like HDFS) have a custom executor that launches task(s) as a separate process/subprocess. If the executor is killed (kill -9, or shutdown by the framework/admin), will this change in KillMode affect whether the executors task subprocesses also get killed? I'm mostly worried about this KillMode change suddenly leaving stranded task processes if/when executors are killed. Alexander Rukletsov wrote: I thought that's exactly why we have containerizers: clean-up all stranded processes. Fair enough, when the slave is running. But what if the executor is killed while the slave (thus also the containerizer) is shutdown/recovering? I'm not claiming there's anything necessarily wrong with using this KillMode. I just ask the question to make sure we don't recommend a setting that may fix one issue but cause others. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32543/#review78025 --- On March 27, 2015, 7:09 a.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32543/ --- (Updated March 27, 2015, 7:09 a.m.) Review request for mesos, Alexander Rukletsov and Brenden Matthews. Bugs: Mesos-2555 https://issues.apache.org/jira/browse/Mesos-2555 Repository: mesos Description --- Documented the problem and solution encountered in MESOS-2419. Diffs - docs/slave-recovery.md 4bb4a71c6945bd70121743a1e9209a26906773c1 docs/upgrades.md 2a15694607c079ad95ef6cf7f1490872ab9a5976 Diff: https://reviews.apache.org/r/32543/diff/ Testing --- markdown check Thanks, Joerg Schad
Re: Review Request 32543: Documented problem and solution with slave recovery and systemd settings.
On March 27, 2015, 2:17 a.m., Adam B wrote: docs/slave-recovery.md, line 71 https://reviews.apache.org/r/32543/diff/2/?file=907123#file907123line71 (If the slave does not come back, each executorDriver shuts itself down after $MESOS_RECOVERY_TIMEOUT.) Important question: If an executor is killed, does this systemd mode affect whether its tasks would get killed? Alexander Rukletsov wrote: Adam, could you please explain what use case do you have in mind and how it is related to slave recovery? It's not related to slave recovery necessarily, but to how this KillMode impacts other processes like a custom executor. Some frameworks (like HDFS) have a custom executor that launches task(s) as a separate process/subprocess. If the executor is killed (kill -9, or shutdown by the framework/admin), will this change in KillMode affect whether the executors task subprocesses also get killed? I'm mostly worried about this KillMode change suddenly leaving stranded task processes if/when executors are killed. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32543/#review78025 --- On March 27, 2015, 7:09 a.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32543/ --- (Updated March 27, 2015, 7:09 a.m.) Review request for mesos, Alexander Rukletsov and Brenden Matthews. Bugs: Mesos-2555 https://issues.apache.org/jira/browse/Mesos-2555 Repository: mesos Description --- Documented the problem and solution encountered in MESOS-2419. Diffs - docs/slave-recovery.md 4bb4a71c6945bd70121743a1e9209a26906773c1 docs/upgrades.md 2a15694607c079ad95ef6cf7f1490872ab9a5976 Diff: https://reviews.apache.org/r/32543/diff/ Testing --- markdown check Thanks, Joerg Schad
Re: Review Request 32585: Replaced Framework.id with Framework.id() in Master/Slave.
On April 1, 2015, 2:28 a.m., Adam B wrote: src/slave/slave.cpp, line 1043 https://reviews.apache.org/r/32585/diff/1-2/?file=908253#file908253line1043 Hmm... I was actually thinking that this line of code can be removed now that you're always filling in the id in the FrameworkInfo by the time it gets to the slave. However, upon further inspection I see that there is a Slave::Framework::id field that we'll also need to deprecate, ensuring that Slave::Framework::info.id is set, and thus Archive::Framework::framework_info has the frameworkId. Kapil Arya wrote: I am not sure if I understand this. slave::Framework::id has been removed in this patch. I couldn't find `Slave::Framework` in the codebase. Can you point me to that? Right you are. Sorry, it was late and I confused `pid` with `id` and thought you only removed the comment, based on ReviewBoard's improper rendering of https://reviews.apache.org/r/32585/diff/1-2/ Both lines removed. Looks good. Dropping. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32585/#review78496 --- On April 1, 2015, 12:34 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32585/ --- (Updated April 1, 2015, 12:34 p.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2557 https://issues.apache.org/jira/browse/MESOS-2557 Repository: mesos Description --- Framework.id() extracts and returns FrameworkID from Framework.info. Diffs - src/master/http.cpp e1a87d646e9690e39a9e84ae383622018ce80401 src/master/master.hpp 3c957abcb54a0c23b8549c1d21d2d9277791938d src/master/master.cpp dccd7c635da4b7031cd109bd84e7f17b31777ef1 src/master/validation.cpp 2f2e4ea8ea123c5a0d01446cdec8b308ea60932e src/slave/http.cpp 5f0c39afd2fe9a89eb1df0052af8ab422306f30e src/slave/slave.hpp 19e6b44bc344c0ca509674803f401cbb4e1f47ae src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 Diff: https://reviews.apache.org/r/32585/diff/ Testing --- make check TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 31228: Added a mechanism for disabling http endpoints.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31228/#review78577 --- 3rdparty/libprocess/include/process/process.hpp https://reviews.apache.org/r/31228/#comment127458 Since this function performs an action, it should be named with a verb, e.g. initializeDisabledEndpoints() or setDisabledEndpoints() or updateDisabledEndpoints(). 3rdparty/libprocess/src/process.cpp https://reviews.apache.org/r/31228/#comment127454 What makes this the right place in initialize() to call this? Why not earlier or later? And initialize(delegate) is a non-static (library-wide?) method calling your new static method. Are there any problems if your method gets called multiple times? Could this even happen? 3rdparty/libprocess/src/process.cpp https://reviews.apache.org/r/31228/#comment127457 Since disabledHttpEndpoints is a hashset, couldn't you just use contains()? - Adam B On April 1, 2015, 2:50 a.m., Alexander Rojas wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31228/ --- (Updated April 1, 2015, 2:50 a.m.) Review request for mesos, Joerg Schad, Niklas Nielsen, and Till Toenshoff. Bugs: MESOS-2333 https://issues.apache.org/jira/browse/MESOS-2333 Repository: mesos Description --- Adds a mechanism for disabling http endpoints (e.g `testprocess/handler1,processname(3)/handler2`). A list of coma separated strings can be provided using the environment variable `LIBPROCESS_DISABLED_ENDPOINTS` which will be read during libprocess initialization. Then, when creating http endpoints (using the method `route`) the endpoint path will be checked against the patterns. If a match is found the endpoint handler will be replaced for a generic once which returns a 403 HTTP Error (Forbidden). Diffs - 3rdparty/libprocess/include/process/process.hpp 392c74df3e8a122aecd3633dffdeec4bcbd1f097 3rdparty/libprocess/src/process.cpp cf4e36489be2c6aa01e838c1c71383f248deab5b 3rdparty/libprocess/src/tests/process_tests.cpp eb38edce2c483ba7f963a826893a15a075238618 Diff: https://reviews.apache.org/r/31228/diff/ Testing --- make check Thanks, Alexander Rojas
Re: Review Request 31016: Added slave run task decorator.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31016/#review78193 --- Looks good. Just some minor cleanup suggestions. include/mesos/hook.hpp https://reviews.apache.org/r/31016/#comment126704 s/overwrites/overwrite/ src/hook/manager.cpp https://reviews.apache.org/r/31016/#comment126707 I'd prefer to avoid the `Labels*` and just substitute `taskInfo_.mutable_labels()` or `taskInfo_.labels()` at the two uses. src/hook/manager.cpp https://reviews.apache.org/r/31016/#comment126708 What would result.isNone() mean? Please add a comment, if not a LOG(INFO). src/slave/slave.cpp https://reviews.apache.org/r/31016/#comment126706 +1 to introducing this as high up in the method as possible, to reduce risk of using the wrong taskInfo in future nearby calls. And another +1 to renaming this `task` to prevent accidental use of the const parameter (rename it `task_`) instead of the actively modified taskInfo. - Adam B On March 13, 2015, 4:04 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31016/ --- (Updated March 13, 2015, 4:04 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Repository: mesos Description --- Added decorator which gets invoked on start of runTask() sequence in the slave. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/hook/manager.hpp da813492108974a7e26b366845368517589da876 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/slave/slave.hpp 989832f8783d07d0702b30f0a68b8c383b57c621 src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 Diff: https://reviews.apache.org/r/31016/diff/ Testing --- make check Thanks, Niklas Nielsen
Re: Review Request 30961: Enabled label decorator to override.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/#review78189 --- Minor cleanup/suggestions, but otherwise good. We'll definitely need to document te module(-manager) API change of overriding the label set instead of merging. This should probably go in the upgrades doc? src/examples/test_hook_module.cpp https://reviews.apache.org/r/30961/#comment126700 Did you mix up key and Value? Would think you'd want the MESOS_Test_Remove_Label to be the key. src/hook/manager.cpp https://reviews.apache.org/r/30961/#comment126702 Why not just return `taskInfo_.labels()`? In fact, I don't think there's really any reason to have a `labels` pointer in this function. Just do `taskInfo_.mutable_labels()-CopyFrom(result.get());` and then it's clear throughout that you're modifying the same `taskInfo_` object. src/tests/hook_tests.cpp https://reviews.apache.org/r/30961/#comment126701 Switch Key, Value - Adam B On March 13, 2015, 4:04 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30961/ --- (Updated March 13, 2015, 4:04 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary. Diffs - include/mesos/hook.hpp 9ae8b9455a86c7a5cbf4f1d1b1ce88f2811ce35d src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f src/master/master.cpp dccd7c635da4b7031cd109bd84e7f17b31777ef1 src/tests/hook_tests.cpp bb9de25bd2c4601d333a3ca1aec13820c7df7378 Diff: https://reviews.apache.org/r/30961/diff/ Testing --- make check (with modified VerifyMasterLaunchTaskHook test) Thanks, Niklas Nielsen
Re: Review Request 30962: Enabled environment decorator to override.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30962/#review78191 --- src/hook/manager.cpp https://reviews.apache.org/r/30962/#comment126703 And if (result.isNone()), is that really supposed to mean that this hook didn't want to modify the env, so the HookManager can leave the environment as is and move onto the next hook? If so, it's probably worth a comment, if not a LOG(INFO). - Adam B On March 13, 2015, 4:04 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30962/ --- (Updated March 13, 2015, 4:04 p.m.) Review request for mesos, Ben Mahler and Kapil Arya. Bugs: MESOS-2351 https://issues.apache.org/jira/browse/MESOS-2351 Repository: mesos Description --- See summary Diffs - src/examples/test_hook_module.cpp 47409cd4d02e238d1d182571d92019114662cd41 src/hook/manager.cpp 7a4cb09bc221af502e867cfb7fff2900b599ff1f Diff: https://reviews.apache.org/r/30962/diff/ Testing --- make check Thanks, Niklas Nielsen
Re: Review Request 29507: Added Configurable Slave Ping Timeouts
On Feb. 18, 2015, 11:38 a.m., Niklas Nielsen wrote: Let's get tests wired up before committing this :) Adam B wrote: Sure thing. Adding tests in my subsequent patch where we will pass the master's timeout values on to the slave. Will post that very soon. Ben Mahler wrote: Can you do it in one patch? This patch in isolation looks a bit dangerous per our conversation above. Also, please carefully consider whether your approach will be safe to do in a single version. i.e. What happens when there are old slaves running against a new master? And vice versa. Adam B wrote: Easy enough. Added the second patch to this one. Most of the new changes are in messages.proto, slave.hpp/cpp, and partition_tests.cpp, with a few lines in master.cpp to set the new message fields. Certainly safe to upgrade if the new flags stay at their default value. Also, with new slaves talking to an old master, the master will still use the defaults, hence so will the slaves. But old slaves running against a new master with a longer timeout will try to reregister unnecessarily early, so you may want to guarantee that all/most of the slaves have been upgraded before setting these flags, otherwise a large cluster suddenly waking up from a network split would see a lot of unnecessary reregistration attempts. The old behavior in this scenario was that the slaves would reregister after the same default timeout after the network split, and the master would consider them shutdown after the same timeout. If that last scenario is a problem, maybe the better approach is for each slave to specify its own ping timeout, and then the master can use these values with each slave's SlaveObserver. Then we can just require the master(s) to be upgraded first, as is typical. I'm inclined to think that we shouldn't worry too much about the unlikely scenario of a network split in the middle of a rolling upgrade to 0.23 where longer ping timeouts are simultaneously being applied. Procedure should be: upgrade (most of) the cluster, then restart the master(s) with the new longer ping timeout value. I can add a Note to upgrades.md if you like. How does that sound to you @bmahler ? If there are no objections, I'll rebase the patch and update the configuration and upgrades docs. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/#review72992 --- On Feb. 19, 2015, 12:10 a.m., Adam B wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/ --- (Updated Feb. 19, 2015, 12:10 a.m.) Review request for mesos, Ben Mahler and Niklas Nielsen. Bugs: MESOS-2110 https://issues.apache.org/jira/browse/MESOS-2110 Repository: mesos Description --- Added new --slave_ping_timeout and --max_slave_ping_timeouts flags to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs) and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5). These can be extended if slaves are expected/allowed to be down for longer than a minute or two. Slave will receive master's ping timeout in SlaveRe[re]gisteredMessage. Beware that this affects recovery from network timeouts as well as actual slave node/process failover. Diffs - src/master/constants.hpp ad3fe81 src/master/constants.cpp d3d0f71 src/master/flags.hpp 51a6059 src/master/master.cpp f10a3cf src/messages/messages.proto 58484ae src/slave/constants.hpp 12d6e92 src/slave/constants.cpp 7868bef src/slave/slave.hpp 91dae10 src/slave/slave.cpp aec9525 src/tests/fault_tolerance_tests.cpp efa5c57 src/tests/partition_tests.cpp eb16a58 src/tests/slave_recovery_tests.cpp 8210c52 src/tests/slave_tests.cpp 153d9d6 Diff: https://reviews.apache.org/r/29507/diff/ Testing --- Manually tested slave failover/shutdown with master using different --slave_ping_timeout and --max_slave_ping_timeouts. Ran unit tests with shorter non-default values for ping timeouts. `make check` with new unit tests: ShortPingTimeoutUnreachableMaster and ShortPingTimeoutUnreachableSlave Thanks, Adam B
Re: Review Request 29507: Added Configurable Slave Ping Timeouts
On March 6, 2015, 5:19 a.m., Joerg Schad wrote: src/master/flags.hpp, line 383 https://reviews.apache.org/r/29507/diff/4/?file=868836#file868836line383 Shouldn't this also be added to the documentation (i.e. http://mesos.apache.org/documentation/latest/configuration/)? Excellent point. Will update that in the next patch revision. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/#review75484 --- On Feb. 19, 2015, 12:10 a.m., Adam B wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/ --- (Updated Feb. 19, 2015, 12:10 a.m.) Review request for mesos, Ben Mahler and Niklas Nielsen. Bugs: MESOS-2110 https://issues.apache.org/jira/browse/MESOS-2110 Repository: mesos Description --- Added new --slave_ping_timeout and --max_slave_ping_timeouts flags to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs) and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5). These can be extended if slaves are expected/allowed to be down for longer than a minute or two. Slave will receive master's ping timeout in SlaveRe[re]gisteredMessage. Beware that this affects recovery from network timeouts as well as actual slave node/process failover. Diffs - src/master/constants.hpp ad3fe81 src/master/constants.cpp d3d0f71 src/master/flags.hpp 51a6059 src/master/master.cpp f10a3cf src/messages/messages.proto 58484ae src/slave/constants.hpp 12d6e92 src/slave/constants.cpp 7868bef src/slave/slave.hpp 91dae10 src/slave/slave.cpp aec9525 src/tests/fault_tolerance_tests.cpp efa5c57 src/tests/partition_tests.cpp eb16a58 src/tests/slave_recovery_tests.cpp 8210c52 src/tests/slave_tests.cpp 153d9d6 Diff: https://reviews.apache.org/r/29507/diff/ Testing --- Manually tested slave failover/shutdown with master using different --slave_ping_timeout and --max_slave_ping_timeouts. Ran unit tests with shorter non-default values for ping timeouts. `make check` with new unit tests: ShortPingTimeoutUnreachableMaster and ShortPingTimeoutUnreachableSlave Thanks, Adam B
Re: Review Request 32543: Document problem and solution encountered in Mesos-2419.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32543/#review78025 --- Great. It reads well and explains the issue. Just a couple of questions for you. docs/slave-recovery.md https://reviews.apache.org/r/32543/#comment126411 How about: Troubleshooting: If tasks are not recovering on slave restart when using systemd Let's not word it as a bug, but as a configuration issue that can be easily resolved. docs/slave-recovery.md https://reviews.apache.org/r/32543/#comment126412 Minor: Maybe only link on [KillMode] rather than [default KillMode] docs/slave-recovery.md https://reviews.apache.org/r/32543/#comment126413 (If the slave does not come back, each executorDriver shuts itself down after $MESOS_RECOVERY_TIMEOUT.) Important question: If an executor is killed, does this systemd mode affect whether its tasks would get killed? docs/upgrades.md https://reviews.apache.org/r/32543/#comment126410 Is there some new behavior in Mesos 0.22 that just caused this issue to start occurring? Or could this have happened with Mesos 0.21 or earlier with the same systemd default setting? If so, this is not an upgrade issue and this note doesn't belong in the upgrades doc. - Adam B On March 26, 2015, 4:53 p.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32543/ --- (Updated March 26, 2015, 4:53 p.m.) Review request for mesos, Alexander Rukletsov and Brenden Matthews. Bugs: Mesos-2555 https://issues.apache.org/jira/browse/Mesos-2555 Repository: mesos Description --- Document problem and solution encountered in Mesos-2419. Diffs - docs/slave-recovery.md 4bb4a71c6945bd70121743a1e9209a26906773c1 docs/upgrades.md 2a15694607c079ad95ef6cf7f1490872ab9a5976 Diff: https://reviews.apache.org/r/32543/diff/ Testing --- markdown check Thanks, Joerg Schad
Re: Review Request 27760: Revised authenticator interface to allow for two fold implementations.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/#review78123 --- Fix looks good to me. Buildbot, please take another look. ;) - Adam B On March 25, 2015, 4:35 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/ --- (Updated March 25, 2015, 4:35 p.m.) Review request for mesos, Adam B, Kapil Arya, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos Description --- The initial design and implementation of the authenticator module interface caused issues and was not optimal for heavy lifting setup of external dependencies. By introducing a two fold design, this has been decoupled from the authentication message processing. The new design also gets us back on track to the goal of makeing SASL a soft dependency of mesos. Diffs - include/mesos/authentication/authenticator.hpp f66217a src/authentication/cram_md5/authenticator.hpp c6f465f src/authentication/cram_md5/authenticator.cpp PRE-CREATION src/authentication/cram_md5/auxprop.hpp b894386 src/master/master.hpp 3c957ab src/master/master.cpp dccd7c6 src/tests/cram_md5_authentication_tests.cpp 92a89c5 Diff: https://reviews.apache.org/r/27760/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 31539: Remove the checkpoint variable entirely from slave/flags.hpp.
On March 24, 2015, 11:15 a.m., Jie Yu wrote: FYI, there's a regression in this patch and I committed a fix: commit 46f80f270bc335172e919343f6f14e007216823f Author: Jie Yu yujie@gmail.com Date: Tue Mar 24 11:06:08 2015 -0700 Fixed a regression caused by the slave checkpoint flag removal. diff --git a/src/tests/port_mapping_tests.cpp b/src/tests/port_mapping_tests.cpp index 8192dea..623840e 100644 --- a/src/tests/port_mapping_tests.cpp +++ b/src/tests/port_mapping_tests.cpp @@ -1669,7 +1669,6 @@ public: ContainerizerTestMesosContainerizer::CreateSlaveFlags(); // Setup recovery slave flags. -flags.checkpoint = true; flags.recover = reconnect; flags.strict = true; Thanks for finding this Jie. I believe flags.recover and flags.strict are the defaults, so we can simplify this function to just `return ContainerizerTestMesosContainerizer::CreateSlaveFlags();` Joerg, would you like to handle this cleanup? - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/#review77608 --- On March 19, 2015, 8:52 a.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/ --- (Updated March 19, 2015, 8:52 a.m.) Review request for mesos, Adam B, Cody Maloney, and Till Toenshoff. Bugs: MESOS-2375 https://issues.apache.org/jira/browse/MESOS-2375 Repository: mesos Description --- As a number of tests rely on the checkpointing flag to be false, a few tests had to be adapted. Removed the following test as the tested logic is specific to (old) non-checkpointing slaves: SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a non-checkpointing slave is not scheduled to a checkpointing framework. It can be removed as all slaves are checkpointing slaves. Diffs - include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa src/slave/slave.cpp f1f210045e6100560f0d26244f9675f4543a5620 src/tests/disk_quota_tests.cpp 9c3a8815c3478535b72888c296a4aa5cda341ba3 src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 src/tests/fault_tolerance_tests.cpp 9ac75b1f601e14a3d3d117775f37a4a48b291dc6 src/tests/gc_tests.cpp deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf src/tests/master_allocator_tests.cpp a432d0207e1a92532a495bf9ad2826414ee4f6f0 src/tests/master_authorization_tests.cpp ff706ed6f8537207b30a548b0ce2121c5df71ab9 src/tests/master_tests.cpp e69348be676a80017062e3abbd15b8008a6009d7 src/tests/master_validation_tests.cpp c8742928a4e93e86ccd0f5a39856a65cfe8eb74f src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 src/tests/partition_tests.cpp bb96aed37861867fbde68445016f0c6e039f3fb4 src/tests/persistent_volume_tests.cpp b617117ade4b487cc06002cfeca76a0486833b20 src/tests/reconciliation_tests.cpp acd70021574b05ab23872add5bdfa4a46b7dfc51 src/tests/slave_recovery_tests.cpp 53adae0118a26e6d25a9ff20c6374cc8e73275b1 src/tests/status_update_manager_tests.cpp 216a22e9f292b4141c8b966dad0f25dbd791c025 Diff: https://reviews.apache.org/r/31539/diff/ Testing --- make check GTEST_BREAK_ON_FAILURE=1 GTEST_SHUFFLE=1 GTEST_REPEAT=50 on OSX (had to exclude some known flaky tests under OSX) Thanks, Joerg Schad
Re: Review Request 31539: Remove the checkpoint variable entirely from slave/flags.hpp.
On March 18, 2015, 11:04 p.m., Adam B wrote: (Sorry, pulled the trigger early; meant to click Save instead. More coming..) - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/#review77020 --- On March 18, 2015, 2:43 p.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/ --- (Updated March 18, 2015, 2:43 p.m.) Review request for mesos, Adam B, Cody Maloney, and Till Toenshoff. Bugs: MESOS-2375 https://issues.apache.org/jira/browse/MESOS-2375 Repository: mesos Description --- As a number of tests rely on the checkpointing flag to be false, a few tests had to be adapted. Removed the following test as the tested logic is specific to (old) non-checkpointing slaves: SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a non-checkpointing slave is not scheduled to a checkpointing framework. It can be removed as all slaves are checkpointing slaves. Diffs - include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa src/slave/slave.cpp f1f210045e6100560f0d26244f9675f4543a5620 src/tests/disk_quota_tests.cpp 9c3a8815c3478535b72888c296a4aa5cda341ba3 src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 src/tests/fault_tolerance_tests.cpp 9ac75b1f601e14a3d3d117775f37a4a48b291dc6 src/tests/gc_tests.cpp deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf src/tests/master_allocator_tests.cpp a432d0207e1a92532a495bf9ad2826414ee4f6f0 src/tests/master_authorization_tests.cpp ff706ed6f8537207b30a548b0ce2121c5df71ab9 src/tests/master_tests.cpp e69348be676a80017062e3abbd15b8008a6009d7 src/tests/master_validation_tests.cpp c8742928a4e93e86ccd0f5a39856a65cfe8eb74f src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 src/tests/partition_tests.cpp bb96aed37861867fbde68445016f0c6e039f3fb4 src/tests/persistent_volume_tests.cpp b617117ade4b487cc06002cfeca76a0486833b20 src/tests/reconciliation_tests.cpp acd70021574b05ab23872add5bdfa4a46b7dfc51 src/tests/slave_recovery_tests.cpp 53adae0118a26e6d25a9ff20c6374cc8e73275b1 src/tests/status_update_manager_tests.cpp 216a22e9f292b4141c8b966dad0f25dbd791c025 Diff: https://reviews.apache.org/r/31539/diff/ Testing --- make check GTEST_BREAK_ON_FAILURE=1 GTEST_SHUFFLE=1 GTEST_REPEAT=50 on OSX (had to exclude some known flaky tests under OSX) Thanks, Joerg Schad
Re: Review Request 31539: Remove the checkpoint variable entirely from slave/flags.hpp.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/#review77020 --- src/tests/master_authorization_tests.cpp https://reviews.apache.org/r/31539/#comment124787 Nit: s/ // or is that coming in a separate review? - Adam B On March 18, 2015, 2:43 p.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/ --- (Updated March 18, 2015, 2:43 p.m.) Review request for mesos, Adam B, Cody Maloney, and Till Toenshoff. Bugs: MESOS-2375 https://issues.apache.org/jira/browse/MESOS-2375 Repository: mesos Description --- As a number of tests rely on the checkpointing flag to be false, a few tests had to be adapted. Removed the following test as the tested logic is specific to (old) non-checkpointing slaves: SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a non-checkpointing slave is not scheduled to a checkpointing framework. It can be removed as all slaves are checkpointing slaves. Diffs - include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa src/slave/slave.cpp f1f210045e6100560f0d26244f9675f4543a5620 src/tests/disk_quota_tests.cpp 9c3a8815c3478535b72888c296a4aa5cda341ba3 src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 src/tests/fault_tolerance_tests.cpp 9ac75b1f601e14a3d3d117775f37a4a48b291dc6 src/tests/gc_tests.cpp deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf src/tests/master_allocator_tests.cpp a432d0207e1a92532a495bf9ad2826414ee4f6f0 src/tests/master_authorization_tests.cpp ff706ed6f8537207b30a548b0ce2121c5df71ab9 src/tests/master_tests.cpp e69348be676a80017062e3abbd15b8008a6009d7 src/tests/master_validation_tests.cpp c8742928a4e93e86ccd0f5a39856a65cfe8eb74f src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 src/tests/partition_tests.cpp bb96aed37861867fbde68445016f0c6e039f3fb4 src/tests/persistent_volume_tests.cpp b617117ade4b487cc06002cfeca76a0486833b20 src/tests/reconciliation_tests.cpp acd70021574b05ab23872add5bdfa4a46b7dfc51 src/tests/slave_recovery_tests.cpp 53adae0118a26e6d25a9ff20c6374cc8e73275b1 src/tests/status_update_manager_tests.cpp 216a22e9f292b4141c8b966dad0f25dbd791c025 Diff: https://reviews.apache.org/r/31539/diff/ Testing --- make check GTEST_BREAK_ON_FAILURE=1 GTEST_SHUFFLE=1 GTEST_REPEAT=50 on OSX (had to exclude some known flaky tests under OSX) Thanks, Joerg Schad
Re: Review Request 31539: Remove the checkpoint variable entirely from slave/flags.hpp.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/#review77022 --- Ship it! LGTM! Just some comments about comments, and then I'd be happy to commit it. src/tests/status_update_manager_tests.cpp https://reviews.apache.org/r/31539/#comment124789 s/when checking the checkpoint/when recovering the checkpointed data/ src/tests/fault_tolerance_tests.cpp https://reviews.apache.org/r/31539/#comment124792 Something sounds weird with ...as otherwise with checkpointing... How about Stop the slave with an explicit shutdown message so that the master sees it exit cleanly. src/tests/master_allocator_tests.cpp https://reviews.apache.org/r/31539/#comment124796 Should this line also get a comment explaining why shutdown=true? src/tests/master_authorization_tests.cpp https://reviews.apache.org/r/31539/#comment124797 .. with an explicit shutdown message... src/tests/master_tests.cpp https://reviews.apache.org/r/31539/#comment124804 How about // Reuse slaveFlags so next StartSlave() uses the same work_dir.? src/tests/master_tests.cpp https://reviews.apache.org/r/31539/#comment124799 Style: Comments wrap at 70 chars. src/tests/master_tests.cpp https://reviews.apache.org/r/31539/#comment124801 Is this needed for its work_dir? - Adam B On March 18, 2015, 2:43 p.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/ --- (Updated March 18, 2015, 2:43 p.m.) Review request for mesos, Adam B, Cody Maloney, and Till Toenshoff. Bugs: MESOS-2375 https://issues.apache.org/jira/browse/MESOS-2375 Repository: mesos Description --- As a number of tests rely on the checkpointing flag to be false, a few tests had to be adapted. Removed the following test as the tested logic is specific to (old) non-checkpointing slaves: SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a non-checkpointing slave is not scheduled to a checkpointing framework. It can be removed as all slaves are checkpointing slaves. Diffs - include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa src/slave/slave.cpp f1f210045e6100560f0d26244f9675f4543a5620 src/tests/disk_quota_tests.cpp 9c3a8815c3478535b72888c296a4aa5cda341ba3 src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 src/tests/fault_tolerance_tests.cpp 9ac75b1f601e14a3d3d117775f37a4a48b291dc6 src/tests/gc_tests.cpp deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf src/tests/master_allocator_tests.cpp a432d0207e1a92532a495bf9ad2826414ee4f6f0 src/tests/master_authorization_tests.cpp ff706ed6f8537207b30a548b0ce2121c5df71ab9 src/tests/master_tests.cpp e69348be676a80017062e3abbd15b8008a6009d7 src/tests/master_validation_tests.cpp c8742928a4e93e86ccd0f5a39856a65cfe8eb74f src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 src/tests/partition_tests.cpp bb96aed37861867fbde68445016f0c6e039f3fb4 src/tests/persistent_volume_tests.cpp b617117ade4b487cc06002cfeca76a0486833b20 src/tests/reconciliation_tests.cpp acd70021574b05ab23872add5bdfa4a46b7dfc51 src/tests/slave_recovery_tests.cpp 53adae0118a26e6d25a9ff20c6374cc8e73275b1 src/tests/status_update_manager_tests.cpp 216a22e9f292b4141c8b966dad0f25dbd791c025 Diff: https://reviews.apache.org/r/31539/diff/ Testing --- make check GTEST_BREAK_ON_FAILURE=1 GTEST_SHUFFLE=1 GTEST_REPEAT=50 on OSX (had to exclude some known flaky tests under OSX) Thanks, Joerg Schad
Re: Review Request 31539: Remove the checkpoint variable entirely from slave/flags.hpp.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/#review77055 --- Ship it! Looks great! I'll try to get this committed later today, if nobody else has any objections. src/tests/fault_tolerance_tests.cpp https://reviews.apache.org/r/31539/#comment124851 No need to call it a checkpointing slave, since all slaves are checkpointing now. - Adam B On March 19, 2015, 8:52 a.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/ --- (Updated March 19, 2015, 8:52 a.m.) Review request for mesos, Adam B, Cody Maloney, and Till Toenshoff. Bugs: MESOS-2375 https://issues.apache.org/jira/browse/MESOS-2375 Repository: mesos Description --- As a number of tests rely on the checkpointing flag to be false, a few tests had to be adapted. Removed the following test as the tested logic is specific to (old) non-checkpointing slaves: SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a non-checkpointing slave is not scheduled to a checkpointing framework. It can be removed as all slaves are checkpointing slaves. Diffs - include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa src/slave/slave.cpp f1f210045e6100560f0d26244f9675f4543a5620 src/tests/disk_quota_tests.cpp 9c3a8815c3478535b72888c296a4aa5cda341ba3 src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 src/tests/fault_tolerance_tests.cpp 9ac75b1f601e14a3d3d117775f37a4a48b291dc6 src/tests/gc_tests.cpp deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf src/tests/master_allocator_tests.cpp a432d0207e1a92532a495bf9ad2826414ee4f6f0 src/tests/master_authorization_tests.cpp ff706ed6f8537207b30a548b0ce2121c5df71ab9 src/tests/master_tests.cpp e69348be676a80017062e3abbd15b8008a6009d7 src/tests/master_validation_tests.cpp c8742928a4e93e86ccd0f5a39856a65cfe8eb74f src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 src/tests/partition_tests.cpp bb96aed37861867fbde68445016f0c6e039f3fb4 src/tests/persistent_volume_tests.cpp b617117ade4b487cc06002cfeca76a0486833b20 src/tests/reconciliation_tests.cpp acd70021574b05ab23872add5bdfa4a46b7dfc51 src/tests/slave_recovery_tests.cpp 53adae0118a26e6d25a9ff20c6374cc8e73275b1 src/tests/status_update_manager_tests.cpp 216a22e9f292b4141c8b966dad0f25dbd791c025 Diff: https://reviews.apache.org/r/31539/diff/ Testing --- make check GTEST_BREAK_ON_FAILURE=1 GTEST_SHUFFLE=1 GTEST_REPEAT=50 on OSX (had to exclude some known flaky tests under OSX) Thanks, Joerg Schad
Re: Review Request 31539: Remove the checkpoint variable entirely from slave/flags.hpp.
On March 19, 2015, 9:33 a.m., Adam B wrote: src/tests/fault_tolerance_tests.cpp, line 123 https://reviews.apache.org/r/31539/diff/7-8/?file=899402#file899402line123 No need to call it a checkpointing slave, since all slaves are checkpointing now. Joerg Schad wrote: Till was initially wondering why the behavior changed here, so I wanted to emphasize... but I can remove it... No big deal. Your call. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/#review77055 --- On March 19, 2015, 8:52 a.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/ --- (Updated March 19, 2015, 8:52 a.m.) Review request for mesos, Adam B, Cody Maloney, and Till Toenshoff. Bugs: MESOS-2375 https://issues.apache.org/jira/browse/MESOS-2375 Repository: mesos Description --- As a number of tests rely on the checkpointing flag to be false, a few tests had to be adapted. Removed the following test as the tested logic is specific to (old) non-checkpointing slaves: SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a non-checkpointing slave is not scheduled to a checkpointing framework. It can be removed as all slaves are checkpointing slaves. Diffs - include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa src/slave/slave.cpp f1f210045e6100560f0d26244f9675f4543a5620 src/tests/disk_quota_tests.cpp 9c3a8815c3478535b72888c296a4aa5cda341ba3 src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 src/tests/fault_tolerance_tests.cpp 9ac75b1f601e14a3d3d117775f37a4a48b291dc6 src/tests/gc_tests.cpp deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf src/tests/master_allocator_tests.cpp a432d0207e1a92532a495bf9ad2826414ee4f6f0 src/tests/master_authorization_tests.cpp ff706ed6f8537207b30a548b0ce2121c5df71ab9 src/tests/master_tests.cpp e69348be676a80017062e3abbd15b8008a6009d7 src/tests/master_validation_tests.cpp c8742928a4e93e86ccd0f5a39856a65cfe8eb74f src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 src/tests/partition_tests.cpp bb96aed37861867fbde68445016f0c6e039f3fb4 src/tests/persistent_volume_tests.cpp b617117ade4b487cc06002cfeca76a0486833b20 src/tests/reconciliation_tests.cpp acd70021574b05ab23872add5bdfa4a46b7dfc51 src/tests/slave_recovery_tests.cpp 53adae0118a26e6d25a9ff20c6374cc8e73275b1 src/tests/status_update_manager_tests.cpp 216a22e9f292b4141c8b966dad0f25dbd791c025 Diff: https://reviews.apache.org/r/31539/diff/ Testing --- make check GTEST_BREAK_ON_FAILURE=1 GTEST_SHUFFLE=1 GTEST_REPEAT=50 on OSX (had to exclude some known flaky tests under OSX) Thanks, Joerg Schad
Re: Review Request 31539: Remove the checkpoint variable entirely from slave/flags.hpp.
On March 15, 2015, 2:03 a.m., Adam B wrote: include/mesos/mesos.proto, lines 321-323 https://reviews.apache.org/r/31539/diff/5/?file=894980#file894980line321 Will we be able to remove this flag in 0.23, or will we need to wait for another release cycle for deprecation? Seems like if it was already 'optional' in 0.22, and it was never set/saved as true in 0.22, then we could remove it in 0.23, as an 0.23 slave recovering 0.22 SlaveInfo would be fine. What about an 0.22 slave registering with an 0.23 master, and vice versa? How safe would that be? Maybe we need to change the default to true here? Joerg Schad wrote: Is it ok if I move this discussion to the follow-up Jira as it is not really an issue with this patch? Sure. Definitely not a blocker for this patch, but I would like to see it covered by another JIRA. On March 15, 2015, 2:03 a.m., Adam B wrote: src/tests/master_tests.cpp, line 1928 https://reviews.apache.org/r/31539/diff/5/?file=894989#file894989line1928 Not yours, but could you help out our style update push and s/ // in code next to your changes (not necessarily the entire file)? Joerg Schad wrote: As seemingly there is some discussion whether to do such style fixes or not, I will provide another Patch/Jira to fix - for all tests. Till Toenshoff wrote: I would suggest to not do that. Let me try to explain: This particual RR is rather unusual in that it touches many files. In october of last year, we reached a consesus for style debt fixes; we said that it would be nice if all files touched would also get this style update (dev-list: `Large changes on the codebase due to MESOS-1872`). A. Lets not bundle any style debt fixes with this one at all as it is atypical and not covered by our consesus. B. Lets make two RRs out of it, first fixing all sharp bracket whitspaces on the files to be touched by this RR. Then base this RR on those style fixes. This will still allow quick and easy reviews but also get a big pile of style debt fixes merged into the project. SGTM. On March 15, 2015, 2:03 a.m., Adam B wrote: src/tests/master_tests.cpp, line 2018 https://reviews.apache.org/r/31539/diff/5/?file=894989#file894989line2018 Do we even need to CreateSlaveFlags() here and elsewhere? If you're not setting any non-default flag values or otherwise using the 'slaveFlags' variable, it can be removed, since StartSlave() is the same as StartSlave(CreateSlaveFlags()). Joerg Schad wrote: CreateSlaveFlags also generates a new work_dir assignment, as I restart the slave I woud like to keep the same work_dir. Sure, but StartSlave(None()) still calls cluster.slaves.start(CreateSlaveFlags()), so it's implicit. ``` TryPIDslave::Slave MesosTest::StartSlave( const Optionslave::Flags flags) { return cluster.slaves.start( flags.isNone() ? CreateSlaveFlags() : flags.get()); } ``` - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/#review76502 --- On March 16, 2015, 3:07 a.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/ --- (Updated March 16, 2015, 3:07 a.m.) Review request for mesos, Adam B, Cody Maloney, and Till Toenshoff. Bugs: MESOS-2375 https://issues.apache.org/jira/browse/MESOS-2375 Repository: mesos Description --- As a number of tests rely on the checkpointing flag to be false, a few tests had to be adapted. Removed the following test as the tested logic is specific to (old) non-checkpointing slaves: SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a non-checkpointing slave is not scheduled to a checkpointing framework. It can be removed as all slaves are checkpointing slaves. Diffs - include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 src/tests/disk_quota_tests.cpp 9c3a8815c3478535b72888c296a4aa5cda341ba3 src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 src/tests/fault_tolerance_tests.cpp 9ac75b1f601e14a3d3d117775f37a4a48b291dc6 src/tests/gc_tests.cpp deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf src/tests/master_allocator_tests.cpp a432d0207e1a92532a495bf9ad2826414ee4f6f0 src/tests/master_authorization_tests.cpp ff706ed6f8537207b30a548b0ce2121c5df71ab9 src/tests/master_tests.cpp e69348be676a80017062e3abbd15b8008a6009d7 src/tests/master_validation_tests.cpp
Re: Review Request 32130: Ensured TaskStatus::source field is set for executor status updates.
On March 16, 2015, 3:40 p.m., Adam B wrote: src/slave/slave.cpp, lines 2490-2491 https://reviews.apache.org/r/32130/diff/1/?file=896443#file896443line2490 Why not just use update.mutable_status() instead of making a copy of the StatusUpdate object? We do this in several other places: ``` TaskStatus* status = update.mutable_status(); status-set_source(...); ``` e.g. src/sched/sched.cpp Then we can get rid of all the update_'s Niklas Nielsen wrote: You don't have mutable_status on a const update object :) We would need to unconst the argument then and wouldn't need to copy in the first place. Fair enough. LGTM! - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32130/#review7 --- On March 16, 2015, 4:06 p.m., Alexander Rukletsov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32130/ --- (Updated March 16, 2015, 4:06 p.m.) Review request for mesos, Niklas Nielsen and Vinod Kone. Bugs: MESOS-2499 https://issues.apache.org/jira/browse/MESOS-2499 Repository: mesos Description --- A status update originating from executor should have the TaskStatus::source field set to TaskStatus::SOURCE_EXECUTOR. Set this field in the slave to be future proof (a future where there will be no executor driver). Previous code has a bug and updated a copy of the update that was not forwarded. Add some checks for source and reason for status updates in existing tests. Diffs - src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 src/tests/slave_tests.cpp a975305430097a8295b4b155e8448572c12bde22 Diff: https://reviews.apache.org/r/32130/diff/ Testing --- make check (Mac OS 10.9.5, CentOS 7.0.1406) Thanks, Alexander Rukletsov
Re: Review Request 32130: Ensured TaskStatus::source field is set for executor status updates.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32130/#review76687 --- Ship it! Ship It! - Adam B On March 16, 2015, 4:06 p.m., Alexander Rukletsov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32130/ --- (Updated March 16, 2015, 4:06 p.m.) Review request for mesos, Niklas Nielsen and Vinod Kone. Bugs: MESOS-2499 https://issues.apache.org/jira/browse/MESOS-2499 Repository: mesos Description --- A status update originating from executor should have the TaskStatus::source field set to TaskStatus::SOURCE_EXECUTOR. Set this field in the slave to be future proof (a future where there will be no executor driver). Previous code has a bug and updated a copy of the update that was not forwarded. Add some checks for source and reason for status updates in existing tests. Diffs - src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 src/tests/slave_tests.cpp a975305430097a8295b4b155e8448572c12bde22 Diff: https://reviews.apache.org/r/32130/diff/ Testing --- make check (Mac OS 10.9.5, CentOS 7.0.1406) Thanks, Alexander Rukletsov
Re: Review Request 31539: Remove the checkpoint variable entirely from slave/flags.hpp.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/#review76502 --- First pass looks good. I've got a couple of questions about the follow-up removal of SlaveInfo.checkpoint, and then some style nits and general cleanup requests. I still need to go through the actual test logic itself to understand those changes, but the core change looks good. include/mesos/mesos.proto https://reviews.apache.org/r/31539/#comment124041 Style nit: s/remove/Remove/ include/mesos/mesos.proto https://reviews.apache.org/r/31539/#comment124043 Will we be able to remove this flag in 0.23, or will we need to wait for another release cycle for deprecation? Seems like if it was already 'optional' in 0.22, and it was never set/saved as true in 0.22, then we could remove it in 0.23, as an 0.23 slave recovering 0.22 SlaveInfo would be fine. What about an 0.22 slave registering with an 0.23 master, and vice versa? How safe would that be? Maybe we need to change the default to true here? include/mesos/mesos.proto https://reviews.apache.org/r/31539/#comment124042 s/Mesos/MESOS/ src/slave/slave.cpp https://reviews.apache.org/r/31539/#comment124045 s/as to be/to be/ src/slave/slave.cpp https://reviews.apache.org/r/31539/#comment124044 Minor: Technically, Delete after 0.23., since you cannot complete the deprecation during 0.23.1 or another point release. src/tests/docker_containerizer_tests.cpp https://reviews.apache.org/r/31539/#comment124046 All of these lines can be removed like you did above, right? src/tests/gc_tests.cpp https://reviews.apache.org/r/31539/#comment124047 s/ / / (remove double-space), and you can also s/the garbage/garbage/ src/tests/master_authorization_tests.cpp https://reviews.apache.org/r/31539/#comment124048 s/ / / (remove double-space) src/tests/master_tests.cpp https://reviews.apache.org/r/31539/#comment124049 s/FLags/flags/ src/tests/master_tests.cpp https://reviews.apache.org/r/31539/#comment124050 s/ // src/tests/master_tests.cpp https://reviews.apache.org/r/31539/#comment124051 Remove the now-unecessary comment, and these default-value initializations (as you did elsewhere). src/tests/master_tests.cpp https://reviews.apache.org/r/31539/#comment124052 Not yours, but could you help out our style update push and s/ // in code next to your changes (not necessarily the entire file)? src/tests/master_tests.cpp https://reviews.apache.org/r/31539/#comment124053 Do we even need to CreateSlaveFlags() here and elsewhere? If you're not setting any non-default flag values or otherwise using the 'slaveFlags' variable, it can be removed, since StartSlave() is the same as StartSlave(CreateSlaveFlags()). src/tests/partition_tests.cpp https://reviews.apache.org/r/31539/#comment124054 Remove the CreateSlaveFlags() line if nothing else uses 'flags'. s/ // src/tests/slave_recovery_tests.cpp https://reviews.apache.org/r/31539/#comment124055 Should be able to remove all of these too, right? Then you can just directly `return ContainerizerTestT::CreateSlaveFlags();` - Adam B On March 14, 2015, 7:10 a.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/ --- (Updated March 14, 2015, 7:10 a.m.) Review request for mesos, Adam B, Cody Maloney, and Till Toenshoff. Bugs: MESOS-2375 https://issues.apache.org/jira/browse/MESOS-2375 Repository: mesos Description --- As a number of tests rely on the checkpointing flag to be false, a few tests had to be adapted. Removed the following test as the tested logic is specific to (old) non-checkpointing slaves: SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a non-checkpointing slave is not scheduled to a checkpointing framework. It can be removed as all slaves are checkpointing slaves. Diffs - include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 src/tests/disk_quota_tests.cpp 9c3a8815c3478535b72888c296a4aa5cda341ba3 src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 src/tests/fault_tolerance_tests.cpp 9ac75b1f601e14a3d3d117775f37a4a48b291dc6 src/tests/gc_tests.cpp deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf src/tests/master_allocator_tests.cpp a432d0207e1a92532a495bf9ad2826414ee4f6f0 src/tests/master_authorization_tests.cpp ff706ed6f8537207b30a548b0ce2121c5df71ab9 src/tests/master_tests.cpp
Re: Review Request 31539: Remove the checkpoint variable entirely from slave/flags.hpp.
On March 2, 2015, 11:39 a.m., Cody Maloney wrote: src/tests/master_allocator_tests.cpp, line 637 https://reviews.apache.org/r/31539/diff/2/?file=882281#file882281line637 The test shouldn't be deleted, rather updated to make sure that after the timeout of a slave being really considered lost, the resources are removed. Side note: It would probably be good for tasks to have a must report back time seperate from when we consider a slave truly lost. For quick one off-jobs, waiting minutes before retrying the task will be unacceptable latency... Probably some sort of early slave disappeared message to frameworks when we first detect, then a framework can re-launch if needed... Joerg Schad wrote: see comment for above dropped test. Cody: If the Master actually receives an Exited event for this pid, it will mark it as disconnected/deactivated and not offer any resources from it, but during a network disconnect there is no way to know if the slave has exited or if the task is still running. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/#review74795 --- On March 14, 2015, 7:10 a.m., Joerg Schad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31539/ --- (Updated March 14, 2015, 7:10 a.m.) Review request for mesos, Adam B, Cody Maloney, and Till Toenshoff. Bugs: MESOS-2375 https://issues.apache.org/jira/browse/MESOS-2375 Repository: mesos Description --- As a number of tests rely on the checkpointing flag to be false, a few tests had to be adapted. Removed the following test as the tested logic is specific to (old) non-checkpointing slaves: SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a non-checkpointing slave is not scheduled to a checkpointing framework. It can be removed as all slaves are checkpointing slaves. Diffs - include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 src/tests/disk_quota_tests.cpp 9c3a8815c3478535b72888c296a4aa5cda341ba3 src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 src/tests/fault_tolerance_tests.cpp 9ac75b1f601e14a3d3d117775f37a4a48b291dc6 src/tests/gc_tests.cpp deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf src/tests/master_allocator_tests.cpp a432d0207e1a92532a495bf9ad2826414ee4f6f0 src/tests/master_authorization_tests.cpp ff706ed6f8537207b30a548b0ce2121c5df71ab9 src/tests/master_tests.cpp e69348be676a80017062e3abbd15b8008a6009d7 src/tests/master_validation_tests.cpp c8742928a4e93e86ccd0f5a39856a65cfe8eb74f src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 src/tests/partition_tests.cpp bb96aed37861867fbde68445016f0c6e039f3fb4 src/tests/persistent_volume_tests.cpp b617117ade4b487cc06002cfeca76a0486833b20 src/tests/reconciliation_tests.cpp acd70021574b05ab23872add5bdfa4a46b7dfc51 src/tests/slave_recovery_tests.cpp 53adae0118a26e6d25a9ff20c6374cc8e73275b1 src/tests/status_update_manager_tests.cpp 216a22e9f292b4141c8b966dad0f25dbd791c025 Diff: https://reviews.apache.org/r/31539/diff/ Testing --- make check GTEST_BREAK_ON_FAILURE=1 GTEST_SHUFFLE=1 GTEST_REPEAT=50 on OSX (had to exclude some known flaky tests under OSX) Thanks, Joerg Schad
Re: Review Request 32008: Use LDADD to add unbundled libraries to all command line programs
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32008/#review76458 --- Ship it! LGTM, although you should s/--as-neede/--as-needed/ in the rb description. - Adam B On March 13, 2015, 12:55 p.m., Cody Maloney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32008/ --- (Updated March 13, 2015, 12:55 p.m.) Review request for mesos, Niklas Nielsen and Timothy St. Clair. Bugs: MESOS-2486 https://issues.apache.org/jira/browse/MESOS-2486 Repository: mesos Description --- Use LDADD to add unbundled libraries to all command line programs When building with unbundled libraries, every program needs to link against the libraries, and can't assume it will be linked against because the program links with a program which links against the libraries. This comes up on Debian and Ubuntu[1] which set the flag `-Wl,--no-copy-dt-needed-entries` which forcibly disables the binaries from getting automatically the dependencies of the shared libraries they link against. Also adds -Wl,--as-neede to AM_LDFLAGS to silence warnings in Fedora packaging since we link against more libraries than things actually use at the command line. Debian and Ubuntu default to this option. Fixes MESOS-2486 [1] https://wiki.ubuntu.com/ToolChain/CompilerFlags#A-Wl.2C--no-copy-dt-needed-entries Diffs - src/Makefile.am 3059818231c46484039d179cd6916932eff6cd68 Diff: https://reviews.apache.org/r/32008/diff/ Testing --- make check on Ubuntu 14.04 with `--with-glog`, `--with-protobuf` and `--with-boost`. As a sanity check that all isntances of _LDADD are updated, I ran the grep: `grep -r _LDADD . | grep -v '$(LDADD)'` that all the program-specific LDADD flags now include the global LDADD. Thanks, Cody Maloney
Re: Review Request 31977: Fixed the race in MasterTest.MasterFailoverLongLivedExecutor.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31977/#review76218 --- Ship it! Looks like a solid fix. Is there a JIRA associated with this patch? src/tests/master_tests.cpp https://reviews.apache.org/r/31977/#comment123701 I found a couple other MATCHERs in 3rdparty/libprocess/include/process/gmock.hpp - Adam B On March 12, 2015, 12:53 a.m., Michael Park wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31977/ --- (Updated March 12, 2015, 12:53 a.m.) Review request for mesos, Adam B and Niklas Nielsen. Repository: mesos Description --- See summary. Diffs - src/tests/master_tests.cpp e69348be676a80017062e3abbd15b8008a6009d7 Diff: https://reviews.apache.org/r/31977/diff/ Testing --- 1. `GTEST_FILTER=MasterTest.MasterFailoverLongLivedExecutor GLOG_v=1 ./bin/mesos-tests.sh --gtest_repeat=-1` up to 500+ iterations 2. `make check -j 8` 3. `./bin/mesos-tests.sh --gtest_shuffle` Thanks, Michael Park
Re: Review Request 31889: Updated Hooks to fix failure during master failover.
On March 11, 2015, 1:02 a.m., Adam B wrote: src/tests/hook_tests.cpp, line 199 https://reviews.apache.org/r/31889/diff/6/?file=891026#file891026line199 Does this being a HookTest mean that it's guaranteed to use the FOO=bar test executorEnvironmentDecorator hook? Kapil Arya wrote: I am not sure if I understand your question here. The test hook sets FOO=bar in the slaveExecutorEnvironmentDecorator and that's what we verify here. This test (along with other tests) is tightly coupled with the test hook module. Ok. Just wanted to make sure that this test (and other HookTests, I guess) guarantees that the test hook is loaded, without any special build/command-line parameters to make check. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/#review76049 --- On March 11, 2015, 12:08 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/ --- (Updated March 11, 2015, 12:08 p.m.) Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Till Toenshoff. Bugs: MESOS-2463 https://issues.apache.org/jira/browse/MESOS-2463 Repository: mesos Description --- The ExecutorInfo is immutable, and so the environment decoration is deferred until containerizer launch. The new slaveExecutorHook simply notifies the hook module of a new executor being launched. The environment decorator hook for containerizer launch update it's environment variables as gathered from the hook modules. Diffs - include/mesos/hook.hpp d83ace576a2c78eb7b1e910d89d912f6df5c46ef src/examples/test_hook_module.cpp 22212261053f2c10f7c4e5ae15aa3d4a2aa1c051 src/hook/manager.hpp d3729eaca1bf2893cbc94cf62835dafb2d9a22a1 src/hook/manager.cpp 3fd9d5e7f81e3d0ca2aca4091beba4ae555ae7e7 src/slave/containerizer/containerizer.cpp 33f8738fa229b42ebacfd75b762c977b33191b65 src/slave/slave.cpp 71ae84bbfcef208cc2ee603f3c8a79225e48a7d5 src/tests/hook_tests.cpp b5d776a55b6b205394469c176ac6be6702c218f3 Diff: https://reviews.apache.org/r/31889/diff/ Testing --- make check. The master failover tests have not been performed yet. Thanks, Kapil Arya
Re: Review Request 31889: Updated Hooks to fix failure during master failover.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/#review76172 --- Ship it! Ship It! - Adam B On March 11, 2015, 4:42 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/ --- (Updated March 11, 2015, 4:42 p.m.) Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Till Toenshoff. Bugs: MESOS-2463 https://issues.apache.org/jira/browse/MESOS-2463 Repository: mesos Description --- The ExecutorInfo is immutable, and so the environment decoration is deferred until containerizer launch. The new slaveExecutorHook simply notifies the hook module of a new executor being launched. The environment decorator hook for containerizer launch update it's environment variables as gathered from the hook modules. Diffs - include/mesos/hook.hpp d83ace576a2c78eb7b1e910d89d912f6df5c46ef src/examples/test_hook_module.cpp 22212261053f2c10f7c4e5ae15aa3d4a2aa1c051 src/hook/manager.hpp d3729eaca1bf2893cbc94cf62835dafb2d9a22a1 src/hook/manager.cpp 3fd9d5e7f81e3d0ca2aca4091beba4ae555ae7e7 src/slave/containerizer/containerizer.cpp 33f8738fa229b42ebacfd75b762c977b33191b65 src/slave/slave.cpp 71ae84bbfcef208cc2ee603f3c8a79225e48a7d5 src/tests/hook_tests.cpp b5d776a55b6b205394469c176ac6be6702c218f3 Diff: https://reviews.apache.org/r/31889/diff/ Testing --- make check. Also tried a negative test by replacing `test $FOO = 'bar'` with `test $FOO = 'baz'` and surely, the test failed. Thanks, Kapil Arya
Re: Review Request 31889: Updated Hooks to fix failure during master failover.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/#review76049 --- Looks good. Just a couple of questions and some style nits in the new test. src/hook/manager.cpp https://reviews.apache.org/r/31889/#comment123441 What should you do if result.isNone()? Just ignore without warning (as you're already doing here)? src/tests/hook_tests.cpp https://reviews.apache.org/r/31889/#comment123448 Does this being a HookTest mean that it's guaranteed to use the FOO=bar test executorEnvironmentDecorator hook? src/tests/hook_tests.cpp https://reviews.apache.org/r/31889/#comment123442 src/tests/hook_tests.cpp https://reviews.apache.org/r/31889/#comment123443 src/tests/hook_tests.cpp https://reviews.apache.org/r/31889/#comment123444 .Times(1) is implicit, can be removed. src/tests/hook_tests.cpp https://reviews.apache.org/r/31889/#comment123445 src/tests/hook_tests.cpp https://reviews.apache.org/r/31889/#comment123449 Where do you actually validate the return value here? Is it just that you're expecting TASK_FINISHED instead of TASK_FAILED? src/tests/hook_tests.cpp https://reviews.apache.org/r/31889/#comment123446 s/MergeFrom/CopyFrom/g src/tests/hook_tests.cpp https://reviews.apache.org/r/31889/#comment123447 Could use the CREATE_EXECUTOR_INFO() + LaunchTasks() model instead of building up a TaskInfo yourself. - Adam B On March 10, 2015, 7:15 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/ --- (Updated March 10, 2015, 7:15 p.m.) Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Till Toenshoff. Bugs: MESOS-2463 https://issues.apache.org/jira/browse/MESOS-2463 Repository: mesos Description --- The ExecutorInfo is immutable, and so the environment decoration is deferred until containerizer launch. The new slaveExecutorHook simply notifies the hook module of a new executor being launched. The environment decorator hook for containerizer launch update it's environment variables as gathered from the hook modules. Diffs - include/mesos/hook.hpp d83ace576a2c78eb7b1e910d89d912f6df5c46ef src/examples/test_hook_module.cpp 8faf6850aafcebda7e9f0d1b735d61f7effa842d src/hook/manager.hpp d3729eaca1bf2893cbc94cf62835dafb2d9a22a1 src/hook/manager.cpp 3fd9d5e7f81e3d0ca2aca4091beba4ae555ae7e7 src/slave/containerizer/containerizer.cpp 33f8738fa229b42ebacfd75b762c977b33191b65 src/slave/slave.cpp 364d911b086dfe1f15f76aa3888f99146aa8d876 src/tests/hook_tests.cpp f4b4f519456dc00a8894c7ce154b28a7ab9ce493 Diff: https://reviews.apache.org/r/31889/diff/ Testing --- make check. The master failover tests have not been performed yet. Thanks, Kapil Arya
Re: Review Request 31324: Updated changelog for 0.22.0
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31324/#review76203 --- Ship it! Looks more or less good to go. I just want to call your attention to a few features that could use references to further documentation, and a couple of fixes/features that might warrant further note. CHANGELOG https://reviews.apache.org/r/31324/#comment123681 Any doc to point to with more info? CHANGELOG https://reviews.apache.org/r/31324/#comment123682 Again, doc reference? s/;/:/ CHANGELOG https://reviews.apache.org/r/31324/#comment123683 Also in ExecutorInfo, but that's probably mentioned in the framework development guide. CHANGELOG https://reviews.apache.org/r/31324/#comment123684 Where should users look for this information now? The JIRA leads me to believe '/metrics/snapshot', but we need to have a clear answer. Is there further documentation on the metrics endpoints? CHANGELOG https://reviews.apache.org/r/31324/#comment123685 Possibly an API change, since custom resource types will now show up in slave's state.json, but not likely to break anything. CHANGELOG https://reviews.apache.org/r/31324/#comment123686 Another potential API change/fix, for reconciliation status updates. This one feels even less likely to surprise anybody CHANGELOG https://reviews.apache.org/r/31324/#comment123687 Do we want to call out libevent support as a new feature? Or go silent until SSL can be enabled? CHANGELOG https://reviews.apache.org/r/31324/#comment123689 Are we leaving DiskInfo (Persistent Volumes) in silent mode as well, presumably due to a few missing pieces? CHANGELOG https://reviews.apache.org/r/31324/#comment123688 Similarly, do we want to call out the new Accept(offerIds, Operation) API, or wait until DiskInfo/DynamicReservations are ready before we try and push people off of the old LaunchTasks API? - Adam B On March 11, 2015, 5:28 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31324/ --- (Updated March 11, 2015, 5:28 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Repository: mesos Description --- Updated changelog for Mesos 0.22.0 Diffs - CHANGELOG 2a54f08b7bb9cc6983631268eafec8cb7d166d97 Diff: https://reviews.apache.org/r/31324/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 31889: Updated Hooks to fix failure during master failover.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/#review76158 --- More TODO comment cleanup src/slave/containerizer/containerizer.cpp https://reviews.apache.org/r/31889/#comment123641 Not sure this TODO is accurate/necessary anymore. This patch assumes that the decorator hook returns the entire env from executorInfo as well as its own modifications, so it's up to the hook to decide how to resolve conflicts. On the other hand, if the hook incorrectly returns only its own environment, then the executorInfo.environment variables will be silently dropped. src/slave/containerizer/containerizer.cpp https://reviews.apache.org/r/31889/#comment123637 Redundant to pass/to be passed. Either s/to pass/for/ or s/to be passed// - Adam B On March 11, 2015, 3:51 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/ --- (Updated March 11, 2015, 3:51 p.m.) Review request for mesos, Adam B, Ben Mahler, Niklas Nielsen, and Till Toenshoff. Bugs: MESOS-2463 https://issues.apache.org/jira/browse/MESOS-2463 Repository: mesos Description --- The ExecutorInfo is immutable, and so the environment decoration is deferred until containerizer launch. The new slaveExecutorHook simply notifies the hook module of a new executor being launched. The environment decorator hook for containerizer launch update it's environment variables as gathered from the hook modules. Diffs - include/mesos/hook.hpp d83ace576a2c78eb7b1e910d89d912f6df5c46ef src/examples/test_hook_module.cpp 22212261053f2c10f7c4e5ae15aa3d4a2aa1c051 src/hook/manager.hpp d3729eaca1bf2893cbc94cf62835dafb2d9a22a1 src/hook/manager.cpp 3fd9d5e7f81e3d0ca2aca4091beba4ae555ae7e7 src/slave/containerizer/containerizer.cpp 33f8738fa229b42ebacfd75b762c977b33191b65 src/slave/slave.cpp 71ae84bbfcef208cc2ee603f3c8a79225e48a7d5 src/tests/hook_tests.cpp b5d776a55b6b205394469c176ac6be6702c218f3 Diff: https://reviews.apache.org/r/31889/diff/ Testing --- make check. Also tried a negative test by replacing `test $FOO = 'bar'` with `test $FOO = 'baz'` and surely, the test failed. Thanks, Kapil Arya
Re: Review Request 31961: Split cram_md5 authenticator into declaration and implementation without functional changes.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31961/#review76205 --- Ship it! LGTM - Adam B On March 11, 2015, 5:02 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31961/ --- (Updated March 11, 2015, 5:02 p.m.) Review request for mesos, Adam B and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos-incubating Description --- see summary. Diffs - src/Makefile.am 3059818 src/authentication/cram_md5/authenticator.hpp c6f465f src/authentication/cram_md5/authenticator.cpp PRE-CREATION Diff: https://reviews.apache.org/r/31961/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 31960: Added concurrency protection within the SASL auxprop plugin code.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31960/#review76206 --- Ship it! Ship It! - Adam B On March 11, 2015, 4:06 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31960/ --- (Updated March 11, 2015, 4:06 p.m.) Review request for mesos, Adam B and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos-incubating Description --- see summary. Diffs - src/authentication/cram_md5/auxprop.hpp b894386 src/authentication/cram_md5/auxprop.cpp cf503a2 Diff: https://reviews.apache.org/r/31960/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 27760: Revised authenticator interface to allow for two fold implementations.
On March 11, 2015, 2:42 p.m., Vinod Kone wrote: src/master/master.cpp, lines 3858-3861 https://reviews.apache.org/r/27760/diff/23/?file=890575#file890575line3858 Send a FrameworkError message (instead of AuthenticationError) here to avoid retries? Till Toenshoff wrote: The `AuthenticationErrorMessage` is sent to the AuthenticateeProcess, not the process to be authenticated (slave / framework). The behaviour on this specific event (failed to init the authenticator) has not changed. I believe we commented on this before and agreed to come up with a cleanup on authentication result handling later, in a follow-up-patch. Do I remember correctly here? Let's create a separate JIRA so that authenticatees (or the owning framework/slave) can know from the authentication response (error/failed) whether to quit or retry. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/#review76090 --- On March 11, 2015, 5:32 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/ --- (Updated March 11, 2015, 5:32 p.m.) Review request for mesos, Adam B, Kapil Arya, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos Description --- The initial design and implementation of the authenticator module interface caused issues and was not optimal for heavy lifting setup of external dependencies. By introducing a two fold design, this has been decoupled from the authentication message processing. The new design also gets us back on track to the goal of makeing SASL a soft dependency of mesos. Diffs - include/mesos/authentication/authenticator.hpp f66217a src/authentication/cram_md5/authenticator.hpp c6f465f src/authentication/cram_md5/authenticator.cpp PRE-CREATION src/authentication/cram_md5/auxprop.hpp b894386 src/master/master.hpp 3c957ab src/master/master.cpp dccd7c6 src/tests/cram_md5_authentication_tests.cpp 92a89c5 Diff: https://reviews.apache.org/r/27760/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 31889: Updated Hooks to fix failure during master failover.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/#review76029 --- Looks good to me, barring my question about giving the hook access to the TaskInfo/labels. I think it's ready to share with the rest of the community. include/mesos/hook.hpp https://reviews.apache.org/r/31889/#comment123413 What happens to conflicting changes? Overwritten, prepended, appended? Seems like it's up to the hook; maybe worth mentioning? include/mesos/hook.hpp https://reviews.apache.org/r/31889/#comment123417 Will the hook still be able to read labels or the rest of the TaskInfo to make decisions about what to place in the environment? src/examples/test_hook_module.cpp https://reviews.apache.org/r/31889/#comment123416 Does this TODO need to be addressed in this review, i.e. is it a blocker for 0.22? src/slave/containerizer/containerizer.cpp https://reviews.apache.org/r/31889/#comment123418 Hook overrides any conflicting values? I guess you're passing in the input environment, so the hook can choose to prepend/append when creating a variable's new value. - Adam B On March 10, 2015, 6:11 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31889/ --- (Updated March 10, 2015, 6:11 p.m.) Review request for mesos, Niklas Nielsen and Till Toenshoff. Bugs: MESOS-2463 https://issues.apache.org/jira/browse/MESOS-2463 Repository: mesos Description --- The ExecutorInfo is immutable, and so the environment decoration is deferred until containerizer launch. The new slaveExecutorHook simply notifies the hook module of a new executor being launched. The environment decorator hook for containerizer launch update it's environment variables as gathered from the hook modules. Diffs - include/mesos/hook.hpp d83ace576a2c78eb7b1e910d89d912f6df5c46ef src/examples/test_hook_module.cpp 8faf6850aafcebda7e9f0d1b735d61f7effa842d src/hook/manager.hpp d3729eaca1bf2893cbc94cf62835dafb2d9a22a1 src/hook/manager.cpp 3fd9d5e7f81e3d0ca2aca4091beba4ae555ae7e7 src/slave/containerizer/containerizer.cpp 33f8738fa229b42ebacfd75b762c977b33191b65 src/slave/slave.cpp 364d911b086dfe1f15f76aa3888f99146aa8d876 Diff: https://reviews.apache.org/r/31889/diff/ Testing --- make check. The master failover tests have not been performed yet. Thanks, Kapil Arya
Re: Review Request 31838: Fixed authentication failure triggered slave crash.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31838/#review75666 --- Looks like a good fix. Any chance you can fashion a test or two for this behavior? - Adam B On March 8, 2015, 5:54 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31838/ --- (Updated March 8, 2015, 5:54 p.m.) Review request for mesos, Adam B, Alexander Rojas, Joerg Schad, and Vinod Kone. Bugs: MESOS-2464 https://issues.apache.org/jira/browse/MESOS-2464 Repository: mesos-incubating Description --- see summary. Diffs - src/slave/slave.cpp 364d911 Diff: https://reviews.apache.org/r/31838/diff/ Testing --- make check functional testing with failed authentication Thanks, Till Toenshoff
Re: Review Request 27760: Revised authenticator interface to allow for two fold implementations.
On Feb. 20, 2015, 2:19 p.m., Vinod Kone wrote: src/master/master.cpp, lines 3897-3898 https://reviews.apache.org/r/27760/diff/17/?file=869282#file869282line3897 Why do we need this 'onAny'? Why can't 'authenticated' be set in __authenticate()? Till Toenshoff wrote: `__authenticated` is needed to capture (at least) a discard on the outer future. I have rearranged the code of both continuations to be more explicit about that. Vinod Kone wrote: A Future::discard() *does not* result in a future being DISCARDED (and onAny callbacks being called). You want an onDiscard() handler (instead of onAny) to capture that case. A Future::discard() does not result in a future being DISCARDED That sounds strange. What does it do then? Is an onDiscard() handler going to be sufficient here? On Feb. 20, 2015, 2:19 p.m., Vinod Kone wrote: src/master/master.cpp, lines 3952-3954 https://reviews.apache.org/r/27760/diff/17/?file=869282#file869282line3952 Can you rephrase these log messages to something meaninguful when one looks at master log? Till Toenshoff wrote: How about this? ``` if (authenticate.isDiscarded()) { LOG(WARNING) Authentication for pid has pending cancel request; } else if (authenticate.discard()) { LOG(WARNING) Requested to cancel authentication for pid; } ``` Adam B wrote: How about: Tried to cancel authentication session for [pid], but it was already cancelled and Cancelling authentication session for PID? Vinod Kone wrote: How are you cancelling the authentication here? The intention was that discarding the authenticate future would trigger the (now onDiscard) handler, which would end the authentication session. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/#review73310 --- On March 5, 2015, 11:22 a.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/ --- (Updated March 5, 2015, 11:22 a.m.) Review request for mesos, Adam B, Kapil Arya, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos Description --- The initial design and implementation of the authenticator module interface caused issues and was not optimal for heavy lifting setup of external dependencies. By introducing a two fold design, this has been decoupled from the authentication message processing. The new design also gets us back on track to the goal of makeing SASL a soft dependency of mesos. Diffs - include/mesos/authentication/authenticator.hpp f66217a src/Makefile.am d299f07 src/authentication/cram_md5/authenticator.hpp c6f465f src/authentication/cram_md5/authenticator.cpp PRE-CREATION src/authentication/cram_md5/auxprop.hpp b894386 src/authentication/cram_md5/auxprop.cpp cf503a2 src/master/master.hpp 3c957ab src/master/master.cpp 68ca19a src/tests/cram_md5_authentication_tests.cpp 92a89c5 Diff: https://reviews.apache.org/r/27760/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 31324: Updated changelog for 0.22.0
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31324/#review75312 --- CHANGELOG https://reviews.apache.org/r/31324/#comment122301 s/develop/development/ CHANGELOG https://reviews.apache.org/r/31324/#comment122302 Sort CHANGELOG https://reviews.apache.org/r/31324/#comment122303 Sort - Adam B On March 4, 2015, 3:08 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31324/ --- (Updated March 4, 2015, 3:08 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Repository: mesos Description --- Updated changelog for Mesos 0.22.0 Diffs - CHANGELOG 2a54f08b7bb9cc6983631268eafec8cb7d166d97 Diff: https://reviews.apache.org/r/31324/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 27760: Revised authenticator interface to allow for two fold implementations.
On Feb. 25, 2015, 4:01 p.m., Niklas Nielsen wrote: Hey Till, Can you rebase this? :) Niklas Nielsen wrote: Vinod, think that this is only awaiting responses from you on a few questions/answers above :) Would love to get this in. @vinodkone Ready for you to take another look regarding your open issues. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/#review74158 --- On Feb. 26, 2015, 6:28 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/ --- (Updated Feb. 26, 2015, 6:28 p.m.) Review request for mesos, Adam B, Kapil Arya, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos Description --- The initial design and implementation of the authenticator module interface caused issues and was not optimal for heavy lifting setup of external dependencies. By introducing a two fold design, this has been decoupled from the authentication message processing. The new design also gets us back on track to the goal of makeing SASL a soft dependency of mesos. Diffs - include/mesos/authentication/authenticator.hpp f66217a src/Makefile.am 17d0d7a src/authentication/cram_md5/authenticator.hpp c6f465f src/authentication/cram_md5/authenticator.cpp PRE-CREATION src/authentication/cram_md5/auxprop.hpp b894386 src/authentication/cram_md5/auxprop.cpp cf503a2 src/master/master.hpp e288cdb src/master/master.cpp 76e217d src/tests/cram_md5_authentication_tests.cpp 92a89c5 Diff: https://reviews.apache.org/r/27760/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 31362: Reenabled hadoop_home and frameworks_home slave flags for mesos-fetcher and added tests
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31362/#review74250 --- Ship it! Ship It! - Adam B On Feb. 26, 2015, 12:16 a.m., Bernd Mathiske wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31362/ --- (Updated Feb. 26, 2015, 12:16 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen. Bugs: MESOS-2390 https://issues.apache.org/jira/browse/MESOS-2390 Repository: mesos Description --- Now the containerizer/fetcher sets the HADOOP_HOME variable for mesos-fetcher again if the slave flag hadoop_home is set. Added a test that checks that HDFS fetching is not broken and also ensures that the flag gets translated to the environment variable and then gets applied in mesos-fetcher. Created a mock hadoop implementation script for this. This script has the exact same side effects as a real haddop client in the scope of our testing. Using this, Mesos testing has no extra external dependencies (on Hadoop). Slave flag frameworks_home does not need to be an evironment variable. It is now part of FetcherInfo only, but it also gets tested now that it works. Diffs - include/mesos/fetcher/fetcher.proto facb87b92bf3194516f636dcc348e136af537721 src/launcher/fetcher.cpp fed0105946da579a38357a30e7ae56e646e05b89 src/slave/containerizer/fetcher.cpp d290f95251def3952c5ee34f600e1d71467f6293 src/tests/fetcher_tests.cpp 2438620e2db94c3c56336fa5d8e69a18fe8f3bac Diff: https://reviews.apache.org/r/31362/diff/ Testing --- make check Thanks, Bernd Mathiske
Re: Review Request 31362: Reenabled hadoop_home and frameworks_home slave flags for mesos-fetcher and added tests
On Feb. 26, 2015, 2:01 a.m., Adam B wrote: Ship It! Niklas Nielsen wrote: What happened here? This is blocking 0.22.0. Bernd, are you on top of the flaky test? Adam B wrote: The latest revision (4) fixed the broken test by writing the script into the tests tmpdir rather than trying to find it in src/tests. Should be ready to commit now, unless somebody objects. Committed. @bernd-mesos can you mark this review as Submitted? - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31362/#review74250 --- On Feb. 26, 2015, 12:16 a.m., Bernd Mathiske wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31362/ --- (Updated Feb. 26, 2015, 12:16 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen. Bugs: MESOS-2390 https://issues.apache.org/jira/browse/MESOS-2390 Repository: mesos Description --- Now the containerizer/fetcher sets the HADOOP_HOME variable for mesos-fetcher again if the slave flag hadoop_home is set. Added a test that checks that HDFS fetching is not broken and also ensures that the flag gets translated to the environment variable and then gets applied in mesos-fetcher. Created a mock hadoop implementation script for this. This script has the exact same side effects as a real haddop client in the scope of our testing. Using this, Mesos testing has no extra external dependencies (on Hadoop). Slave flag frameworks_home does not need to be an evironment variable. It is now part of FetcherInfo only, but it also gets tested now that it works. Diffs - include/mesos/fetcher/fetcher.proto facb87b92bf3194516f636dcc348e136af537721 src/launcher/fetcher.cpp fed0105946da579a38357a30e7ae56e646e05b89 src/slave/containerizer/fetcher.cpp d290f95251def3952c5ee34f600e1d71467f6293 src/tests/fetcher_tests.cpp 2438620e2db94c3c56336fa5d8e69a18fe8f3bac Diff: https://reviews.apache.org/r/31362/diff/ Testing --- make check Thanks, Bernd Mathiske
Re: Review Request 31390: Added discovery info documentation.
On Feb. 25, 2015, 11:08 a.m., Adam B wrote: Minor markdown tweaks, but I'll fix those myself and commit this. Submitted. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/#review74051 --- On Feb. 25, 2015, 9:15 a.m., Christos Kozyrakis wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/ --- (Updated Feb. 25, 2015, 9:15 a.m.) Review request for mesos, Connor Doyle and Niklas Nielsen. Bugs: MESOS-2396 https://issues.apache.org/jira/browse/MESOS-2396 Repository: mesos Description --- Added discovery info documentation. Diffs - docs/app-framework-development-guide.md dd7603d099f7582fa8fe93a4c3daa521b23f6223 Diff: https://reviews.apache.org/r/31390/diff/ Testing --- Thanks, Christos Kozyrakis
Re: Review Request 31390: Added discovery info documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/#review74051 --- Ship it! Minor markdown tweaks, but I'll fix those myself and commit this. docs/app-framework-development-guide.md https://reviews.apache.org/r/31390/#comment120554 s/`location field`/`location` field/ docs/app-framework-development-guide.md https://reviews.apache.org/r/31390/#comment120564 s/name field/`name` field/ - Adam B On Feb. 25, 2015, 9:15 a.m., Christos Kozyrakis wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/ --- (Updated Feb. 25, 2015, 9:15 a.m.) Review request for mesos, Connor Doyle and Niklas Nielsen. Bugs: MESOS-2396 https://issues.apache.org/jira/browse/MESOS-2396 Repository: mesos Description --- Added discovery info documentation. Diffs - docs/app-framework-development-guide.md dd7603d099f7582fa8fe93a4c3daa521b23f6223 Diff: https://reviews.apache.org/r/31390/diff/ Testing --- Thanks, Christos Kozyrakis
Re: Review Request 31390: Added discovery info documentation.
On Feb. 25, 2015, 11:08 a.m., Adam B wrote: Minor markdown tweaks, but I'll fix those myself and commit this. Adam B wrote: Submitted. Ben Mahler wrote: Adam: Did you forget to mark this review as submitted? Unfortunately, I don't have permission to mark others reviews as submitted (I did before one of the RB upgrades). @kozyraki can you mark it as submitted? - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/#review74051 --- On Feb. 25, 2015, 11:16 a.m., Christos Kozyrakis wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/ --- (Updated Feb. 25, 2015, 11:16 a.m.) Review request for mesos, Adam B, Connor Doyle, and Niklas Nielsen. Bugs: MESOS-2396 https://issues.apache.org/jira/browse/MESOS-2396 Repository: mesos Description --- Added discovery info documentation. Diffs - docs/app-framework-development-guide.md dd7603d099f7582fa8fe93a4c3daa521b23f6223 Diff: https://reviews.apache.org/r/31390/diff/ Testing --- Thanks, Christos Kozyrakis
Re: Review Request 31362: Reenabled hadoop_home and frameworks_home slave flags for mesos-fetcher and added tests
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31362/#review74005 --- Ship it! LGTM, as long as you can get ReviewBot to pass your HdfsURI unit test. src/tests/fetcher_tests.cpp https://reviews.apache.org/r/31362/#comment120451 Looks like ReviewBot disagrees with this setup. ../../src/tests/fetcher_tests.cpp:720: Failure Value of: os::exists(scriptPath) Actual: false Expected: true [ FAILED ] FetcherTest.HdfsURI (0 ms) Have you investigated this failure? - Adam B On Feb. 25, 2015, 1:27 a.m., Bernd Mathiske wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31362/ --- (Updated Feb. 25, 2015, 1:27 a.m.) Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen. Bugs: MESOS-2390 https://issues.apache.org/jira/browse/MESOS-2390 Repository: mesos Description --- Now the containerizer/fetcher sets the HADOOP_HOME variable for mesos-fetcher again if the slave flag hadoop_home is set. Added a test that checks that HDFS fetching is not broken and also ensures that the flag gets translated to the environment variable and then gets applied in mesos-fetcher. Created a mock hadoop implementation script for this. This script has the exact same side effects as a real haddop client in the scope of our testing. Using this, Mesos testing has no extra external dependencies (on Hadoop). Slave flag frameworks_home does not need to be an evironment variable. It is now part of FetcherInfo only, but it also gets tested now that it works. Diffs - include/mesos/fetcher/fetcher.proto facb87b92bf3194516f636dcc348e136af537721 src/launcher/fetcher.cpp fed0105946da579a38357a30e7ae56e646e05b89 src/slave/containerizer/fetcher.cpp d290f95251def3952c5ee34f600e1d71467f6293 src/tests/fetcher_tests.cpp 2438620e2db94c3c56336fa5d8e69a18fe8f3bac src/tests/mock_hadoop.sh PRE-CREATION Diff: https://reviews.apache.org/r/31362/diff/ Testing --- make check Thanks, Bernd Mathiske
Re: Review Request 31324: Updated changelog for 0.22.0
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31324/#review74000 --- Needs to be sorted, and updated with the new rc2 JIRAs. CHANGELOG https://reviews.apache.org/r/31324/#comment120439 Remove Adds and just start with Support, since This release contains... support for x. Even Support for is weak/unnecessary. You can even merge it with the following sentence to read Rate limiting slave removal, to safeguard against unforeseen bugs... CHANGELOG https://reviews.apache.org/r/31324/#comment120440 Remove all these enables/introduces/adds/adds starting verbs and go with strong feature-title nouns. For the new flags, I'd go with New '--foo_bar' flag to... CHANGELOG https://reviews.apache.org/r/31324/#comment120441 s/ui/UI/ CHANGELOG https://reviews.apache.org/r/31324/#comment120442 Insert in sorted order with the rest of the API Changes CHANGELOG https://reviews.apache.org/r/31324/#comment120443 Sorted order for each section CHANGELOG https://reviews.apache.org/r/31324/#comment120444 No need for double-blank line here. We usually reserve those for between releases. CHANGELOG https://reviews.apache.org/r/31324/#comment120445 When did we start using Technical task in JIRA? Maybe worth collapsing these into the Task section? Or does JIRA just auto-generate the list for you? CHANGELOG https://reviews.apache.org/r/31324/#comment120447 Really more of a Documentation change, even if it wasn't marked as such. CHANGELOG https://reviews.apache.org/r/31324/#comment120448 This should now include the new disk isolation and service discovery doc JIRAs - Adam B On Feb. 24, 2015, 5:08 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31324/ --- (Updated Feb. 24, 2015, 5:08 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Repository: mesos Description --- Updated changelog for Mesos 0.22.0 Diffs - CHANGELOG 2a54f08b7bb9cc6983631268eafec8cb7d166d97 Diff: https://reviews.apache.org/r/31324/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 31390: Added discovery info documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/#review73997 --- Ship it! Looks great, although potentially more verbose than necessary. docs/app-framework-development-guide.md https://reviews.apache.org/r/31390/#comment120436 Do you really need to bother showing the definition of Port[s] here? You don't copy the definition of Labels, and that's another DiscoveryInfo member type. Anybody getting that deep into it can/will go directly to the protobuf definitions anyway. docs/app-framework-development-guide.md https://reviews.apache.org/r/31390/#comment120435 Do you really need to include this lengthy comment if you're copying its contents below? Once is enough. Your choice which one. - Adam B On Feb. 24, 2015, 9:38 p.m., Christos Kozyrakis wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/ --- (Updated Feb. 24, 2015, 9:38 p.m.) Review request for mesos, Connor Doyle and Niklas Nielsen. Bugs: MESOS-2396 https://issues.apache.org/jira/browse/MESOS-2396 Repository: mesos Description --- Added discovery info documentation. Diffs - docs/app-framework-development-guide.md dd7603d099f7582fa8fe93a4c3daa521b23f6223 Diff: https://reviews.apache.org/r/31390/diff/ Testing --- Thanks, Christos Kozyrakis
Re: Review Request 31324: Updated changelog for 0.22.0
On Feb. 23, 2015, 2:02 p.m., Michael Park wrote: CHANGELOG, line 156 https://reviews.apache.org/r/31324/diff/1/?file=873132#file873132line156 This one was reverted, we're going to use ACLs and principals instead to achieve this. Let's keep it out. Reopened and removed 0.22 Fixed version. Not sure if Niklas wants to regenerate the list to verify. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31324/#review73681 --- On Feb. 24, 2015, 5:08 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31324/ --- (Updated Feb. 24, 2015, 5:08 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Repository: mesos Description --- Updated changelog for Mesos 0.22.0 Diffs - CHANGELOG 2a54f08b7bb9cc6983631268eafec8cb7d166d97 Diff: https://reviews.apache.org/r/31324/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 31325: Updated upgrade guide
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31325/#review73852 --- docs/upgrades.md https://reviews.apache.org/r/31325/#comment120263 Maybe note that frameworks must still enable it in their frameworkInfo to take advantage of checkpointing their tasks. - Adam B On Feb. 23, 2015, 1:33 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31325/ --- (Updated Feb. 23, 2015, 1:33 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, Kapil Arya, and Vinod Kone. Repository: mesos Description --- Remove 'WIP' from upgrade guide. Also, this patch can introduce additional upgrade steps we may have missed (based on your feedback). Diffs - docs/upgrades.md 07d19f91127d60656086ddbb9a1e0357c3410764 Diff: https://reviews.apache.org/r/31325/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 31327: Updated configuration doc.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31327/#review73846 --- docs/configuration.md https://reviews.apache.org/r/31327/#comment120259 or /path/to/file? And shouldn't you end the `/code` earlier, if it's only wrapping the path(s)? Also, pre is closed but never opened. docs/configuration.md https://reviews.apache.org/r/31327/#comment120260 Above you eliminated the (default: crammd5) text, presumably because it is inline in the description. Either way is fine by me, but please be consistent between the two. - Adam B On Feb. 23, 2015, 3:12 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31327/ --- (Updated Feb. 23, 2015, 3:12 p.m.) Review request for mesos, Ben Mahler, Kapil Arya, and Vinod Kone. Repository: mesos Description --- Added new flags and removed checkpoint slave flag Diffs - docs/configuration.md 47d8ccb6a1f318b153a3509be10b8dac9a24130c Diff: https://reviews.apache.org/r/31327/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 31362: Reenabled hadoop_home and frameworks_home slave flags for mesos-fetcher and added tests
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31362/#review73838 --- Looks great! I have a few minor nits and some questions. Once these are resolved/answered, I'll do another pass and (hopefully) give it a ShipIt. src/launcher/fetcher.cpp https://reviews.apache.org/r/31362/#comment120245 What if the user had set the MESOS_FRAMEWORKS_HOME env variable instead of the --frameworks_home flag? Would that have still worked in 0.21? Will it work now? src/launcher/fetcher.cpp https://reviews.apache.org/r/31362/#comment120246 We don't end log messages in punctuation. (Comments, however, must be punctuated.) src/launcher/fetcher.cpp https://reviews.apache.org/r/31362/#comment120247 if fetch() takes an Option already, can we just pass it `fetcherInfo.get().frameworks_home()` directly? I thought there might be some auto-translation between optional protobufs and Options.. src/tests/fetcher_tests.cpp https://reviews.apache.org/r/31362/#comment120249 Maybe add a comment or a more explicit test for There is no HADOOP_HOME src/tests/fetcher_tests.cpp https://reviews.apache.org/r/31362/#comment120250 This wasn't necessary before, since you weren't using a relative path for URI, right? src/tests/fetcher_tests.cpp https://reviews.apache.org/r/31362/#comment120251 What non-0 return status do you expect? src/tests/fetcher_tests.cpp https://reviews.apache.org/r/31362/#comment120258 Should probably verify that 'proof' does not exist before the test runs. Maybe even delete it if it does somehow exist. src/tests/fetcher_tests.cpp https://reviews.apache.org/r/31362/#comment120256 Might you want to use '' instead of ';', in case the fake fetcher 'cp' fails? src/tests/fetcher_tests.cpp https://reviews.apache.org/r/31362/#comment120257 Very cool. Did you (manually) test that this actually works? - Adam B On Feb. 24, 2015, 9:28 a.m., Bernd Mathiske wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31362/ --- (Updated Feb. 24, 2015, 9:28 a.m.) Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen. Bugs: MESOS-2390 https://issues.apache.org/jira/browse/MESOS-2390 Repository: mesos Description --- Now the containerizer/fetcher sets the HADOOP_HOME variable for mesos-fetcher again if the slave flag hadoop_home is set. Added a test that checks that HDFS fetching is not broken and also ensures that the flag gets translated to the environment variable and then gets applied in mesos-fetcher. Created a mock hadoop implementation script for this. This script has the exact same side effects as a real haddop client in the scope of our testing. Using this, Mesos testing has no extra external dependencies (on Hadoop). Slave flag frameworks_home does not need to be an evironment variable. It is now part of FetcherInfo only, but it also gets tested now that it works. Diffs - include/mesos/fetcher/fetcher.proto facb87b92bf3194516f636dcc348e136af537721 src/launcher/fetcher.cpp fed0105946da579a38357a30e7ae56e646e05b89 src/slave/containerizer/fetcher.cpp d290f95251def3952c5ee34f600e1d71467f6293 src/tests/fetcher_tests.cpp 2438620e2db94c3c56336fa5d8e69a18fe8f3bac src/tests/mock_hadoop.sh PRE-CREATION Diff: https://reviews.apache.org/r/31362/diff/ Testing --- make check Thanks, Bernd Mathiske
Re: Review Request 31390: Added discovery info documentation.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/#review73989 --- Can you link to a rendered markdown version of this? Perhaps just a fork of Mesos under https://github.com/kozyraki That'll make this much easier to read/review. Thanks. - Adam B On Feb. 24, 2015, 9:38 p.m., Christos Kozyrakis wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31390/ --- (Updated Feb. 24, 2015, 9:38 p.m.) Review request for mesos, Connor Doyle and Niklas Nielsen. Bugs: MESOS-2396 https://issues.apache.org/jira/browse/MESOS-2396 Repository: mesos Description --- Added discovery info documentation. Diffs - docs/app-framework-development-guide.md dd7603d099f7582fa8fe93a4c3daa521b23f6223 Diff: https://reviews.apache.org/r/31390/diff/ Testing --- Thanks, Christos Kozyrakis
Re: Review Request 27760: Revised authenticator interface to allow for two fold implementations.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/#review73446 --- LGTM, pending the other open issues. src/master/master.cpp https://reviews.apache.org/r/27760/#comment119782 Maybe do a `CHECK_SOME(authenticator)`? src/master/master.cpp https://reviews.apache.org/r/27760/#comment119783 s/hinting/indicating/, since it's stronger than a hint. s/gotten refused/been refused/ - Adam B On Feb. 21, 2015, 2:16 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/ --- (Updated Feb. 21, 2015, 2:16 p.m.) Review request for mesos, Adam B, Kapil Arya, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos Description --- The initial design and implementation of the authenticator module interface caused issues and was not optimal for heavy lifting setup of external dependencies. By introducing a two fold design, this has been decoupled from the authentication message processing. The new design also gets us back on track to the goal of makeing SASL a soft dependency of mesos. Diffs - include/mesos/authentication/authenticator.hpp f66217a src/Makefile.am d372404 src/authentication/cram_md5/authenticator.hpp 7578ea1 src/authentication/cram_md5/authenticator.cpp PRE-CREATION src/authentication/cram_md5/auxprop.hpp d036b11 src/authentication/cram_md5/auxprop.cpp 5ff9755 src/master/master.hpp a466f92 src/master/master.cpp f10a3cf src/tests/cram_md5_authentication_tests.cpp dd102dc Diff: https://reviews.apache.org/r/27760/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 27760: Revised authenticator interface to allow for two fold implementations.
On Feb. 20, 2015, 2:19 p.m., Vinod Kone wrote: src/master/master.cpp, lines 466-467 https://reviews.apache.org/r/27760/diff/17/?file=869282#file869282line466 why this if condition for logging? Till Toenshoff wrote: Not sure I understand. Did you see the comment directly above those lines? ``` // A failure to initialize the authenticator does lead to // unusable authentication but still allows non authenticating // frameworks and slaves to connect. ``` I think (a few revisions back), the thought was that the default parameters for a master are authenticate_frameworks/slaves=false credentials=none authenticator=default(cram), at which point it seems unnecessary to warn somebody that we didn't load the authenticator. Upon returning to this decision, I see no reason not to log this message anytime authentication is disabled, even if it's the default setting. Maybe it would seem less harsh as an INFO in the default case, but a single WARN message on master startup could be a gentle nudge to try out authentication. On Feb. 20, 2015, 2:19 p.m., Vinod Kone wrote: src/master/master.cpp, lines 3870-3873 https://reviews.apache.org/r/27760/diff/17/?file=869282#file869282line3870 Did you mix up inner and outer here? Till Toenshoff wrote: I rephrased it, maybe this is better? ``` // The inner future may get satisfied by an empty option, hinting // that the authentication has gotten refused. // The outer future will be ready only if we have entirely // authenticated successfully. ``` FTFY: The inner future will be satisfied when the authentication process has completed and returned a result; however, it could return an Option None, indicating that authentication was denied. The outer future will be ready only if/when authentication has succeeded. Why did we go for an Optionstring rather than a Trystring? Seems like Try would better reflect the difference between success/failure than some/none does. On Feb. 20, 2015, 2:19 p.m., Vinod Kone wrote: src/master/master.cpp, lines 3952-3954 https://reviews.apache.org/r/27760/diff/17/?file=869282#file869282line3952 Can you rephrase these log messages to something meaninguful when one looks at master log? Till Toenshoff wrote: How about this? ``` if (authenticate.isDiscarded()) { LOG(WARNING) Authentication for pid has pending cancel request; } else if (authenticate.discard()) { LOG(WARNING) Requested to cancel authentication for pid; } ``` How about: Tried to cancel authentication session for [pid], but it was already cancelled and Cancelling authentication session for PID? - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/#review73310 --- On Feb. 19, 2015, 6:02 a.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/ --- (Updated Feb. 19, 2015, 6:02 a.m.) Review request for mesos, Adam B, Kapil Arya, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos Description --- The initial design and implementation of the authenticator module interface caused issues and was not optimal for heavy lifting setup of external dependencies. By introducing a two fold design, this has been decoupled from the authentication message processing. The new design also gets us back on track to the goal of makeing SASL a soft dependency of mesos. Diffs - include/mesos/authentication/authenticator.hpp f66217a src/Makefile.am d372404 src/authentication/cram_md5/authenticator.hpp 7578ea1 src/authentication/cram_md5/authenticator.cpp PRE-CREATION src/authentication/cram_md5/auxprop.hpp d036b11 src/authentication/cram_md5/auxprop.cpp 5ff9755 src/master/master.hpp 6a39df0 src/master/master.cpp f10a3cf src/tests/cram_md5_authentication_tests.cpp dd102dc Diff: https://reviews.apache.org/r/27760/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 29507: Added Configurable Slave Ping Timeouts
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/ --- (Updated Feb. 19, 2015, 12:10 a.m.) Review request for mesos, Ben Mahler and Niklas Nielsen. Changes --- Merged in subsequent patch to make slave use master's ping timeout from SlaveRe[re]gisteredMessage, plus unit tests. Bugs: MESOS-2110 https://issues.apache.org/jira/browse/MESOS-2110 Repository: mesos Description (updated) --- Added new --slave_ping_timeout and --max_slave_ping_timeouts flags to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs) and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5). These can be extended if slaves are expected/allowed to be down for longer than a minute or two. Slave will receive master's ping timeout in SlaveRe[re]gisteredMessage. Beware that this affects recovery from network timeouts as well as actual slave node/process failover. Diffs (updated) - src/master/constants.hpp ad3fe81 src/master/constants.cpp d3d0f71 src/master/flags.hpp 51a6059 src/master/master.cpp f10a3cf src/messages/messages.proto 58484ae src/slave/constants.hpp 12d6e92 src/slave/constants.cpp 7868bef src/slave/slave.hpp 91dae10 src/slave/slave.cpp aec9525 src/tests/fault_tolerance_tests.cpp efa5c57 src/tests/partition_tests.cpp eb16a58 src/tests/slave_recovery_tests.cpp 8210c52 src/tests/slave_tests.cpp 153d9d6 Diff: https://reviews.apache.org/r/29507/diff/ Testing (updated) --- Manually tested slave failover/shutdown with master using different --slave_ping_timeout and --max_slave_ping_timeouts. Ran unit tests with shorter non-default values for ping timeouts. `make check` with new unit tests: ShortPingTimeoutUnreachableMaster and ShortPingTimeoutUnreachableSlave Thanks, Adam B
Re: Review Request 29507: Added Configurable Slave Ping Timeouts
On Feb. 18, 2015, 11:38 a.m., Niklas Nielsen wrote: Let's get tests wired up before committing this :) Adam B wrote: Sure thing. Adding tests in my subsequent patch where we will pass the master's timeout values on to the slave. Will post that very soon. Ben Mahler wrote: Can you do it in one patch? This patch in isolation looks a bit dangerous per our conversation above. Also, please carefully consider whether your approach will be safe to do in a single version. i.e. What happens when there are old slaves running against a new master? And vice versa. Easy enough. Added the second patch to this one. Most of the new changes are in messages.proto, slave.hpp/cpp, and partition_tests.cpp, with a few lines in master.cpp to set the new message fields. Certainly safe to upgrade if the new flags stay at their default value. Also, with new slaves talking to an old master, the master will still use the defaults, hence so will the slaves. But old slaves running against a new master with a longer timeout will try to reregister unnecessarily early, so you may want to guarantee that all/most of the slaves have been upgraded before setting these flags, otherwise a large cluster suddenly waking up from a network split would see a lot of unnecessary reregistration attempts. The old behavior in this scenario was that the slaves would reregister after the same default timeout after the network split, and the master would consider them shutdown after the same timeout. If that last scenario is a problem, maybe the better approach is for each slave to specify its own ping timeout, and then the master can use these values with each slave's SlaveObserver. Then we can just require the master(s) to be upgraded first, as is typical. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/#review72992 --- On Feb. 19, 2015, 12:10 a.m., Adam B wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/ --- (Updated Feb. 19, 2015, 12:10 a.m.) Review request for mesos, Ben Mahler and Niklas Nielsen. Bugs: MESOS-2110 https://issues.apache.org/jira/browse/MESOS-2110 Repository: mesos Description --- Added new --slave_ping_timeout and --max_slave_ping_timeouts flags to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs) and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5). These can be extended if slaves are expected/allowed to be down for longer than a minute or two. Slave will receive master's ping timeout in SlaveRe[re]gisteredMessage. Beware that this affects recovery from network timeouts as well as actual slave node/process failover. Diffs - src/master/constants.hpp ad3fe81 src/master/constants.cpp d3d0f71 src/master/flags.hpp 51a6059 src/master/master.cpp f10a3cf src/messages/messages.proto 58484ae src/slave/constants.hpp 12d6e92 src/slave/constants.cpp 7868bef src/slave/slave.hpp 91dae10 src/slave/slave.cpp aec9525 src/tests/fault_tolerance_tests.cpp efa5c57 src/tests/partition_tests.cpp eb16a58 src/tests/slave_recovery_tests.cpp 8210c52 src/tests/slave_tests.cpp 153d9d6 Diff: https://reviews.apache.org/r/29507/diff/ Testing --- Manually tested slave failover/shutdown with master using different --slave_ping_timeout and --max_slave_ping_timeouts. Ran unit tests with shorter non-default values for ping timeouts. `make check` with new unit tests: ShortPingTimeoutUnreachableMaster and ShortPingTimeoutUnreachableSlave Thanks, Adam B
Re: Review Request 30995: Added Access class for endpoint's control in the Files class.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30995/#review72927 --- First pass, found some minor typos/nits. Still considering whether we might want a different API than 4 bools or a `hashsetstring`, but if we do keep the hashset, the strings should probably more closely match the actual endpoints they represent (e.g. whatever.json) src/files/files.hpp https://reviews.apache.org/r/30995/#comment119045 s/permission/access/? src/files/files.cpp https://reviews.apache.org/r/30995/#comment119046 s/itseld/itself/ src/files/files.cpp https://reviews.apache.org/r/30995/#comment119047 s/enpoint/endpoint/ src/files/files.cpp https://reviews.apache.org/r/30995/#comment119048 s/disallowed/forbidden/? src/files/files.cpp https://reviews.apache.org/r/30995/#comment119050 Remove unnecessary blank line src/files/files.cpp https://reviews.apache.org/r/30995/#comment119056 Why match browse instead of the entire endpoint string, /browse.json? Couldn't there potentially be another separate endpoint that browse or /browse route to? src/tests/files_tests.cpp https://reviews.apache.org/r/30995/#comment119053 s/permission/access/? hashsetstring - Adam B On Feb. 13, 2015, 7:59 a.m., Alexander Rojas wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30995/ --- (Updated Feb. 13, 2015, 7:59 a.m.) Review request for mesos, Adam B, Till Toenshoff, and Vinod Kone. Bugs: MESOS-2333 https://issues.apache.org/jira/browse/MESOS-2333 Repository: mesos Description --- Adds an Access class which is used to turn on and off the endpoints provided by the `mesos::Files` objects. Disabled endpoints return a 403 Forbidden http response. Diffs - src/files/files.hpp c8322d1381b6ba0e52bd29a4291fd3a786ce5fe0 src/files/files.cpp 58a74b10af65ee30b7035ff00725d0f8e98529ea src/tests/files_tests.cpp 650f1b947d87b7e17651f7378369d3ade826e412 Diff: https://reviews.apache.org/r/30995/diff/ Testing --- make check Thanks, Alexander Rojas
Re: Review Request 27760: Revised authenticator interface to allow for two fold implementations.
On Feb. 16, 2015, 11:37 p.m., Adam B wrote: src/authentication/cram_md5/authenticator.hpp, line 533 https://reviews.apache.org/r/27760/diff/14/?file=863170#file863170line533 Should this still `process::terminate(process, false)` if the short term fix is now in `~CRAMMD5AuthenticatorSession`? The 'false' injects the 'terminate' at the end of the process' queue, so other in-flight events get handled first (semi-graceful shutdown). Till Toenshoff wrote: Correct, I think it should be a regular terminate at this point now (getting pushed into the front of the queue). From the description of MESOS-1866, I understand the race to apply to destruction of the session, not the entire master/authenticator. The double-future approach might even fix this somehow. I'd have to dig deeper. Unfortunately, MESOS-1866 Race 1 never had a unit test written for it, so there's no way to test that we aren't regressing. Maybe @vinodkone can comment on the previous race and help us determine if we're maintaining/improving the proper behavior. On Feb. 16, 2015, 11:37 p.m., Adam B wrote: src/authentication/cram_md5/authenticator.hpp, line 545 https://reviews.apache.org/r/27760/diff/14/?file=863170#file863170line545 What's the reasoning for this being a `static*`? Why wouldn't a plain old `OptionError` work? Till Toenshoff wrote: That way, if an error occured in the once-covered case, then any additional calls to initialize (e.g. by another instance) would still have access to that former, once-covered error and be able to return it. Okay, makes sense. Worth a comment? - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/#review72680 --- On Feb. 17, 2015, 7:57 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/ --- (Updated Feb. 17, 2015, 7:57 p.m.) Review request for mesos, Adam B, Kapil Arya, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos Description --- The initial design and implementation of the authenticator module interface caused issues and was not optimal for heavy lifting setup of external dependencies. By introducing a two fold design, this has been decoupled from the authentication message processing. The new design also gets us back on track to the goal of makeing SASL a soft dependency of mesos. Diffs - include/mesos/authentication/authenticator.hpp f66217a src/Makefile.am d372404 src/authentication/cram_md5/authenticator.hpp 7578ea1 src/authentication/cram_md5/authenticator.cpp PRE-CREATION src/authentication/cram_md5/auxprop.hpp d036b11 src/authentication/cram_md5/auxprop.cpp 5ff9755 src/master/master.hpp 6a39df0 src/master/master.cpp f10a3cf src/tests/cram_md5_authentication_tests.cpp dd102dc Diff: https://reviews.apache.org/r/27760/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 27760: Revised authenticator interface to allow for two fold implementations.
On Jan. 28, 2015, 12:33 p.m., Vinod Kone wrote: src/master/master.cpp, lines 4193-4203 https://reviews.apache.org/r/27760/diff/12/?file=834181#file834181line4193 IIRC, sending an AuthenticationErrorMessage here causes the driver to simply retry the authentication and thus fall into a retry loop? Adam B wrote: Truth. There is a TODO about at least adding a backoff, but maybe we should also attempt registering without authentication? Maybe authentication shouldn't even cause retries under certain scenarios where it will never eventually succeed (bad credentials, no authenticator). Till Toenshoff wrote: Can we make this a TODO at this spot as well and create a JIRA? I'ld also be happy to work on a follow up patch, covering more finegrained authentication result-triggers. We should create a JIRA for the authentication backoff TODO, and perhaps another JIRA for retrying without authentication. I would like to point out that the behavior after this patch matches the AuthenticationErrorMessage behavior all the way back to 0.19 or earlier, so we can certainly leave the TODOs alone for this release and just create JIRAs for proper tracking in the future. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/#review70051 --- On Feb. 17, 2015, 7:57 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/ --- (Updated Feb. 17, 2015, 7:57 p.m.) Review request for mesos, Adam B, Kapil Arya, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos Description --- The initial design and implementation of the authenticator module interface caused issues and was not optimal for heavy lifting setup of external dependencies. By introducing a two fold design, this has been decoupled from the authentication message processing. The new design also gets us back on track to the goal of makeing SASL a soft dependency of mesos. Diffs - include/mesos/authentication/authenticator.hpp f66217a src/Makefile.am d372404 src/authentication/cram_md5/authenticator.hpp 7578ea1 src/authentication/cram_md5/authenticator.cpp PRE-CREATION src/authentication/cram_md5/auxprop.hpp d036b11 src/authentication/cram_md5/auxprop.cpp 5ff9755 src/master/master.hpp 6a39df0 src/master/master.cpp f10a3cf src/tests/cram_md5_authentication_tests.cpp dd102dc Diff: https://reviews.apache.org/r/27760/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 27760: Revised authenticator interface to allow for two fold implementations.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/#review72942 --- src/master/master.cpp https://reviews.apache.org/r/27760/#comment119071 Remove trailing ',' src/master/master.cpp https://reviews.apache.org/r/27760/#comment119072 Merge these lines src/authentication/cram_md5/authenticator.cpp https://reviews.apache.org/r/27760/#comment119069 I'm still confused about what this log line is supposed to mean.. that we're inside [CRAMMD5Authenticator]Process and the authentication session is about to start? If that's the case, perhaps CRAMMD5AuthenticatorProcess starting authentication session? (and then CRAM...Process cleaning up after authentication session) Otherwise, I'm very unclear what the Process refers to, and what authentication modifies. Or is Process intended as a verb? Much too overloaded here. src/authentication/cram_md5/authenticator.cpp https://reviews.apache.org/r/27760/#comment119073 Authentication session already active...? - Adam B On Feb. 17, 2015, 7:57 p.m., Till Toenshoff wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27760/ --- (Updated Feb. 17, 2015, 7:57 p.m.) Review request for mesos, Adam B, Kapil Arya, Niklas Nielsen, and Vinod Kone. Bugs: MESOS-2050 https://issues.apache.org/jira/browse/MESOS-2050 Repository: mesos Description --- The initial design and implementation of the authenticator module interface caused issues and was not optimal for heavy lifting setup of external dependencies. By introducing a two fold design, this has been decoupled from the authentication message processing. The new design also gets us back on track to the goal of makeing SASL a soft dependency of mesos. Diffs - include/mesos/authentication/authenticator.hpp f66217a src/Makefile.am d372404 src/authentication/cram_md5/authenticator.hpp 7578ea1 src/authentication/cram_md5/authenticator.cpp PRE-CREATION src/authentication/cram_md5/auxprop.hpp d036b11 src/authentication/cram_md5/auxprop.cpp 5ff9755 src/master/master.hpp 6a39df0 src/master/master.cpp f10a3cf src/tests/cram_md5_authentication_tests.cpp dd102dc Diff: https://reviews.apache.org/r/27760/diff/ Testing --- make check Thanks, Till Toenshoff
Re: Review Request 30736: Add GPGPU resouce support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30736/#review72973 --- Sorry I'm just getting around to looking at this. Been busy with the Project Myriad incubator proposal and 0.22 release. I tend to agree with Tim Chen who suggested first using a custom --resources type. Many customers use Mesos this way to provide scheduling/allocation of otherwise unsupported resource types. You could also use the new isolator module interface to enforce isolation of your new resource type. It's definitely an interesting use case for Mesos, and we'd like to support it, but we may not want to add first-class support until we've played around with experimental support in a custom resource and/or plugin module (which separates opencl and boostcompute dependencies). A module would also be cleaner than #ifdef'ing out big chunks of code. - Adam B On Feb. 6, 2015, 9:33 a.m., chester kuo wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30736/ --- (Updated Feb. 6, 2015, 9:33 a.m.) Review request for mesos, Adam B and Timothy Chen. Bugs: MESOS-2262 https://issues.apache.org/jira/browse/MESOS-2262 Repository: mesos Description --- Extending Mesos to support Heterogeneous resource such as GPGPU/FPGA..etc as computing resources in the data-center, OpenCL will be first target to add into Mesos. This is first draft, some slave resouce discover not ready but i plan to work on a framework first so we can have people to test it early. Diffs - configure.ac acc685caa47717f0fb61a7c18572d7b763ec707b include/mesos/resources.hpp 3b57568c10233a0c692787de6464f21af5eaadf4 src/Makefile.am 93537d17d3c7604a8532ee1453e405630c481ddc src/common/resources.cpp 98371f6873482d0cdbefeb279b58ae6cc680a88f src/examples/test_opencl_executor.cpp PRE-CREATION src/examples/test_opencl_framework.cpp PRE-CREATION src/slave/containerizer/containerizer.cpp cf2ece80252350c25db38116e29cf28a3164662d src/tests/master_allocator_tests.cpp 1eebefd2e423e4bb89d76ed7b7d8acc9d1bb7760 src/tests/master_tests.cpp 62ba35b9c3f999c59a95bffb01b8b82cc543a34f Diff: https://reviews.apache.org/r/30736/diff/ Testing --- make check I also create a simple frame test (test-opencl-framework so it can have a quick smoke test) Thanks, chester kuo
Re: Review Request 29507: Added Configurable Slave Ping Timeouts
On Feb. 18, 2015, 11:38 a.m., Niklas Nielsen wrote: Let's get tests wired up before committing this :) Sure thing. Adding tests in my subsequent patch where we will pass the master's timeout values on to the slave. Will post that very soon. - Adam --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/#review72992 --- On Jan. 8, 2015, 12:44 a.m., Adam B wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/ --- (Updated Jan. 8, 2015, 12:44 a.m.) Review request for mesos and Niklas Nielsen. Bugs: MESOS-2110 https://issues.apache.org/jira/browse/MESOS-2110 Repository: mesos Description --- Added new --slave_ping_timeout and --max_slave_ping_timeouts flags to mesos-master to replace the existing (still default) SLAVE_PING_TIMEOUT (15secs) and MAX_SLAVE_PING_TIMEOUTS (5). These can be extended if slaves are expected/allowed to be down for longer than a minute or two. Beware that this affects recovery from network timeouts as well as actual slave node/process failover. Diffs - src/master/constants.hpp c386eab src/master/constants.cpp 9ee17e9 src/master/flags.hpp f5c8d2a src/master/master.cpp d6651e2 src/slave/constants.hpp fd1c1ab src/slave/constants.cpp 2a99b11 src/slave/slave.cpp 50b5781 src/tests/fault_tolerance_tests.cpp 5763486 src/tests/partition_tests.cpp fea7801 src/tests/slave_recovery_tests.cpp cd4a398 src/tests/slave_tests.cpp c50cbc7 Diff: https://reviews.apache.org/r/29507/diff/ Testing --- Manually tested slave failover/shutdown with master using different --slave_ping_timeout and --max_slave_ping_timeouts. Updated unit tests to use shorter non-default values for ping timeouts. Thanks, Adam B
Re: Review Request 29507: Added Configurable Slave Ping Timeouts
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/ --- (Updated Feb. 18, 2015, 4:33 p.m.) Review request for mesos and Niklas Nielsen. Changes --- Rebased, in preparation for part 2 (slave-side timeouts). Bugs: MESOS-2110 https://issues.apache.org/jira/browse/MESOS-2110 Repository: mesos Description (updated) --- Added new --slave_ping_timeout and --max_slave_ping_timeouts flags to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs) and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5). These can be extended if slaves are expected/allowed to be down for longer than a minute or two. Beware that this affects recovery from network timeouts as well as actual slave node/process failover. Diffs (updated) - src/master/constants.hpp c386eab src/master/constants.cpp 9ee17e9 src/master/flags.hpp 6c18a1a src/master/master.cpp f4b6463 src/slave/constants.hpp 761cfaf src/slave/constants.cpp 83d9fc1 src/slave/slave.cpp a8b2621 src/tests/fault_tolerance_tests.cpp f927d4a src/tests/partition_tests.cpp fea7801 src/tests/slave_recovery_tests.cpp 7e2e63d src/tests/slave_tests.cpp e7e2af6 Diff: https://reviews.apache.org/r/29507/diff/ Testing (updated) --- Manually tested slave failover/shutdown with master using different --slave_ping_timeout and --max_slave_ping_timeouts. Ran unit tests with shorter non-default values for ping timeouts. Thanks, Adam B
Re: Review Request 30996: Added DisabledEndpoints message.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30996/#review72651 --- Looking good, but I don't know why this would go into messages.proto instead of mesos.proto. src/common/parse.hpp https://reviews.apache.org/r/30996/#comment118697 Please add another blank line here to space out the top-level functions. src/messages/messages.hpp https://reviews.apache.org/r/30996/#comment118701 If DisabledEndpoints/Endpoint go into mesos.proto, then you can also move this to type_utils.hpp src/messages/messages.hpp https://reviews.apache.org/r/30996/#comment118695 Please add another blank line here, since we're at the top-level scope. src/messages/messages.proto https://reviews.apache.org/r/30996/#comment118698 Perhaps this should go into mesos.proto, along with ACL/ACLs, Credential/Credentials, and other datatype protobufs. Typically messages.proto is used for internal messages (which end in Message). src/messages/messages.proto https://reviews.apache.org/r/30996/#comment118699 Perhaps this should be `repeated Endpoint endpoints = 1;` Then you could have a separate `message Endpoint { required string value = 1; }` Maybe that's overkill, but I could imagine us wanting Endpoint for other things in the future, and separating them out like this would make it easier to add more per-endpoint fields. What do you think? - Adam B On Feb. 13, 2015, 8:02 a.m., Alexander Rojas wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30996/ --- (Updated Feb. 13, 2015, 8:02 a.m.) Review request for mesos, Adam B, Till Toenshoff, and Vinod Kone. Bugs: MESOS-2333 https://issues.apache.org/jira/browse/MESOS-2333 Repository: mesos Description --- Adds a `DisabledEndpoints` protobuf message, which will allow the user give a list of endpoints which should be disabled. Diffs - src/common/parse.hpp 547b32041f39f0ff0c38179b66a32b2239134abc src/messages/messages.hpp 25769b797ec9bd1a1c348c467616aef8407c45d6 src/messages/messages.proto 58484ae45071a80afd2b11803dd66a88f88ad9ed Diff: https://reviews.apache.org/r/30996/diff/ Testing --- make check Thanks, Alexander Rojas