----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16147/#review32969 -----------------------------------------------------------
Looks great! My main concern is that there seems to be a bug in recover() when an executor dies before checkpointing forked pid. src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62057> could we just do 'xf' and let tar figure out the compression? src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62055> not properly aligned. src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62056> not properly aligned. src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62059> CHECK(pos != std::string::npos) << "blah...."; src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62060> You can kill "uri.has_executable()". src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62061> s/chmod"/chmod: "<< fetched.get()/ src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62062> Why the extraction in an else loop? Was this the old behavior? mesos.proto doesn't say that it is only extracted if it's executable. src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62064> include the filename here. src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62063> s/a/an/ ? src/launcher/fetcher.cpp <https://reviews.apache.org/r/16147/#comment62065> include the filename here. src/local/local.cpp <https://reviews.apache.org/r/16147/#comment62066> s/isolator/containerizer/ ? Do you know what is this TODO about? src/slave/containerizer/containerizer.hpp <https://reviews.apache.org/r/16147/#comment62089> Isn't this containerizer interface? s/enabled/enable/ src/slave/containerizer/containerizer.hpp <https://reviews.apache.org/r/16147/#comment62091> s/Termination/'Termination'/ s/terminates/terminates,/ ? src/slave/containerizer/containerizer.hpp <https://reviews.apache.org/r/16147/#comment62093> new line. src/slave/containerizer/containerizer.hpp <https://reviews.apache.org/r/16147/#comment62094> new line. src/slave/containerizer/containerizer.hpp <https://reviews.apache.org/r/16147/#comment62095> new line. src/slave/containerizer/containerizer.cpp <https://reviews.apache.org/r/16147/#comment62096> why not a hashmap? src/slave/containerizer/containerizer.cpp <https://reviews.apache.org/r/16147/#comment62097> Should there be a check here for user specifying cgroups isolation on non-linux boxes? src/slave/containerizer/containerizer.cpp <https://reviews.apache.org/r/16147/#comment62099> No need for "+"s to concatenate!? src/slave/containerizer/mesos_containerizer.hpp <https://reviews.apache.org/r/16147/#comment62100> new line. src/slave/containerizer/mesos_containerizer.hpp <https://reviews.apache.org/r/16147/#comment62103> s/destroy()/'destroy()'/ src/slave/containerizer/mesos_containerizer.hpp <https://reviews.apache.org/r/16147/#comment62106> s/_destroy()/'_destroy()'/ src/slave/containerizer/mesos_containerizer.cpp <https://reviews.apache.org/r/16147/#comment62109> process = new MesosContainerizerProcess( flags, local, launcher, isolators); src/slave/containerizer/mesos_containerizer.cpp <https://reviews.apache.org/r/16147/#comment62114> I haven't yet seen where the forked pid checkpointing is done, so I might be missing something here. But according to the old semantics this should not be a return! It is entirely possible that an executor died before it got a chance to checkpoint its forked pid. src/slave/containerizer/mesos_containerizer.cpp <https://reviews.apache.org/r/16147/#comment62116> Depending on how you fix the above, this should/could not be a CHECK either. src/slave/containerizer/mesos_containerizer.cpp <https://reviews.apache.org/r/16147/#comment62117> Do you need the if loop? How about doing straight up trim? src/slave/flags.hpp <https://reviews.apache.org/r/16147/#comment62067> s/isolation/Flags::isolation/ ? src/slave/monitor.hpp <https://reviews.apache.org/r/16147/#comment62068> s/executor/container/ src/slave/monitor.hpp <https://reviews.apache.org/r/16147/#comment62069> s/executor/container/ src/slave/slave.hpp <https://reviews.apache.org/r/16147/#comment62073> not yours but could you s/recoveR/recover/ src/slave/slave.hpp <https://reviews.apache.org/r/16147/#comment62074> Is this called after _recover but before __recover? Not looking ahead at the implementation, this seems to warrant a comment either way. src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62075> s/."/: " << resources.error()/ src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62077> how about the following to make it consistent with how you did it elsewhere. statusUpdateManager->update( update, info.id(), executor->id, executor->containerId) .onAny(...) src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62079> With my recent change to include the first tasks resources when launching an executor do we still need this? src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62080> s/container"/container " << containerId/ src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62081> Why the quotes here but not above? src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62092> Wouldn't the containerizer destroy it automatically in this case? src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62082> Could you wrap termination.isReady && termination.get().killed insidd "()" src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62083> Are you missing termination.isReady() check here? src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62085> This cannot happen anymore because we call wait below correct? src/slave/slave.cpp <https://reviews.apache.org/r/16147/#comment62086> s/launchExecutor()/'launchExecutor()'/ - Vinod Kone On Jan. 27, 2014, 3 a.m., Ian Downes wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/16147/ > ----------------------------------------------------------- > > (Updated Jan. 27, 2014, 3 a.m.) > > > Review request for mesos, Benjamin Hindman, Ben Mahler, Chi Zhang, Niklas > Nielsen, samya, and Jason Dusek. > > > Repository: mesos-git > > > Description > ------- > > The proposed Containerizer interface is to replace the existing Isolator. > > One ContainerizerProcess has been written: > MesosContainerizerProcess - implements containerizeration internally using a > Launcher and one or more Isolators (following review) > > The intent is to also support a generic ExternalContainerizerProcess that can > delegate containerizeration by making external calls. Other Containerizers > could interface with specific external containerization techniques such as > Docker or LXC. > > > Diffs > ----- > > include/mesos/mesos.proto 655f86757487ddbe551fdcf53eb793e773ecdd34 > src/Makefile.am cf0c8c66e8dd21c2f5a2da22e5d5adb056353454 > src/common/type_utils.hpp 3b05751a9a27055dd19e4abb79197f53e5554fe1 > src/launcher/fetcher.cpp PRE-CREATION > src/launcher/launcher.hpp 104fe812ddd2b75b14ec0b40d192f58cdba3603a > src/launcher/launcher.cpp d5ab66704429a95eeb8eda5188e33d8e691221af > src/launcher/main.cpp de64609905ee63096c0173fe7e64a1eafea5d6bf > src/local/local.cpp 83a7f913afb1441e9137c7fcec8fd81815714905 > src/slave/cgroups_isolator.hpp e86062e9abaaa263c32c55e9dbfefd700f605886 > src/slave/cgroups_isolator.cpp 80155a3f0cd684a6aad9a37f61c5204d86032735 > src/slave/containerizer/containerizer.hpp PRE-CREATION > src/slave/containerizer/containerizer.cpp PRE-CREATION > src/slave/containerizer/mesos_containerizer.hpp PRE-CREATION > src/slave/containerizer/mesos_containerizer.cpp PRE-CREATION > src/slave/flags.hpp 827b2d0d6dc8fa279f3187a09e5dc6d4799d17cd > src/slave/http.cpp c8357e214d2adf2cd712072f58d07b07badb79dc > src/slave/isolator.hpp 9634535d8c746597b4bb6e278587a1b9ca8f1608 > src/slave/isolator.cpp c9643cf9c5657bc142482a71fb161233bffb3b9f > src/slave/main.cpp e0cae7b205c2599e05c4db990cc9c8e9e3673c37 > src/slave/monitor.hpp 52568ad8ec566f7cf36c249c76d798d44eacb578 > src/slave/monitor.cpp a931c4f35a8793c66ee03de82f0e0a21b92f8ffa > src/slave/paths.hpp b1a56a346ca84e7e71c31386602264c5097ef1cf > src/slave/process_isolator.hpp 4ae093fe65775a2b9bec42071961dd58aa0c3d8b > src/slave/process_isolator.cpp 0bc698f04f7c8eaad166dc9d646e13310129dd01 > src/slave/slave.hpp 38ba7b6d6c35f0668d2f021a1285363cbf4367f4 > src/slave/slave.cpp 8b83dacad802a5434bf76ca075b2b8113ed14a84 > src/slave/state.hpp 78b20ffae1f2f629d851784302648e3bf5349321 > src/slave/state.cpp 93cde0c2efbbe5387448631a02e64b20a3d021c1 > src/slave/status_update_manager.hpp > 06ea4659cdd24cb1b82f389f404566ba14a663fb > src/slave/status_update_manager.cpp > 03f5eafefd6ed748bfd4511d654c23c7b460db66 > > Diff: https://reviews.apache.org/r/16147/diff/ > > > Testing > ------- > > > Thanks, > > Ian Downes > >