----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22579/#review45845 -----------------------------------------------------------
Awesome to see this Tim! I gave a really thorough review. Please let me know what doesn't make sense. include/mesos/mesos.proto <https://reviews.apache.org/r/22579/#comment80839> We try and wrap our comments at 70 characters. See http://mesos.apache.org/documentation/latest/mesos-c++-style-guide for more details. (I'm not sure if the style checker catches this.) src/common/type_utils.hpp <https://reviews.apache.org/r/22579/#comment80841> We include a space between 'if' and '('. I noticed this other places in the review too, please sweep through and clean them all up! src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment80842> This is an internal header which we use quotes for instead of <> (which are used for system/3rdparty headers), and we put them below all the system/3rdparty includes. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment80843> Just like our headers, we separate these with a newline. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81079> Was there any reason you didn't take the 'executor' and 'taskId' arguments here too? It seems a bit arbitrary to take them below. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment80844> We wrap these down below, ala: Future<Nothing> check const UPID& executor, const TaskID& taskID) { You'll still see some old code that doesn't do this, but use http://mesos.apache.org/documentation/latest/mesos-c++-style-guide for the preferred indentation style. Thanks! src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment80899> Any reason not to use LOG here? This is an internal tool and in the past it's been really valuable to use LOG because it includes timestamps (amongst other things)! This will also let you kill '[HealthCheck]' (and don't worry about ending all your LOG lines with periods, see our style guide for more details). Moreover, it will let you use stdout for providing the output of the program, in case we ever wanted to output the health check results there. ;-) src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment80898> Just an FYI, we'll often wrap things like this using 'Seconds' since it will print it out in an easy to read manner. For example, rather than printing "259200 seconds" it will print "3 days" for you automagically. All you need to do is: << Seconds(check_.delay_seconds()) << ", grace period " << Seconds(check_.grace_period_seconds()); I think you might do this other places too. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment80845> You shouldn't need to cast to 'Duration'. I think you might do this a few other places too. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81080> Why pass a default argument? Seems like you always call this with an argument and it would be a bit unfortunate if we ever failed the promise with an empty string. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81081> When I'm actually trying to copy a value I prefer to use CopyFrom rather than MergeFrom. I've seen this throughout this review. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81082> I don't see any value in adding a 'message' parameter if we're not using it. This would be a simple addition in the future for someone else to do if they needed/wanted that functionality. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81083> Any reason not to just keep this on stderr? Honestly, it seems like we'd like to pipe the command's stdout/stderr to our stderr so that it's part of the stderr file that mesos-executor is writing to. Less moving parts, less things to discover, and later if people don't want this output they can tell the mesos-executor to put the data someplace else. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81089> See the style guide for how we do wrapping and indentation here. In this case the style guide prefers: Try<Subprocess> external = process::subprocess(command.value(), environment); src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81086> Not a huge deal but just wanted to mention that we don't use periods at the end of these error messages because they often get chained. You've got a mix of some with and some without periods, best to just kill them all. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81084> Just FYI, we are defaulting to C++11 but we let people turn it off and '>>' won't compile in C++03 (needs to be '> >'). src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81125> We don't want to wait more than check_.timeout_seconds() here. I don't think it's a big deal if you don't make this asynchronous for now but we shouldn't wait more than that much time. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81090> We use Google style here, and put the + on the previous line. And just for future reference the same is true for conditionals, e.g.,: while (foo && bar && baz) ... I'm just going to comment on this one but there are a handful more so please do a quick run through in your code please. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81095> It's often helpful to get a bit more information for debuggin by actually looking at the exit status. I usually pull this from somewhere else in the code base that is doing this (which means it's probably time for a helper). I've pulled an example here for you. ;-) string message = "Shell command check "; if (WIFEXITED(status.get().get())) { message += " has exited with status "; message += stringify(WEXITSTATUS(status.get().get())); } else { message += " has terminated with signal "; message += strsignal(WTERMSIG(status.get().get())); } failure(message); src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81111> Any reason not to just make this be a Promise<Nothing> rather than a pointer that we wrap in Owned? src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81112> I'd really like to be consistent and either have all instance variables use the _ suffix or have none of them use it. Apologies if you've seen things that diverge from that in the codebase, but it's not a great pattern for future code readers so we shouldn't encourage it. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81113> Unused? src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81114> I hate to be pedantic, but any reason not to call this --health_check or something to signify that this is the JSON version of HealthCheck? src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81115> We name our flags the same as what you'd see on the command line, so s/taskID/task_id/ here. src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81116> Any reason to put this on the heap? Stack allocating this should be sufficient and is simpler (saves you the delete later). src/health-check/main.cpp <https://reviews.apache.org/r/22579/#comment81117> On argument per line please. src/launcher/executor.cpp <https://reviews.apache.org/r/22579/#comment81118> Please move this down with the other "" includes. src/launcher/executor.cpp <https://reviews.apache.org/r/22579/#comment81126> Let's wrap this like we do in master.cpp and slave.cpp (search for 'install') as per the style guide preferences. src/launcher/executor.cpp <https://reviews.apache.org/r/22579/#comment81128> Can we wrap these parameters as well please? src/launcher/executor.cpp <https://reviews.apache.org/r/22579/#comment81127> { on newline please. Same below in 'launchHealthCheck' too. src/launcher/executor.cpp <https://reviews.apache.org/r/22579/#comment81129> Why checking task.has_command() instead of task.has_health_check()? src/launcher/executor.cpp <https://reviews.apache.org/r/22579/#comment81130> As per my comment earlier, it would be nice not to have to force the reader to try and figure out when to use a _ suffix and when not to. Rather than changing all the instance variable names here I'd prefer to just s/driver_/driver/ and s/healthCheckDir_/healthCheckDir_/ thanks! (Oh, and feel free to mangle the argument in 'registered' above with a prefix _ as we've done elsewhere in the code base to distinguish the two ... I'm not saying that's much better but at least we're being consistent!!). src/launcher/executor.cpp <https://reviews.apache.org/r/22579/#comment81135> Just an FYI, the mesos-executor should get installed in the same place as mesos-health-checker so you should just be able to use the dirname(argv[0]) to find it. Were you adding this so that people could override that location? src/messages/messages.proto <https://reviews.apache.org/r/22579/#comment81133> FYI, at some point we'll likely want to make this protobuf be part of our public API so other people can use mesos-health-checker for their executors. src/messages/messages.proto <https://reviews.apache.org/r/22579/#comment81131> How come this is an enum here but in TaskStatus it's just a bool called 'healthy'? I think a bool is really sufficient but if you're thinking something I'm not please comment and explain why you choose to do it this way. src/messages/messages.proto <https://reviews.apache.org/r/22579/#comment81132> Since this will never be used when 'healthy' is true why not call this consecutive_failures instead? Interestingly enough the variable name in HealthCheckProcess you use to set this field is called consecutiveFailures too! ;-) This also raises another question, do we want to s/failures/consecutive_failures/ in HealthCheck as well to be consistent? Now would be a good time to make that API change before we've really released this to the wild. src/slave/http.cpp <https://reviews.apache.org/r/22579/#comment81134> Why kill this? We like to keep two newlines between top-level functions like this. src/tests/health_check_tests.cpp <https://reviews.apache.org/r/22579/#comment81136> Sorry that this is confusing, but "" includes with the others at the bottom (for posterity, we follow the Google style guide). src/tests/health_check_tests.cpp <https://reviews.apache.org/r/22579/#comment81137> { on newline please. src/tests/health_check_tests.cpp <https://reviews.apache.org/r/22579/#comment81139> Why make this an option if it's default is 0? src/tests/health_check_tests.cpp <https://reviews.apache.org/r/22579/#comment81138> Is this comment a relic from a copy-paste? If anything I think what you want to say here is something along the lines of "using POSIX isolators so we can run these tests on both OS X and Linux". (And it looks like this applies to the rest of your tests too.) - Benjamin Hindman On June 17, 2014, 5:59 p.m., Timothy Chen wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/22579/ > ----------------------------------------------------------- > > (Updated June 17, 2014, 5:59 p.m.) > > > Review request for mesos, Benjamin Hindman and Niklas Nielsen. > > > Repository: mesos-git > > > Description > ------- > > Added task health check process that is launched with the command executor if > health check is configured. > It runs configured health check command and return the status to the executor > to report the task health. The executor also reports the task health status > back to the scheduler. > > The task health process keeps internal state based on health check > configuration, and determine when the task it is checking for should be > killed. Currently it's based on the number of consecutive failures it > observed. Once the condition meets it sends a task health status update with > a kill task flag turned on, and the executor will kill the task. > > > Diffs > ----- > > include/mesos/mesos.proto 709b8b1 > src/Makefile.am 3e623cc > src/common/type_utils.hpp 784a808 > src/health-check/main.cpp PRE-CREATION > src/launcher/executor.cpp 3d55d93 > src/messages/messages.proto 8aecc8b > src/slave/http.cpp cd7f692 > src/slave/slave.cpp bc976b7 > src/tests/health_check_tests.cpp PRE-CREATION > > Diff: https://reviews.apache.org/r/22579/diff/ > > > Testing > ------- > > Added unit tests and make check. > > > Thanks, > > Timothy Chen > >
