-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22579/#review45845
-----------------------------------------------------------


Awesome to see this Tim! I gave a really thorough review. Please let me know 
what doesn't make sense.


include/mesos/mesos.proto
<https://reviews.apache.org/r/22579/#comment80839>

    We try and wrap our comments at 70 characters. See 
http://mesos.apache.org/documentation/latest/mesos-c++-style-guide for more 
details. (I'm not sure if the style checker catches this.)



src/common/type_utils.hpp
<https://reviews.apache.org/r/22579/#comment80841>

    We include a space between 'if' and '('. I noticed this other places in the 
review too, please sweep through and clean them all up!



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment80842>

    This is an internal header which we use quotes for instead of <> (which are 
used for system/3rdparty headers), and we put them below all the 
system/3rdparty includes.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment80843>

    Just like our headers, we separate these with a newline.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81079>

    Was there any reason you didn't take the 'executor' and 'taskId' arguments 
here too? It seems a bit arbitrary to take them below.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment80844>

    We wrap these down below, ala:
    
    Future<Nothing> check
        const UPID& executor,
        const TaskID& taskID)
    {
    
    You'll still see some old code that doesn't do this, but use 
http://mesos.apache.org/documentation/latest/mesos-c++-style-guide for the 
preferred indentation style. Thanks!



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment80899>

    Any reason not to use LOG here? This is an internal tool and in the past 
it's been really valuable to use LOG because it includes timestamps (amongst 
other things)! This will also let you kill '[HealthCheck]'  (and don't worry 
about ending all your LOG lines with periods, see our style guide for more 
details). Moreover, it will let you use stdout for providing the output of the 
program, in case we ever wanted to output the health check results there. ;-)



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment80898>

    Just an FYI, we'll often wrap things like this using 'Seconds' since it 
will print it out in an easy to read manner. For example, rather than printing 
"259200 seconds" it will print "3 days" for you automagically. All you need to 
do is:
    
    << Seconds(check_.delay_seconds()) << ", grace period "
    << Seconds(check_.grace_period_seconds());
    
    I think you might do this other places too.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment80845>

    You shouldn't need to cast to 'Duration'. I think you might do this a few 
other places too.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81080>

    Why pass a default argument? Seems like you always call this with an 
argument and it would be a bit unfortunate if we ever failed the promise with 
an empty string.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81081>

    When I'm actually trying to copy a value I prefer to use CopyFrom rather 
than MergeFrom. I've seen this throughout this review.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81082>

    I don't see any value in adding a 'message' parameter if we're not using 
it. This would be a simple addition in the future for someone else to do if 
they needed/wanted that functionality.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81083>

    Any reason not to just keep this on stderr? Honestly, it seems like we'd 
like to pipe the command's stdout/stderr to our stderr so that it's part of the 
stderr file that mesos-executor is writing to. Less moving parts, less things 
to discover, and later if people don't want this output they can tell the 
mesos-executor to put the data someplace else.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81089>

    See the style guide for how we do wrapping and indentation here. In this 
case the style guide prefers:
    
    Try<Subprocess> external =
      process::subprocess(command.value(), environment);



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81086>

    Not a huge deal but just wanted to mention that we don't use periods at the 
end of these error messages because they often get chained. You've got a mix of 
some with and some without periods, best to just kill them all.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81084>

    Just FYI, we are defaulting to C++11 but we let people turn it off and '>>' 
won't compile in C++03 (needs to be '> >').



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81125>

    We don't want to wait more than check_.timeout_seconds() here. I don't 
think it's a big deal if you don't make this asynchronous for now but we 
shouldn't wait more than that much time.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81090>

    We use Google style here, and put the + on the previous line. And just for 
future reference the same is true for conditionals, e.g.,:
    
    while (foo &&
           bar &&
           baz) ...
    
    I'm just going to comment on this one but there are a handful more so 
please do a quick run through in your code please.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81095>

    It's often helpful to get a bit more information for debuggin by actually 
looking at the exit status. I usually pull this from somewhere else in the code 
base that is doing this (which means it's probably time for a helper). I've 
pulled an example here for you. ;-)
    
    string message = "Shell command check ";
    
    if (WIFEXITED(status.get().get())) {
      message += " has exited with status ";
      message += stringify(WEXITSTATUS(status.get().get()));
    } else {
      message += " has terminated with signal ";
      message += strsignal(WTERMSIG(status.get().get()));
    }
    
    failure(message);



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81111>

    Any reason not to just make this be a Promise<Nothing> rather than a 
pointer that we wrap in Owned?



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81112>

    I'd really like to be consistent and either have all instance variables use 
the _ suffix or have none of them use it. Apologies if you've seen things that 
diverge from that in the codebase, but it's not a great pattern for future code 
readers so we shouldn't encourage it.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81113>

    Unused?



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81114>

    I hate to be pedantic, but any reason not to call this --health_check or 
something to signify that this is the JSON version of HealthCheck?



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81115>

    We name our flags the same as what you'd see on the command line, so 
s/taskID/task_id/ here.



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81116>

    Any reason to put this on the heap? Stack allocating this should be 
sufficient and is simpler (saves you the delete later).



src/health-check/main.cpp
<https://reviews.apache.org/r/22579/#comment81117>

    On argument per line please.



src/launcher/executor.cpp
<https://reviews.apache.org/r/22579/#comment81118>

    Please move this down with the other "" includes.



src/launcher/executor.cpp
<https://reviews.apache.org/r/22579/#comment81126>

    Let's wrap this like we do in master.cpp and slave.cpp (search for 
'install') as per the style guide preferences.



src/launcher/executor.cpp
<https://reviews.apache.org/r/22579/#comment81128>

    Can we wrap these parameters as well please?



src/launcher/executor.cpp
<https://reviews.apache.org/r/22579/#comment81127>

    { on newline please. Same below in 'launchHealthCheck' too.



src/launcher/executor.cpp
<https://reviews.apache.org/r/22579/#comment81129>

    Why checking task.has_command() instead of task.has_health_check()?



src/launcher/executor.cpp
<https://reviews.apache.org/r/22579/#comment81130>

    As per my comment earlier, it would be nice not to have to force the reader 
to try and figure out when to use a _ suffix and when not to. Rather than 
changing all the instance variable names here I'd prefer to just 
s/driver_/driver/ and s/healthCheckDir_/healthCheckDir_/ thanks! (Oh, and feel 
free to mangle the argument in 'registered' above with a prefix _ as we've done 
elsewhere in the code base to distinguish the two ... I'm not saying that's 
much better but at least we're being consistent!!).



src/launcher/executor.cpp
<https://reviews.apache.org/r/22579/#comment81135>

    Just an FYI, the mesos-executor should get installed in the same place as 
mesos-health-checker so you should just be able to use the dirname(argv[0]) to 
find it. Were you adding this so that people could override that location?



src/messages/messages.proto
<https://reviews.apache.org/r/22579/#comment81133>

    FYI, at some point we'll likely want to make this protobuf be part of our 
public API so other people can use mesos-health-checker for their executors.



src/messages/messages.proto
<https://reviews.apache.org/r/22579/#comment81131>

    How come this is an enum here but in TaskStatus it's just a bool called 
'healthy'? I think a bool is really sufficient but if you're thinking something 
I'm not please comment and explain why you choose to do it this way.



src/messages/messages.proto
<https://reviews.apache.org/r/22579/#comment81132>

    Since this will never be used when 'healthy' is true why not call this 
consecutive_failures instead? Interestingly enough the variable name in 
HealthCheckProcess you use to set this field is called consecutiveFailures too! 
;-) This also raises another question, do we want to 
s/failures/consecutive_failures/ in HealthCheck as well to be consistent? Now 
would be a good time to make that API change before we've really released this 
to the wild.



src/slave/http.cpp
<https://reviews.apache.org/r/22579/#comment81134>

    Why kill this? We like to keep two newlines between top-level functions 
like this.



src/tests/health_check_tests.cpp
<https://reviews.apache.org/r/22579/#comment81136>

    Sorry that this is confusing, but "" includes with the others at the bottom 
(for posterity, we follow the Google style guide).



src/tests/health_check_tests.cpp
<https://reviews.apache.org/r/22579/#comment81137>

    { on newline please.



src/tests/health_check_tests.cpp
<https://reviews.apache.org/r/22579/#comment81139>

    Why make this an option if it's default is 0?



src/tests/health_check_tests.cpp
<https://reviews.apache.org/r/22579/#comment81138>

    Is this comment a relic from a copy-paste? If anything I think what you 
want to say here is something along the lines of "using POSIX isolators so we 
can run these tests on both OS X and Linux". (And it looks like this applies to 
the rest of your tests too.)


- Benjamin Hindman


On June 17, 2014, 5:59 p.m., Timothy Chen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22579/
> -----------------------------------------------------------
> 
> (Updated June 17, 2014, 5:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Niklas Nielsen.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added task health check process that is launched with the command executor if 
> health check is configured.
> It runs configured health check command and return the status to the executor 
> to report the task health. The executor also reports the task health status 
> back to the scheduler.
> 
> The task health process keeps internal state based on health check 
> configuration, and determine when the task it is checking for should be 
> killed. Currently it's based on the number of consecutive failures it 
> observed. Once the condition meets it sends a task health status update with 
> a kill task flag turned on, and the executor will kill the task.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 709b8b1 
>   src/Makefile.am 3e623cc 
>   src/common/type_utils.hpp 784a808 
>   src/health-check/main.cpp PRE-CREATION 
>   src/launcher/executor.cpp 3d55d93 
>   src/messages/messages.proto 8aecc8b 
>   src/slave/http.cpp cd7f692 
>   src/slave/slave.cpp bc976b7 
>   src/tests/health_check_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22579/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests and make check.
> 
> 
> Thanks,
> 
> Timothy Chen
> 
>

Reply via email to