-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61530/#review182573
-----------------------------------------------------------




src/docker/executor.cpp
Lines 31 (patched)
<https://reviews.apache.org/r/61530/#comment258494>

    We sort includes alphabetically.



src/docker/executor.cpp
Lines 410-415 (original), 416-421 (patched)
<https://reviews.apache.org/r/61530/#comment258498>

    Let's add a comment explaining why are we doing retry / unblock on failure 
and why not timeout. Something like:
    ```
    // Invoking `docker stop` might be unsuccessful, in which case the 
container most probably does not receive the signal. In this case we should 
allow schedulers to retry the kill operation or, if the kill was initiated by a 
failing health check, retry ourselves. We do not bail out nor stop retrying to 
avoid sending a terminal status update while the container might still be 
running.
    //
    // NOTE: `docker stop` might also hang. We do not address this for now, 
because there is no evidence that in this case docker daemon might funciton 
properly, i.e., it is only the docker cli command that hangs, and hence there 
is not so much we can do.
    ```



src/docker/executor.cpp
Lines 427-428 (patched)
<https://reviews.apache.org/r/61530/#comment258499>

    Please no "magic constants" in the code. Add it to the existing section 
above. `5s` sounds reasonable to me.



src/docker/executor.cpp
Lines 604-606 (original), 616-618 (patched)
<https://reviews.apache.org/r/61530/#comment258497>

    We should augment the comment here explaining why we need 
`killAttemptInProgress` or whatever we end up having.



src/docker/executor.cpp
Lines 606-608 (original), 618-621 (patched)
<https://reviews.apache.org/r/61530/#comment258496>

    Instead of `killed` flag, which can be rolled back, and `firstKillAttempt`, 
that kinda indicates that a kill has been issued before, how about having 
`killed` and `killAttemptInProgress` with the following semantics:
    - After task transitions to `killed`, there is no way back. This 
corresponds to our intent. This also fixes the race with the health update that 
might come in after we set `killed = false` to unblock schedulers.
    - `killAttemptInProgress` will guard kill requests from schedulers and will 
be adjusted accordingly.


- Alexander Rukletsov


On Aug. 9, 2017, 4:55 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61530/
> -----------------------------------------------------------
> 
> (Updated Aug. 9, 2017, 4:55 p.m.)
> 
> 
> Review request for mesos and Alexander Rukletsov.
> 
> 
> Bugs: MESOS-6743
>     https://issues.apache.org/jira/browse/MESOS-6743
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, after `docker stop` command failure, docker executor
> neither allowed a scheduler to retry `killTask` command, nor retried to
> kill a task when task killing was triggered by a failed health check.
> 
> 
> Diffs
> -----
> 
>   src/docker/executor.cpp 26f12ec002f754fab0d34c01472cf95b499d8007 
> 
> 
> Diff: https://reviews.apache.org/r/61530/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> Manual testing:
> 
> Emulating `docker stop` errors:
> ===============================
> 1. Add `return fmt.Errorf("Emulating error!")` to 
> https://github.com/docker/docker-ce/blob/master/components/engine/daemon/stop.go#L21
> 2. build docker from sources: 
> http://oyvindsk.com/writing/docker-build-from-source
> 3. stop docker service and launch dockerd binary, like: `sudo 
> ./bundles/17.06.0-dev/binary-daemon/dockerd`
> 
> Emulating docker daemon hang:
> =============================
> 1. `ps aux|grep dockerd` - 2 processes will be found
> 2. `sudo kill -STOP <PID1> <PID2>` - send SIGSTOP to docker daemon processes 
> just before sending `docker stop`
> 
> Emulating health check failure in docker executor:
> ==================================================
> 1. Add
> ```c++
>   static int fake = 0;
>   if (++fake > 10) {
>     failure();
>     return;
>   }
> ```
> to `HealthChecker::processCheckResult()` in `src/checks/health_checker.cpp`
> 2. Add
> ```c++
>        HealthCheck healthCheck;
>        healthCheck.set_type(HealthCheck::COMMAND);
>        healthCheck.mutable_command()->set_value("exit 0");
>        healthCheck.set_delay_seconds(0);
>        healthCheck.set_interval_seconds(0);
>        healthCheck.set_grace_period_seconds(1);
>        _task.mutable_health_check()->CopyFrom(healthCheck);
> ```
> to `CommandScheduler::offers()` in `src/cli/execute.cpp`
> 3. compile mesos
> 4. run mesos agent: `sudo GLOG_v=1 ./bin/mesos-agent.sh 
> --resources="cpus:10000;mem:1000000" 
> --work_dir='/home/some_user/mesos/build/var/agent-1' 
> --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 5. launch docker executor: `./src/mesos-execute --master="127.0.1.1:5050" 
> --name="a" --containerizer=docker --docker_image="ubuntu:xenial" 
> --command="while true; do : ; done"`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>

Reply via email to