----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/61530/#review182573 -----------------------------------------------------------
src/docker/executor.cpp Lines 31 (patched) <https://reviews.apache.org/r/61530/#comment258494> We sort includes alphabetically. src/docker/executor.cpp Lines 410-415 (original), 416-421 (patched) <https://reviews.apache.org/r/61530/#comment258498> Let's add a comment explaining why are we doing retry / unblock on failure and why not timeout. Something like: ``` // Invoking `docker stop` might be unsuccessful, in which case the container most probably does not receive the signal. In this case we should allow schedulers to retry the kill operation or, if the kill was initiated by a failing health check, retry ourselves. We do not bail out nor stop retrying to avoid sending a terminal status update while the container might still be running. // // NOTE: `docker stop` might also hang. We do not address this for now, because there is no evidence that in this case docker daemon might funciton properly, i.e., it is only the docker cli command that hangs, and hence there is not so much we can do. ``` src/docker/executor.cpp Lines 427-428 (patched) <https://reviews.apache.org/r/61530/#comment258499> Please no "magic constants" in the code. Add it to the existing section above. `5s` sounds reasonable to me. src/docker/executor.cpp Lines 604-606 (original), 616-618 (patched) <https://reviews.apache.org/r/61530/#comment258497> We should augment the comment here explaining why we need `killAttemptInProgress` or whatever we end up having. src/docker/executor.cpp Lines 606-608 (original), 618-621 (patched) <https://reviews.apache.org/r/61530/#comment258496> Instead of `killed` flag, which can be rolled back, and `firstKillAttempt`, that kinda indicates that a kill has been issued before, how about having `killed` and `killAttemptInProgress` with the following semantics: - After task transitions to `killed`, there is no way back. This corresponds to our intent. This also fixes the race with the health update that might come in after we set `killed = false` to unblock schedulers. - `killAttemptInProgress` will guard kill requests from schedulers and will be adjusted accordingly. - Alexander Rukletsov On Aug. 9, 2017, 4:55 p.m., Andrei Budnik wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/61530/ > ----------------------------------------------------------- > > (Updated Aug. 9, 2017, 4:55 p.m.) > > > Review request for mesos and Alexander Rukletsov. > > > Bugs: MESOS-6743 > https://issues.apache.org/jira/browse/MESOS-6743 > > > Repository: mesos > > > Description > ------- > > Previously, after `docker stop` command failure, docker executor > neither allowed a scheduler to retry `killTask` command, nor retried to > kill a task when task killing was triggered by a failed health check. > > > Diffs > ----- > > src/docker/executor.cpp 26f12ec002f754fab0d34c01472cf95b499d8007 > > > Diff: https://reviews.apache.org/r/61530/diff/1/ > > > Testing > ------- > > sudo make check > > Manual testing: > > Emulating `docker stop` errors: > =============================== > 1. Add `return fmt.Errorf("Emulating error!")` to > https://github.com/docker/docker-ce/blob/master/components/engine/daemon/stop.go#L21 > 2. build docker from sources: > http://oyvindsk.com/writing/docker-build-from-source > 3. stop docker service and launch dockerd binary, like: `sudo > ./bundles/17.06.0-dev/binary-daemon/dockerd` > > Emulating docker daemon hang: > ============================= > 1. `ps aux|grep dockerd` - 2 processes will be found > 2. `sudo kill -STOP <PID1> <PID2>` - send SIGSTOP to docker daemon processes > just before sending `docker stop` > > Emulating health check failure in docker executor: > ================================================== > 1. Add > ```c++ > static int fake = 0; > if (++fake > 10) { > failure(); > return; > } > ``` > to `HealthChecker::processCheckResult()` in `src/checks/health_checker.cpp` > 2. Add > ```c++ > HealthCheck healthCheck; > healthCheck.set_type(HealthCheck::COMMAND); > healthCheck.mutable_command()->set_value("exit 0"); > healthCheck.set_delay_seconds(0); > healthCheck.set_interval_seconds(0); > healthCheck.set_grace_period_seconds(1); > _task.mutable_health_check()->CopyFrom(healthCheck); > ``` > to `CommandScheduler::offers()` in `src/cli/execute.cpp` > 3. compile mesos > 4. run mesos agent: `sudo GLOG_v=1 ./bin/mesos-agent.sh > --resources="cpus:10000;mem:1000000" > --work_dir='/home/some_user/mesos/build/var/agent-1' > --containerizers="docker,mesos" --master="127.0.1.1:5050"` > 5. launch docker executor: `./src/mesos-execute --master="127.0.1.1:5050" > --name="a" --containerizer=docker --docker_image="ubuntu:xenial" > --command="while true; do : ; done"` > > > Thanks, > > Andrei Budnik > >