----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44660/ -----------------------------------------------------------
(Updated April 1, 2016, 3:38 p.m.) Review request for mesos, Ben Mahler, Gilbert Song, and Timothy Chen. Changes ------- Updated testing section. Bugs: MESOS-4909 https://issues.apache.org/jira/browse/MESOS-4909 Repository: mesos Description ------- The docker executor determines how much time it allots the underlying container to clean up (via passing the timeout to the docker daemon) based on both optional task's `KillPolicy` and optional `shutdown_grace_period` field in `ExecutorInfo`. Diffs ----- include/mesos/mesos.proto 59f5d3aadd1cb32c1eb869451ad55d877bdeee28 include/mesos/v1/mesos.proto 6556a1c243d227fdb9af0d3c1063fed08d3f6c8b src/docker/executor.cpp afc769d0887e3842106e4c350e94c95c8ffc085e Diff: https://reviews.apache.org/r/44660/diff/ Testing (updated) ------- On CentOS 7.2.1511: `make check` To test the newly introduced protobuf fields, I've modified "mesos-execute" to support these fields. I've inserted these lines in `offers()` method: ``` task.mutable_kill_policy()->mutable_grace_period()-> set_nanoseconds(Seconds(<value>).ns()); ``` I've also implemented a "--no-shell" flag to support setting `CommandInfo.shell` to `false`. An example command used in testing: ``` ./src/mesos-execute --master=192.99.40.208:5551 --name=test --no-shell --env='{"GLOG_v": 1}' --containerizer=docker --docker_image=rukletsov/unresponsive-process ``` Mesos master is launched via: ``` ./bin/mesos-master.sh --work_dir=m/work --port=5551 ``` Mesos agent is launched via: ``` sudo su GLOG_v=1 ./bin/mesos-slave.sh --work_dir=/tmp --master=192.99.40.208:5551 --port=5552 --containerizers=docker ``` In order to test the docker stop timeout, which is what the docker executor uses currently for kill policy, I've created a tiny application that ignores SIGTERM (https://github.com/rukletsov/unresponsive-process ), wrapped it in a docker container and published on docker hub (https://hub.docker.com/r/rukletsov/unresponsive-process/ ). Test 1.1 ======== `KillPolicy.grace_period` is **not set**, `shell=false`, framework shutdown. Framework is asked to shutdown several seconds after the task is started. This results in agent asking executor to shutdown. From the agent's logs, we see that the command executor sends a TASK_KILLED update in approx. **3 seconds** (the default value is **5 seconds**, but we deduct some buffer to make sure the task can be reaped and TASK_KILLED is sent) after it has been asked to shut down: ``` I0401 10:40:08.641249 13266 slave.cpp:2215] Asked to shut down framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000 by master@192.99.40.208:5551 I0401 10:40:08.641286 13266 slave.cpp:2240] Shutting down framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000 I0401 10:40:08.641402 13266 slave.cpp:4398] Shutting down executor 'test' of framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000 at executor(1)@192.99.40.208:51023 I0401 10:40:11.837626 13271 slave.cpp:3184] Handling status update TASK_KILLED (UUID: e3388b10-6bd0-4571-ad08-30b77550392e) for task test of framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000 from exec utor(1)@192.99.40.208:51023 ``` In an excerpt from the executor's log we see, that docker timeout is set to 3s: ``` I0401 10:40:08.641784 13360 exec.cpp:399] Executor asked to shutdown I0401 10:40:08.641891 13368 exec.cpp:91] Scheduling shutdown of the executor in 5secs I0401 10:40:08.641902 13360 exec.cpp:414] Executor::shutdown took 19573ns I0401 10:40:08.642026 13365 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.379cbab7-1ba4-4901-b8fa-b717bd1493f4 -v /tmp/slaves/ef6f8d28-d279-40f7-b66a-f9659ca72299-S1/frameworks/1615ae6e-e907-4296-8d46-d864fae3359b-0000/executors/test/runs/379cbab7-1ba4-4901-b8fa-b717bd1493f4:/mnt/mesos/sandbox --net host --name mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.379cbab7-1ba4-4901-b8fa-b717bd1493f4 rukletsov/unresponsive-process' is being discarded I0401 10:40:08.667981 13365 docker.cpp:712] Running docker -H unix:///var/run/docker.sock stop -t 3 mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.379cbab7-1ba4-4901-b8fa-b717bd1493f4 I0401 10:40:11.836829 13363 exec.cpp:535] Executor sending status update TASK_KILLED (UUID: e3388b10-6bd0-4571-ad08-30b77550392e) for task test of framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000 ``` Test 1.2 ======== `KillPolicy.grace_period` is **not set**, `shell=true`, framework shutdown. The only difference to test 1.1 is `shell=true` and as required by Mesos in this case, `command` is set to container's entrypoint. Similar behavior is observed: ``` I0401 11:00:41.629098 14162 exec.cpp:312] Executor asked to run task 'test' I0401 11:00:41.629187 14162 exec.cpp:321] Executor::launchTask took 68618ns I0401 11:00:41.629458 14169 docker.cpp:650] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb -v /tmp/slaves/1615ae6e-e907-4296-8d46-d864fae3359b-S0/frameworks/1615ae6e-e907-4296-8d46-d864fae3359b-0002/executors/test/runs/6a5b48a4-37d8-4169-95ac-54858e3df4eb:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb rukletsov/unresponsive-process -c /unresponsive-process/bin/unresponsive_process ... I0401 11:00:45.130071 14172 exec.cpp:399] Executor asked to shutdown I0401 11:00:45.130215 14167 exec.cpp:91] Scheduling shutdown of the executor in 5secs I0401 11:00:45.130277 14172 exec.cpp:414] Executor::shutdown took 54824ns I0401 11:00:45.130389 14164 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb -v /tmp/slaves/1615ae6e-e907-4296-8d46-d864fae3359b-S0/frameworks/1615ae6e-e907-4296-8d46-d864fae3359b-0002/executors/test/runs/6a5b48a4-37d8-4169-95ac-54858e3df4eb:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb rukletsov/unresponsive-process -c /unresponsive-process/bin/unresponsive_process' is being discarded I0401 11:00:45.154701 14164 docker.cpp:712] Running docker -H unix:///var/run/docker.sock stop -t 3 mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb I0401 11:00:48.366492 14167 exec.cpp:535] Executor sending status update TASK_KILLED (UUID: d289815e-2ace-4119-8aa2-3f852985e70d) for task test of framework 1615ae6e-e907-4296-8d46-d864fae3359b-0002 ``` Test 1.3: ========= `KillPolicy.grace_period` **not set**, `shell=false`, `killTask` Similar to test 1.1, but instead of framework shutdown, `killTask()` is called. The difference is that the `killTask` path is being executed and shows the timeout is approx. **5s**, which comes from the default value for the executor shutdown grace period: ``` I0401 10:20:24.339117 11940 slave.cpp:2027] Asked to kill task test of framework 0f5162d4-d6d5-4600-be83-ec09ab9d7312-0001 ... I0401 10:20:29.550489 11941 slave.cpp:3184] Handling status update TASK_KILLED (UUID: 3cb4d9a0-d907-4c75-bd16-8edbecdf1f75) for task test of framework 0f5162d4-d6d5-4600-be83-ec09ab9d7312-0001 from exec utor(1)@192.99.40.208:39127 ``` In an excerpt from the executor's log we see, that docker timeout is set to 5s: ``` I0401 10:20:24.339557 12382 exec.cpp:332] Executor asked to kill task 'test' I0401 10:20:24.339617 12382 exec.cpp:341] Executor::killTask took 29416ns I0401 10:20:24.339764 12380 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.9a03b3f9-ee93-412f-9954-bd4fd7120978 -v /tmp/slaves/ef6f8d28-d279-40f7-b66a-f9659ca72299-S1/frameworks/0f5162d4-d6d5-4600-be83-ec09ab9d7312-0001/executors/test/runs/9a03b3f9-ee93-412f-9954-bd4fd7120978:/mnt/mesos/sandbox --net host --name mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.9a03b3f9-ee93-412f-9954-bd4fd7120978 rukletsov/unresponsive-process' is being discarded I0401 10:20:24.369420 12380 docker.cpp:712] Running docker -H unix:///var/run/docker.sock stop -t 5 mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.9a03b3f9-ee93-412f-9954-bd4fd7120978 I0401 10:20:29.549844 12377 exec.cpp:535] Executor sending status update TASK_KILLED (UUID: 3cb4d9a0-d907-4c75-bd16-8edbecdf1f75) for task test of framework 0f5162d4-d6d5-4600-be83-ec09ab9d7312-0001 I0401 ``` Test 2.1: ========= `KillPolicy.grace_period` set to **0s**, `shell=false`, framework shutdown. Similar to test 1.1, but the grace period is 0s (no deductions because the kill policy is set explicitly; the grace period in containerizer should be adjusted to approx. 2 seconds): ``` I0401 11:24:42.189540 16425 slave.cpp:2215] Asked to shut down framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000 by master@192.99.40.208:5551 I0401 11:24:42.189584 16425 slave.cpp:2240] Shutting down framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000 I0401 11:24:42.189716 16425 slave.cpp:4398] Shutting down executor 'test' of framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000 at executor(1)@192.99.40.208:41179 I0401 11:24:42.413045 16427 slave.cpp:3184] Handling status update TASK_KILLED (UUID: f7339631-72a6-465a-ae9b-5c9b9789bf39) for task test of framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000 from executor(1)@192.99.40.208:41179 ``` In an excerpt from the executor's log we see, that docker timeout is set to 0s, while the grace period in the executor driver is set to 2s: ``` I0401 11:24:42.190176 16535 exec.cpp:399] Executor asked to shutdown I0401 11:24:42.190352 16534 exec.cpp:91] Scheduling shutdown of the executor in 2secs I0401 11:24:42.190382 16535 exec.cpp:414] Executor::shutdown took 76197ns I0401 11:24:42.190491 16541 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.e98d6f1a-494e-41ea-b2e5-c6d00e593bed -v /tmp/slaves/1615ae6e-e907-4296-8d46-d864fae3359b-S0/frameworks/7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000/executors/test/runs/e98d6f1a-494e-41ea-b2e5-c6d00e593bed:/mnt/mesos/sandbox --net host --name mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.e98d6f1a-494e-41ea-b2e5-c6d00e593bed rukletsov/unresponsive-process' is being discarded I0401 11:24:42.214829 16541 docker.cpp:712] Running docker -H unix:///var/run/docker.sock stop -t 0 mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.e98d6f1a-494e-41ea-b2e5-c6d00e593bed I0401 11:24:42.412222 16533 exec.cpp:535] Executor sending status update TASK_KILLED (UUID: f7339631-72a6-465a-ae9b-5c9b9789bf39) for task test of framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000 ``` Test 3.1: ========= `KillPolicy.grace_period` set to **11s**, `shell=false`, framework shutdown Similar to test 1.1, but the grace period is 11s (no deductions because the kill policy is set explicitly; the grace period in containerizer should be adjusted to approx. 13 seconds): ``` I0401 11:16:55.461164 15451 slave.cpp:2215] Asked to shut down framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000 by master@192.99.40.208:5551 I0401 11:16:55.461213 15451 slave.cpp:2240] Shutting down framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000 I0401 11:16:55.461362 15451 slave.cpp:4398] Shutting down executor 'test' of framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000 at executor(1)@192.99.40.208:49340 I0401 11:17:02.416498 15452 slave.cpp:4945] Querying resource estimator for oversubscribable resources I0401 11:17:02.416743 15452 slave.cpp:4959] Received oversubscribable resources from the resource estimator I0401 11:17:03.270429 15460 slave.cpp:3664] Received ping from slave-observer(1)@192.99.40.208:5551 I0401 11:17:03.694782 15465 process.cpp:3136] Handling HTTP event for process 'files' with path: '/files/browse' I0401 11:17:06.646240 15462 slave.cpp:3184] Handling status update TASK_KILLED (UUID: 8966c1fb-6fcb-4e8f-a298-c94003154a83) for task test of framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000 from executor(1)@192.99.40.208:49340 ``` In an excerpt from the executor's log we see, that docker timeout is set to 11s, while the grace period in the executor driver is set to 13s: ``` I0401 11:16:55.461809 15582 exec.cpp:399] Executor asked to shutdown I0401 11:16:55.461931 15577 exec.cpp:91] Scheduling shutdown of the executor in 13secs I0401 11:16:55.461946 15582 exec.cpp:414] Executor::shutdown took 27811ns I0401 11:16:55.462087 15573 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.9cd3672f-7000-4131-ae24-b89b032dcf16 -v /tmp/slaves/1615ae6e-e907-4296-8d46-d864fae3359b-S0/frameworks/d87ba42a-0e78-4e97-bfe9-264743cf845e-0000/executors/test/runs/9cd3672f-7000-4131-ae24-b89b032dcf16:/mnt/mesos/sandbox --net host --name mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.9cd3672f-7000-4131-ae24-b89b032dcf16 rukletsov/unresponsive-process' is being discarded I0401 11:16:55.492219 15573 docker.cpp:712] Running docker -H unix:///var/run/docker.sock stop -t 11 mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.9cd3672f-7000-4131-ae24-b89b032dcf16 I0401 11:17:06.645469 15587 exec.cpp:535] Executor sending status update TASK_KILLED (UUID: 8966c1fb-6fcb-4e8f-a298-c94003154a83) for task test of framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000 ``` Thanks, Alexander Rukletsov