----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/46325/#review130116 -----------------------------------------------------------
Patch looks great! Reviews applied: [46321, 46491, 46322, 46323, 46325] Passed command: export OS='ubuntu:14.04' CONFIGURATION='--verbose' COMPILER='gcc' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh - Mesos ReviewBot On April 22, 2016, 2:46 p.m., Alexander Rukletsov wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/46325/ > ----------------------------------------------------------- > > (Updated April 22, 2016, 2:46 p.m.) > > > Review request for mesos, Ben Mahler and Qian Zhang. > > > Bugs: MESOS-4908 > https://issues.apache.org/jira/browse/MESOS-4908 > > > Repository: mesos > > > Description > ------- > > Kill policy can be provided in a kill event. In this case it should > take precedence over kill policy specified when the task was launched. > When kill event is issued multiple times during the task termination, > the signal escalation timeout (the time a task has between SIGTERM > and SIGKILL) may be reduced. > > Since updating the delay timer (we use it for signal escalation delay) > is currently not possible, we cancel the existing signal timer and set > up a new one. `Clock::cancel()` guarantees that, if existed, the timer > is removed before the function returns; hence we do not set up more > than 1 timer for signal escalation delay. > > > Diffs > ----- > > src/launcher/http_command_executor.cpp > 2ce296740abac096aa85ad11dd4d191e42c1aa06 > > Diff: https://reviews.apache.org/r/46325/diff/ > > > Testing > ------- > > `make check` > > Tested manually using modified `mesos-execute` in a way that **two** extra > kill task requests are sent, 2s and 3s after receiving `TASK_RUNNING`. Each > kill task request specifies `KillPolicy` with 1s grace period. Together with > a kill *without* a kill policy scheduled 1s after the task is being launched, > the task receives **three** kill requests in total. > > Master: `./bin/mesos-master.sh --work_dir=/tmp/w/m --ip=127.0.0.1` > Agent: `./bin/mesos-slave.sh --work_dir=/tmp/w/s --master=127.0.0.1:5050 > --http_command_executor` > Mesos-execute: `./src/mesos-execute --master=127.0.0.1:5050 --name=test > --command="/Users/alex/bin/unresponsive_process" --env='{"GLOG_v": "2"}' > --kill_after=1secs` > > HTTP command executor log: > ``` > Received SUBSCRIBED event > Subscribed executor on alexr.fritz.box > Received LAUNCH event > Starting task test > sh -c '/Users/alex/bin/unresponsive_process' > Forked command at 17475 > 14455919081943275466 > Received ACKNOWLEDGED event > 17172602460659762152 > Received KILL event > Received kill for task test > Sending SIGTERM to process tree at pid 17475 > Sent SIGTERM to the following process trees: > [ > --- 17475 /Users/alex/bin/unresponsive_process > ] > 4381544758593790168 > Scheduling escalation to SIGKILL in 3secs from now > Received ACKNOWLEDGED event > Received KILL event > Received kill for task test > Rescheduling escalation to SIGKILL in 1secs from now > 10370891801885978953 > Process 17475 did not terminate after 1secs, sending SIGKILL to process tree > at 17475 > Killed the following process trees: > [ > --- 17475 /Users/alex/bin/unresponsive_process > ] > Received KILL event > Received kill for task test > Command terminated with signal Killed: 9 (pid: 17475) > ``` > > Excerpt from the agent log that shows all 3 kill task requests and that the > segnal escalation timeout was reduced from 3s to 1s: > ``` > I0418 14:27:17.825070 244285440 slave.cpp:3605] Forwarding the update > TASK_RUNNING (UUID: 925e2d89-f6eb-464d-9a50-a74a8e07bc88) for task test of > framework ab374773-a018-4531-923b-899cf1e4f573-0000 to master@127.0.0.1:5050 > I0418 14:27:17.831233 242139136 status_update_manager.cpp:392] Received > status update acknowledgement (UUID: 925e2d89-f6eb-464d-9a50-a74a8e07bc88) > for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000 > I0418 14:27:18.834309 244285440 slave.cpp:2046] Asked to kill task test of > framework ab374773-a018-4531-923b-899cf1e4f573-0000 > I0418 14:27:18.842150 244285440 http.cpp:178] HTTP POST for > /slave(1)/api/v1/executor from 192.168.178.24:54206 > I0418 14:27:18.842331 244285440 slave.cpp:3207] Handling status update > TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of > framework ab374773-a018-4531-923b-899cf1e4f573-0000 > I0418 14:27:18.843214 242139136 status_update_manager.cpp:320] Received > status update TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for > task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000 > I0418 14:27:18.843387 243748864 slave.cpp:3605] Forwarding the update > TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of > framework ab374773-a018-4531-923b-899cf1e4f573-0000 to master@127.0.0.1:5050 > I0418 14:27:18.846459 242675712 status_update_manager.cpp:392] Received > status update acknowledgement (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) > for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000 > I0418 14:27:19.836699 240529408 slave.cpp:2046] Asked to kill task test of > framework ab374773-a018-4531-923b-899cf1e4f573-0000 > I0418 14:27:20.850658 240529408 slave.cpp:2046] Asked to kill task test of > framework ab374773-a018-4531-923b-899cf1e4f573-0000 > I0418 14:27:20.927338 241602560 http.cpp:178] HTTP POST for > /slave(1)/api/v1/executor from 192.168.178.24:54206 > I0418 14:27:20.927465 241602560 slave.cpp:3207] Handling status update > TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for task test of > framework ab374773-a018-4531-923b-899cf1e4f573-0000 > I0418 14:27:20.928771 240529408 status_update_manager.cpp:320] Received > status update TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for > task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000 > I0418 14:27:20.928933 243748864 slave.cpp:3605] Forwarding the update > TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for task test of > framework ab374773-a018-4531-923b-899cf1e4f573-0000 to master@127.0.0.1:5050 > > ``` > > > Thanks, > > Alexander Rukletsov > >