[ https://issues.apache.org/jira/browse/MESOS-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642202#comment-14642202 ]
haosdent commented on MESOS-3111:
---------------------------------

{code}
class ShutdownProcess : public Process<ShutdownProcess>
{
protected:
  virtual void initialize()
  {
    VLOG(1) << "Scheduling shutdown of the executor";
    // TODO(benh): Pass the shutdown timeout with ExecutorRegistered
    // since it might have gotten configured on the command line.
    delay(slave::EXECUTOR_SHUTDOWN_GRACE_PERIOD, self(), &Self::kill);
  }
{code}

{code}
  void shutdown()
  {
    if (aborted) {
      VLOG(1) << "Ignoring shutdown message because the driver is aborted!";
      return;
    }

    LOG(INFO) << "Executor asked to shutdown";

    if (!local) {
      // Start the Shutdown Process.
      spawn(new ShutdownProcess(), true);
    }
{code}

And slave::EXECUTOR_SHUTDOWN_GRACE_PERIOD defaults to 5 seconds, so the delay
above does not pick up the value configured on the command line.

> Executor Process gets killed before executor_shutdown_grace_period
> ------------------------------------------------------------------
>
>                 Key: MESOS-3111
>                 URL: https://issues.apache.org/jira/browse/MESOS-3111
>             Project: Mesos
>          Issue Type: Bug
>          Components: framework
>    Affects Versions: 0.22.1
>         Environment: OS X 10.10.3
>            Reporter: Rajesh Kumar
>
> I am starting mesos slave with following command:
> ./bin/mesos-slave.sh --master=zk://127.0.0.1:2181/global
> --work_dir=/mnt1/logs/mesos/ --executor_registration_timeout=3mins
> --executor_shutdown_grace_period=60secs
> As you see I have specified executor_shutdown_grace_period as 60 sec. When
> framework is shutdown it is expected that executors process will be shutdown
> gracefully. If executor takes longer than executor_shutdown_grace_period, it
> will be killed by SIGKILL. But it seems its not happening as expected.
> Looking at following slave log, request to shutdown comes around 14:55:25 and
> executor is killed with SIGKILL at 14:55:30, just 5 secs later. I was
> expecting it to wait till 60 secs.
> I0721 14:55:14.430140 178049024 slave.cpp:2164] Got registration for executor
> 'global' of framework global_2015-07-21_09-23-57_542 from
> executor(1)@172.16.20.211:62623
> I0721 14:55:14.432839 178049024 slave.cpp:1555] Sending queued task
> 'Long_Running_Job 0' to executor 'global' of framework
> global_2015-07-21_09-23-57_542
> I0721 14:55:23.275876 176975872 slave.cpp:3648] Current disk usage 36.83%.
> Max allowed age: 3.721608945021898days
> I0721 14:55:25.493098 180195328 slave.cpp:1768] Asked to shut down framework
> global_2015-07-21_09-23-57_542 by master@127.0.0.1:5050
> I0721 14:55:25.493131 180195328 slave.cpp:1793] Shutting down framework
> global_2015-07-21_09-23-57_542
> I0721 14:55:25.493176 180195328 slave.cpp:3473] Shutting down executor
> 'global' of framework global_2015-07-21_09-23-57_542
> I0721 14:55:30.514751 179658752 containerizer.cpp:1123] Executor for
> container 'de9f1d01-bbb6-4fbe-b41b-2a253498d6a1' has exited
> I0721 14:55:30.514798 179658752 containerizer.cpp:918] Destroying container
> 'de9f1d01-bbb6-4fbe-b41b-2a253498d6a1'
> I0721 14:55:30.538887 177512448 slave.cpp:3223] Executor 'global' of
> framework global_2015-07-21_09-23-57_542 terminated with signal Killed: 9