[ https://issues.apache.org/jira/browse/MESOS-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jie Yu updated MESOS-759:
-------------------------

    Fix Version/s:     (was: 0.17.0)
                   0.18.0

> The cgroups TaskKiller should skip freezing the cgroup if it is already empty.
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-759
>                 URL: https://issues.apache.org/jira/browse/MESOS-759
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.13.0, 0.14.0, 0.14.1, 0.14.2, 0.16.0, 0.15.0
>            Reporter: Benjamin Mahler
>            Assignee: Ian Downes
>            Priority: Critical
>              Labels: twitter
>             Fix For: 0.18.0
>
>
> The current TasksKiller code always freezes the cgroup when trying to kill
> the tasks in the cgroup:
>   void killTasks() {
>     // Chain together the steps needed to kill the tasks. Note that we
>     // ignore the return values of freeze, kill, and thaw because,
>     // provided there are no errors, we'll just retry the chain as
>     // long as tasks still exist.
>     chain = kill(SIGSTOP)                        // Send stop signal to all tasks.
>       .then(defer(self(), &Self::kill, SIGKILL)) // Now send kill signal.
>       .then(defer(self(), &Self::empty))         // Wait until cgroup is empty.
>       .then(defer(self(), &Self::freeze))        // Freeze cgroup.
>       .then(defer(self(), &Self::kill, SIGKILL)) // Send kill signal to any remaining tasks.
>       .then(defer(self(), &Self::thaw))          // Thaw cgroup to deliver signals.
>       .then(defer(self(), &Self::empty));        // Wait until cgroup is empty.
> The TasksKiller should skip the freeze when the cgroup is already empty.
> We've seen instances where the cgroup is unfreezable, and because the
> procedure is retried on failure, the code then loops trying to freeze
> the cgroup.
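
The requested behavior can be illustrated with a small standalone sketch. This is not the Mesos implementation and does not use libprocess. `FakeCgroup`, `Outcome`, and `killTasksOnce` are hypothetical names invented for illustration, and the asynchronous chain is replaced by a single synchronous pass. It only shows the idea of checking for emptiness before touching the freezer, so that an empty but unfreezable cgroup never triggers a freeze attempt.

```cpp
#include <set>

// Hypothetical in-memory stand-in for a cgroup (not the Mesos or kernel API).
struct FakeCgroup {
  std::set<int> pids;        // Tasks currently in the cgroup.
  bool freezable = true;     // false models the unfreezable cgroup from the report.
  bool frozen = false;
  bool killPending = false;  // A kill sent while frozen, not yet delivered.
  int freezeAttempts = 0;    // How many times freeze() was called.

  bool empty() const { return pids.empty(); }

  // Returns false if the cgroup could not be frozen.
  bool freeze() {
    ++freezeAttempts;
    if (!freezable) return false;
    frozen = true;
    return true;
  }

  // Models SIGKILL: frozen tasks only receive it once they are thawed.
  void killAll() {
    if (frozen) {
      killPending = true;
    } else {
      pids.clear();
    }
  }

  // Thawing delivers any kill that was sent while frozen.
  void thaw() {
    frozen = false;
    if (killPending) {
      pids.clear();
      killPending = false;
    }
  }
};

enum class Outcome { Done, Retry };

// One pass of a kill procedure that checks for emptiness before freezing.
Outcome killTasksOnce(FakeCgroup& cgroup) {
  if (cgroup.empty()) {
    return Outcome::Done;   // Nothing to kill, so never touch the freezer.
  }
  if (!cgroup.freeze()) {
    return Outcome::Retry;  // Freeze failed; the caller may retry.
  }
  cgroup.killAll();
  cgroup.thaw();
  return cgroup.empty() ? Outcome::Done : Outcome::Retry;
}
```

In this model, calling `killTasksOnce` on a cgroup with no pids and `freezable = false` returns `Outcome::Done` without ever calling `freeze()`, which is the case the issue asks to handle. A non-empty unfreezable cgroup still returns `Outcome::Retry`, so this sketch does not address that situation.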