[ https://issues.apache.org/jira/browse/AURORA-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596764#comment-14596764 ]
brian wickman commented on AURORA-698:
--------------------------------------

{noformat}
commit 73ceeb22a18e4b3df3bffb04cf7d58527066fb5a
Author: Brian Wickman <wick...@apache.org>
Date:   Mon Jun 1 15:20:25 2015 -0700

    Daemonize all deadline calls in aurora executor.

    If we do not daemonize, it's possible for the aurora executor to send
    TASK_KILLED and then block indefinitely on shutdown. This way the aurora
    executor process will at least exit, allow the cgroup to tear down all
    active processes.

    Testing Done:
    ./pants test src/test/python/apache/aurora/executor::

    Bugs closed: AURORA-698

    Reviewed at https://reviews.apache.org/r/34484/
{noformat}

> aurora executor _shutdown deadline calls should be daemonized
> -------------------------------------------------------------
>
>                 Key: AURORA-698
>                 URL: https://issues.apache.org/jira/browse/AURORA-698
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>            Reporter: brian wickman
>            Assignee: brian wickman
>
> In the aurora executor shutdown method, we have deadline() calls:
> {noformat}
>   def _shutdown(self, status_result):
>     runner_status = self._runner.status
>     try:
>       deadline(self._runner.stop, timeout=self.STOP_TIMEOUT)
>     except Timeout:
>       log.error('Failed to stop runner within deadline.')
>     try:
>       deadline(self._chained_checker.stop, timeout=self.STOP_TIMEOUT)
>     except Timeout:
>       log.error('Failed to stop all checkers within deadline.')
>     # If the runner was alive when _shutdown was called, defer to the status_result,
>     # otherwise the runner's terminal state is the preferred state.
>     exit_status = runner_status or status_result
>     self.send_update(
>         self._driver,
>         self._task_id,
>         exit_status.status,
>         status_result.reason)
>     self.terminated.set()
>     defer(self._driver.stop, delay=self.PERSISTENCE_WAIT)
> {noformat}
> However if runner.stop fails with a Timeout exception, the spawned
> AnonymousThread is not daemonized and causes the executor to fail to exit.
> This means that the cgroup will not be torn down and if the runner.stop
> actually failed, the process can stay alive even if TASK_KILLED was delivered.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
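
For readers unfamiliar with the failure mode described in the issue, here is a minimal sketch of a deadline-style helper built only on the Python standard library. It is not the executor's actual deadline() implementation; the names deadline_sketch, Timeout (redefined here as a stand-in) and stuck_stop are hypothetical and exist only for illustration. The point it tries to show is that a helper thread marked as a daemon does not keep the interpreter alive after the caller gives up waiting, whereas a non-daemon thread would.

```python
import threading
import time


class Timeout(Exception):
    """Hypothetical stand-in for the Timeout raised by the real deadline()."""


def deadline_sketch(closure, timeout, daemon=True):
    """Run closure in a helper thread and wait at most `timeout` seconds.

    Illustrative only: with daemon=False, a helper thread that is still
    running after the timeout would prevent the process from exiting.
    """
    result = {}

    def target():
        result['value'] = closure()

    worker = threading.Thread(target=target)
    worker.daemon = daemon  # daemon threads do not block interpreter exit
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise Timeout('closure did not finish within %s seconds' % timeout)
    return result.get('value')


def stuck_stop():
    # Stand-in for a runner.stop call that does not return in time.
    time.sleep(60)


try:
    deadline_sketch(stuck_stop, timeout=0.1)
except Timeout:
    print('Failed to stop runner within deadline.')
```

Run as a script, this should print the error and then exit promptly, because the sleeping helper thread is a daemon. Passing daemon=False to the same call would presumably leave the process waiting on the helper thread until stuck_stop returns, which mirrors the hang described above.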