[ 
https://issues.apache.org/jira/browse/AURORA-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596764#comment-14596764
 ] 

brian wickman commented on AURORA-698:
--------------------------------------

{noformat}
commit 73ceeb22a18e4b3df3bffb04cf7d58527066fb5a
Author: Brian Wickman <wick...@apache.org>
Date:   Mon Jun 1 15:20:25 2015 -0700

    Daemonize all deadline calls in aurora executor.
    
    If we do not daemonize, it's possible for the aurora executor to send
    TASK_KILLED and then block indefinitely on shutdown.  This way the aurora
    executor process will at least exit, allowing the cgroup to tear down all
    active processes.
    
    Testing Done:
    ./pants test src/test/python/apache/aurora/executor::
    
    Bugs closed: AURORA-698
    
    Reviewed at https://reviews.apache.org/r/34484/

{noformat}

> aurora executor _shutdown deadline calls should be daemonized
> -------------------------------------------------------------
>
>                 Key: AURORA-698
>                 URL: https://issues.apache.org/jira/browse/AURORA-698
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>            Reporter: brian wickman
>            Assignee: brian wickman
>
> In the aurora executor shutdown method, we have deadline() calls:
> {noformat}
>   def _shutdown(self, status_result):
>     runner_status = self._runner.status
>     try:
>       deadline(self._runner.stop, timeout=self.STOP_TIMEOUT)
>     except Timeout:
>       log.error('Failed to stop runner within deadline.')
>     try:
>       deadline(self._chained_checker.stop, timeout=self.STOP_TIMEOUT)
>     except Timeout:
>       log.error('Failed to stop all checkers within deadline.')
>     # If the runner was alive when _shutdown was called, defer to the status_result,
>     # otherwise the runner's terminal state is the preferred state.
>     exit_status = runner_status or status_result
>     self.send_update(
>         self._driver,
>         self._task_id,
>         exit_status.status,
>         status_result.reason)
>     self.terminated.set()
>     defer(self._driver.stop, delay=self.PERSISTENCE_WAIT)
> {noformat}
> However, if runner.stop does not return within STOP_TIMEOUT and deadline()
> raises Timeout, the spawned AnonymousThread is not daemonized, so it keeps
> the executor process from exiting. This means that the cgroup will not be
> torn down, and if runner.stop actually failed, the process can stay alive
> even though TASK_KILLED was delivered.
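
The failure mode described in the issue can be illustrated with a minimal sketch. The deadline() function below is a simplified stand-in written for this example, not the helper the Aurora executor actually imports; its `daemon` parameter, the local `Timeout` class, and the `stuck_stop` function are all assumptions made for illustration. It runs a callable in a helper thread and gives up waiting after a timeout. With daemon=False, a helper thread that never returns would keep the Python process alive after the caller has moved on; with daemon=True, the interpreter can exit regardless.

```python
import threading


class Timeout(Exception):
    """Raised when the wrapped call does not finish in time (illustrative)."""


def deadline(func, timeout, daemon=True):
    """Run func in a helper thread and wait at most `timeout` seconds.

    Simplified stand-in for illustration only. When daemon is True, a helper
    thread that is still stuck after the timeout cannot block interpreter exit.
    """
    outcome = {}

    def target():
        try:
            outcome['value'] = func()
        except Exception as e:  # hand the error back to the calling thread
            outcome['error'] = e

    worker = threading.Thread(target=target)
    worker.daemon = daemon
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The helper thread is abandoned here, not killed; it keeps running.
        raise Timeout('call did not finish within %s seconds' % timeout)
    if 'error' in outcome:
        raise outcome['error']
    return outcome.get('value')


if __name__ == '__main__':
    def stuck_stop():
        # Hypothetical stop() that never returns.
        threading.Event().wait()

    try:
        deadline(stuck_stop, timeout=0.1)
    except Timeout:
        print('Failed to stop runner within deadline.')
    # The process exits here because the stuck helper thread is a daemon.
    # With daemon=False it would hang at interpreter shutdown instead.
```

Note that daemonizing only lets the process exit; the abandoned call is not cancelled, which is why the commit message relies on cgroup teardown to clean up whatever is still running.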



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
