[ https://issues.apache.org/jira/browse/AURORA-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua Cohen reassigned AURORA-1614:
------------------------------------

    Assignee: Joshua Cohen

> Failed sandbox initialization can cause tasks to go LOST
> --------------------------------------------------------
>
>                 Key: AURORA-1614
>                 URL: https://issues.apache.org/jira/browse/AURORA-1614
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>            Reporter: Joshua Cohen
>            Assignee: Joshua Cohen
>            Priority: Minor
>
> When we initialize the sandbox, we only catch sandbox-specific error types, 
> so if an unexpected error is raised, the executor hangs until the timeout 
> is exceeded, at which point the task goes LOST.
> We should instead broadly catch exceptions raised during sandbox 
> initialization and fail tasks quickly.
> Additionally, the {{DockerDirectorySandbox}} was not properly catching errors 
> raised when creating/symlinking, which led to the above problem in the event 
> of a misconfiguration. In practice this issue shouldn't have occurred in 
> normal usage, but it made development slow until I tracked down what was 
> causing the tasks to hang.
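
A minimal sketch of the proposed behavior, not the actual executor code. The names SandboxError, MisconfiguredSandbox, initialize_sandbox and the fail_task callback are hypothetical stand-ins chosen for illustration. It shows a sandbox-specific handler followed by a broad handler, so that an unexpected error fails the task immediately instead of leaving the executor waiting for the timeout.

```python
class SandboxError(Exception):
    """Hypothetical stand-in for the sandbox-specific error type."""


class MisconfiguredSandbox:
    """Hypothetical sandbox whose create() raises an unexpected error."""

    def create(self):
        # Simulates a failure that is not a SandboxError, e.g. from a
        # misconfigured directory or symlink.
        raise OSError("symlink target missing")


def initialize_sandbox(sandbox, fail_task):
    """Create the sandbox, failing the task right away on any error.

    Returns True on success, False if the task was failed.
    """
    try:
        sandbox.create()
    except SandboxError as e:
        # Expected, sandbox-specific failure.
        fail_task("Failed to initialize sandbox: %s" % e)
        return False
    except Exception as e:
        # Broad catch: any unexpected error also fails the task quickly
        # rather than propagating and leaving the executor hanging.
        fail_task("Unknown exception initializing sandbox: %s" % e)
        return False
    return True


# Usage: collect failure messages in a list instead of a real status update.
failures = []
ok = initialize_sandbox(MisconfiguredSandbox(), failures.append)
```

With only the first handler in place, the OSError above would escape initialize_sandbox, which corresponds to the hang described in the issue.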



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
