[ https://issues.apache.org/jira/browse/AURORA-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joshua Cohen reassigned AURORA-1614:
------------------------------------

    Assignee: Joshua Cohen

> Failed sandbox initialization can cause tasks to go LOST
> --------------------------------------------------------
>
>                 Key: AURORA-1614
>                 URL: https://issues.apache.org/jira/browse/AURORA-1614
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>            Reporter: Joshua Cohen
>            Assignee: Joshua Cohen
>            Priority: Minor
>
> When we initialize the sandbox, we only catch sandbox-specific error types,
> meaning that if an unexpected error is raised, the executor just hangs until
> the timeout is exceeded, at which point the task goes LOST.
> We should instead broadly catch exceptions raised during sandbox
> initialization and quickly fail tasks.
> Additionally, the {{DockerDirectorySandbox}} was not properly catching errors
> raised when creating/symlinking, which led to the above problem in the event
> of a misconfiguration. In practice this issue shouldn't have occurred in
> normal usage, but it made development slow until I tracked down what was
> causing the tasks to just hang.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
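
The two changes proposed in the issue (wrapping create/symlink errors in a sandbox-specific type, and broadly catching anything else raised during initialization so the task fails fast) might look roughly like the sketch below. This is not the Aurora executor's actual code: SandboxError, TaskFailed, create_with_symlink and initialize_sandbox are hypothetical names chosen for illustration, and only the standard library is used.

```python
import os


class SandboxError(Exception):
    """Hypothetical stand-in for a sandbox-specific error type."""


class TaskFailed(Exception):
    """Hypothetical signal that the task should be failed right away."""


def create_with_symlink(path, link):
    """Create a directory and symlink to it (hypothetical helper).

    Filesystem errors are wrapped in SandboxError so callers see the
    expected error type instead of a raw OSError.
    """
    try:
        os.makedirs(path)
        os.symlink(path, link)
    except (IOError, OSError) as e:
        raise SandboxError('Failed to create or symlink %s: %s' % (path, e))


def initialize_sandbox(create_sandbox):
    """Run create_sandbox and turn any failure into TaskFailed.

    create_sandbox is a hypothetical zero-argument callable that builds
    the sandbox.
    """
    try:
        create_sandbox()
    except SandboxError as e:
        # Expected failure: the sandbox reported a problem itself.
        raise TaskFailed('Failed to initialize sandbox: %s' % e)
    except Exception as e:
        # Unexpected failure: catch broadly so the task fails quickly
        # instead of hanging until a timeout marks it LOST.
        raise TaskFailed('Unknown error initializing sandbox: %s' % e)
```

The broad except clause is the key design choice here: it trades precise error typing for a guarantee that any initialization failure surfaces promptly as a task failure.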