[ https://issues.apache.org/jira/browse/AURORA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099070#comment-15099070 ]
Zameer Manji commented on AURORA-1582:
--------------------------------------

After chatting with [~jsirois], it seems a better solution might be to just create a metric for uncaught exceptions that are logged and modify the task pruner runnable to trigger a shutdown on an unhandled RuntimeException.

> Task History Pruning attempts can fail silently
> -----------------------------------------------
>
>                 Key: AURORA-1582
>                 URL: https://issues.apache.org/jira/browse/AURORA-1582
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Zameer Manji
>
> As discovered in AURORA-1580, task history pruning attempts can fail, and if
> they do fail, they fail silently. The root cause seems to be that
> AsyncModule's {{AsyncProcessor}} threads just log the unhandled exception if
> it exists:
> {noformat}
>   private static void evaluateResult(Runnable runnable, Throwable throwable, Logger logger) {
>     // See java.util.concurrent.ThreadPoolExecutor#afterExecute(Runnable, Throwable)
>     // for more details and an implementation example.
>     if (throwable == null) {
>       if (runnable instanceof Future) {
>         try {
>           Future<?> future = (Future<?>) runnable;
>           if (future.isDone()) {
>             future.get();
>           }
>         } catch (InterruptedException ie) {
>           Thread.currentThread().interrupt();
>         } catch (ExecutionException ee) {
>           logger.error(ee.toString(), ee);
>         }
>       }
>     } else {
>       logger.error(throwable.toString(), throwable);
>     }
>   }
> {noformat}
> I think instead of silently failing if work on these threads fails, we should
> shut down the scheduler, much like how we shut down the scheduler if the
> preemptor or another Guava service fails. This way the scheduler does not
> enter an undefined state and operators are informed of the abnormal behaviour.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
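A minimal sketch of the approach proposed in the comment, assuming the pruner is a plain Runnable: the wrapper counts an unhandled RuntimeException, logs it, and then requests a scheduler shutdown instead of letting the failure disappear into a log line. The class name ShutdownOnFailureRunnable, the AtomicLong standing in for the "uncaught exceptions" metric, and the shutdownCommand callback are all hypothetical, not Aurora's actual API. It uses java.util.logging rather than the logger in the quoted snippet so that it compiles without dependencies.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical wrapper for a periodic task such as the task history pruner.
 * An unhandled RuntimeException is counted, logged, and then triggers a
 * shutdown callback rather than being silently swallowed.
 */
public final class ShutdownOnFailureRunnable implements Runnable {
  private static final Logger LOG =
      Logger.getLogger(ShutdownOnFailureRunnable.class.getName());

  // Stand-in for a real stats counter exported as an "uncaught exceptions" metric.
  static final AtomicLong UNCAUGHT_EXCEPTIONS = new AtomicLong();

  private final Runnable delegate;
  private final Runnable shutdownCommand;

  public ShutdownOnFailureRunnable(Runnable delegate, Runnable shutdownCommand) {
    this.delegate = delegate;
    this.shutdownCommand = shutdownCommand;
  }

  @Override
  public void run() {
    try {
      delegate.run();
    } catch (RuntimeException e) {
      // Record the failure first so the metric is visible even if shutdown hangs.
      UNCAUGHT_EXCEPTIONS.incrementAndGet();
      LOG.log(Level.SEVERE, "Unhandled exception in background task, shutting down", e);
      // Hypothetical hook: in a real scheduler this would initiate a clean shutdown.
      shutdownCommand.run();
    }
  }

  public static void main(String[] args) {
    AtomicLong shutdowns = new AtomicLong();
    Runnable failingPruner = () -> {
      throw new IllegalStateException("pruning failed");
    };
    new ShutdownOnFailureRunnable(failingPruner, shutdowns::incrementAndGet).run();
    System.out.println(UNCAUGHT_EXCEPTIONS.get() + " " + shutdowns.get());
  }
}
```

Catching only RuntimeException lets Errors propagate as before. The sketch does not rethrow after requesting shutdown: a task scheduled with ScheduledExecutorService.scheduleAtFixedRate has its subsequent executions suppressed once it throws, so rethrowing would add nothing beyond what the shutdown already signals.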