[ https://issues.apache.org/jira/browse/AURORA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099070#comment-15099070 ]

Zameer Manji commented on AURORA-1582:
--------------------------------------

After chatting with [~jsirois], it seems a better solution is to create a
metric for uncaught exceptions that are logged, and to modify the task pruner
runnable so that it triggers a shutdown on an unhandled RuntimeException.
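The two parts of this proposal can be sketched as follows. This is a minimal sketch under stated assumptions, not Aurora's code: the class `PrunerFailureHandling`, the counter `UNCAUGHT_EXCEPTIONS`, and the helpers `logUncaught` and `shutdownOnFailure` are hypothetical names. A plain `AtomicLong` stands in for whatever stats mechanism the scheduler actually exports, and the `shutdownCommand` argument stands in for the scheduler's real shutdown path.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

class PrunerFailureHandling {
  private static final Logger LOG = Logger.getLogger(PrunerFailureHandling.class.getName());

  // Hypothetical metric: a plain counter standing in for an exported stat that
  // tracks how many uncaught exceptions have been logged.
  static final AtomicLong UNCAUGHT_EXCEPTIONS = new AtomicLong();

  // Logs an uncaught exception and bumps the metric so operators can alert on it.
  static void logUncaught(Throwable t) {
    UNCAUGHT_EXCEPTIONS.incrementAndGet();
    LOG.log(Level.SEVERE, t.toString(), t);
  }

  // Hypothetical wrapper for the pruner's Runnable: a RuntimeException is logged,
  // counted, and escalated to the supplied shutdown command instead of being
  // left for the executor to log and forget.
  static Runnable shutdownOnFailure(Runnable work, Runnable shutdownCommand) {
    return () -> {
      try {
        work.run();
      } catch (RuntimeException e) {
        logUncaught(e);
        shutdownCommand.run();
      }
    };
  }

  public static void main(String[] args) {
    AtomicBoolean shutdownRequested = new AtomicBoolean(false);
    Runnable pruner = shutdownOnFailure(
        () -> { throw new IllegalStateException("simulated pruning failure"); },
        () -> shutdownRequested.set(true));
    pruner.run();
    System.out.println("uncaught=" + UNCAUGHT_EXCEPTIONS.get()
        + " shutdownRequested=" + shutdownRequested.get());
  }
}
```

Running `main` logs the simulated failure to stderr and prints `uncaught=1 shutdownRequested=true`. Keeping the metric in `logUncaught` means it could also be called from the generic executor logging path, while only the pruner's wrapper escalates to a shutdown.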

> Task History Pruning attempts can fail silently
> -----------------------------------------------
>
>                 Key: AURORA-1582
>                 URL: https://issues.apache.org/jira/browse/AURORA-1582
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Zameer Manji
>
> As discovered in AURORA-1580, task history pruning attempts can fail, and when
> they do, they fail silently. The root cause seems to be that AsyncModule's
> {{AsyncProcessor}} threads only log an unhandled exception when one occurs:
> {noformat}
>   private static void evaluateResult(Runnable runnable, Throwable throwable, Logger logger) {
>     // See java.util.concurrent.ThreadPoolExecutor#afterExecute(Runnable, Throwable)
>     // for more details and an implementation example.
>     if (throwable == null) {
>       if (runnable instanceof Future) {
>         try {
>           Future<?> future = (Future<?>) runnable;
>           if (future.isDone()) {
>             future.get();
>           }
>         } catch (InterruptedException ie) {
>           Thread.currentThread().interrupt();
>         } catch (ExecutionException ee) {
>           logger.error(ee.toString(), ee);
>         }
>       }
>     } else {
>       logger.error(throwable.toString(), throwable);
>     }
>   }
> {noformat}
> I think that instead of failing silently when work on these threads fails, we
> should shut down the scheduler, just as we do when the preemptor or another
> Guava service fails. This way the scheduler does not enter an undefined state,
> and operators are informed of the abnormal behaviour.
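The escalation proposed in the issue can be sketched with the same `ThreadPoolExecutor#afterExecute` hook that the comment in `evaluateResult` points to. This is a minimal sketch, not Aurora's actual AsyncModule: the class `FailFastExecutor` and its `onFailure` callback are hypothetical, and the callback stands in for whatever shutdown mechanism the scheduler would use. It also shows why the failure is silent by default: a task passed to `submit()` is wrapped in a `FutureTask`, which captures the exception, so `afterExecute` receives a null `Throwable` and must recover the failure from the `Future`.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class FailFastExecutor extends ThreadPoolExecutor {
  // Hypothetical hook: in a scheduler this would request a process shutdown
  // rather than just logging the failure.
  private final Consumer<Throwable> onFailure;

  FailFastExecutor(int threads, Consumer<Throwable> onFailure) {
    super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    this.onFailure = onFailure;
  }

  @Override
  protected void afterExecute(Runnable runnable, Throwable throwable) {
    super.afterExecute(runnable, throwable);
    Throwable failure = throwable;
    // Submitted tasks are FutureTasks that swallow their exception, so the
    // failure has to be pulled out of the completed Future.
    if (failure == null && runnable instanceof Future && ((Future<?>) runnable).isDone()) {
      try {
        ((Future<?>) runnable).get();
      } catch (CancellationException ce) {
        // Cancellation is deliberate, so it is not treated as a failure here.
      } catch (ExecutionException ee) {
        failure = ee.getCause();
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }
    if (failure != null) {
      onFailure.accept(failure);  // Escalate instead of only logging.
    }
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch escalated = new CountDownLatch(1);
    AtomicReference<Throwable> cause = new AtomicReference<>();
    FailFastExecutor executor = new FailFastExecutor(1, t -> {
      cause.set(t);
      escalated.countDown();
    });
    Runnable failingPrune = () -> { throw new IllegalStateException("simulated pruning failure"); };
    executor.submit(failingPrune);
    escalated.await(5, TimeUnit.SECONDS);
    executor.shutdown();
    System.out.println("escalated: " + cause.get());
  }
}
```

The design choice here is to keep the unwrapping logic of `evaluateResult` but replace the final `logger.error` call with an escalation callback, so any work on these threads that fails surfaces as a scheduler-level event.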



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
