[jira] [Commented] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager

ASF GitHub Bot (JIRA) Fri, 10 Feb 2017 02:02:21 -0800

    [ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861024#comment-15861024 ]


ASF GitHub Bot commented on FLINK-5759:
---------------------------------------

Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3290#discussion_r100502133
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosApplicationMasterRunner.java ---
    @@ -216,11 +220,11 @@ protected int runPrivileged(Configuration config, Configuration dynamicPropertie
     
                        futureExecutor = Executors.newScheduledThreadPool(
                                numberProcessors,
    -                           new NamedThreadFactory("mesos-jobmanager-future-", "-thread-"));
    +                           new ExecutorThreadFactory("mesos-jobmanager-future"));
    --- End diff --
    
    Just wondering whether 'akkaExecutor' and 'mesos-jobmanager-akka' (or
    'coordinationFutureExecutor', if we want to be more general than 'akka')
    would carry more information for people not familiar with the code. As far
    as I can see, this pool is only used by Akka, whereas the current name could
    imply that it is used for general futures or even async user code.


> Set an UncaughtExceptionHandler for all Thread Pools in JobManager
> ------------------------------------------------------------------
>
>                 Key: FLINK-5759
>                 URL: https://issues.apache.org/jira/browse/FLINK-5759
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.2.0
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.3.0
>
>
> Currently, the thread pools of the {{JobManager}} do not have any 
> {{UncaughtExceptionHandler}}.
> While uncaught exceptions are rare (Flink handles exceptions aggressively in 
> most places), when exceptions slip through in these threads (which execute 
> future responses and delayed actions), the JobManager may be left in an 
> inconsistent state and no longer function properly.
> We should add a handler that results in a process kill in the case of 
> uncaught exceptions. Letting the JobManager be restarted by the respective 
> cluster framework is the only guaranteed way to be safe.
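
For illustration, the approach described above could look roughly like the
following. This is only a minimal sketch, not the code from the pull request:
the class name FatalExitThreadFactory, the exit code, and the "-thread-" name
suffix are assumptions made for the example. The handler is injectable so that
the behavior can be exercised without actually killing the JVM.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical thread factory that names its threads and installs an
 * UncaughtExceptionHandler on each of them.
 */
final class FatalExitThreadFactory implements ThreadFactory {

    /** Arbitrary non-zero exit code, chosen only for this example. */
    static final int EXIT_CODE = 1;

    /** Default handler: report the error, then terminate the process. */
    static final Thread.UncaughtExceptionHandler HALT_ON_ERROR = (thread, error) -> {
        try {
            System.err.println("FATAL: uncaught exception in thread '" + thread.getName() + "'");
            error.printStackTrace();
        } finally {
            // halt() skips shutdown hooks, so the process dies even if a hook would block.
            Runtime.getRuntime().halt(EXIT_CODE);
        }
    };

    private final String namePrefix;
    private final Thread.UncaughtExceptionHandler handler;
    private final AtomicInteger counter = new AtomicInteger(1);

    FatalExitThreadFactory(String poolName) {
        this(poolName, HALT_ON_ERROR);
    }

    FatalExitThreadFactory(String poolName, Thread.UncaughtExceptionHandler handler) {
        this.namePrefix = poolName + "-thread-";
        this.handler = handler;
    }

    @Override
    public Thread newThread(Runnable runnable) {
        Thread thread = new Thread(runnable, namePrefix + counter.getAndIncrement());
        thread.setDaemon(true);
        // Every thread created by this factory gets the handler.
        thread.setUncaughtExceptionHandler(handler);
        return thread;
    }
}
```

Such a factory could be passed to Executors.newScheduledThreadPool(n, factory)
in place of the one shown in the diff. Note that tasks handed to an executor
via submit() or schedule() have their exceptions captured in the returned
Future, so the handler only sees exceptions that escape the task wrapper.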



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
