[
https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859733#comment-15859733
]
ASF GitHub Bot commented on FLINK-5759:
---------------------------------------
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/3290
[FLINK-5759] [jobmanager] Set UncaughtExceptionHandlers for JobManager's
Future and I/O thread pools
Currently, the thread pools of the `JobManager` do not have any
`UncaughtExceptionHandler`.
Uncaught exceptions are rare (Flink handles exceptions aggressively
in most places), but when one slips through in these threads (which execute
future responses and delayed actions), the `JobManager` may be left in an
inconsistent state and no longer function properly.
This pull request adds a handler that kills the process when an uncaught
exception occurs. Letting the JobManager be restarted by the respective
cluster framework is the only guaranteed way to be safe.
This also unifies the `ExecutorThreadFactory` and `NamedThreadFactory`.
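For illustration only, the idea can be sketched as a single thread factory that both names its threads and installs a process-killing handler. This is a minimal sketch, not the code in this pull request: the class name `FatalThreadFactory`, the constant `EXIT_ON_ERROR`, the thread naming scheme, the daemon flag, and the use of `Runtime.halt(-1)` are all assumptions made for the example.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: a thread factory that names its threads and attaches
 * an UncaughtExceptionHandler to each of them. Not the class from the PR.
 */
final class FatalThreadFactory implements ThreadFactory {

    /** Assumed default handler: report the error, then terminate the JVM. */
    static final Thread.UncaughtExceptionHandler EXIT_ON_ERROR = (thread, error) -> {
        try {
            System.err.println(
                "FATAL: thread '" + thread.getName() + "' died with uncaught exception: " + error);
        } finally {
            // halt() does not run shutdown hooks, so a stuck hook cannot block the exit.
            // The exit code -1 is an arbitrary choice for this sketch.
            Runtime.getRuntime().halt(-1);
        }
    };

    private final String poolName;
    private final Thread.UncaughtExceptionHandler handler;
    private final AtomicInteger threadNumber = new AtomicInteger(1);

    FatalThreadFactory(String poolName, Thread.UncaughtExceptionHandler handler) {
        this.poolName = poolName;
        this.handler = handler;
    }

    @Override
    public Thread newThread(Runnable runnable) {
        // Name threads after their pool so they are easy to find in thread dumps.
        Thread t = new Thread(runnable, poolName + "-thread-" + threadNumber.getAndIncrement());
        t.setDaemon(true);
        t.setUncaughtExceptionHandler(handler);
        return t;
    }
}
```

Such a factory could be passed to, for example, `Executors.newFixedThreadPool(4, new FatalThreadFactory("jobmanager-future", FatalThreadFactory.EXIT_ON_ERROR))`. Note that the handler only fires for tasks started with `execute()`; a task passed to `submit()` has its exception captured in the returned `Future` instead.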
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink uncaught_handlers
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3290.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3290
----
commit 3602631353dbdf230044db7fba1890600e648101
Author: Stephan Ewen
Date: 2017-02-09T13:04:17Z
[FLINK-5759] [jobmanager] Set UncaughtExceptionHandlers for JobManager's
Future and I/O thread pools
----
> Set an UncaughtExceptionHandler for all Thread Pools in JobManager
> ------------------------------------------------------------------
>
> Key: FLINK-5759
> URL: https://issues.apache.org/jira/browse/FLINK-5759
> Project: Flink
> Issue Type: Bug
> Components: JobManager
> Affects Versions: 1.2.0
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 1.3.0
>
>
> Currently, the thread pools of the {{JobManager}} do not have any
> {{UncaughtExceptionHandler}}.
> While uncaught exceptions are rare (Flink handles exceptions aggressively in
> most places), when exceptions slip through in these threads (which execute
> future responses and delayed actions), the JobManager may be in an
> inconsistent state and not function properly any more.
> We should add a handler that results in a process kill in the case of
> uncaught exceptions. Letting the JobManager be restarted by the respective
> cluster framework is the only guaranteed way to be safe.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)