[ https://issues.apache.org/jira/browse/FLINK-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092932#comment-15092932 ]
Fabian Hueske commented on FLINK-1581:
--------------------------------------
[~till.rohrmann], is this issue still valid?
> Configure DeathWatch parameters properly
> ----------------------------------------
>
> Key: FLINK-1581
> URL: https://issues.apache.org/jira/browse/FLINK-1581
> Project: Flink
> Issue Type: Bug
> Reporter: Till Rohrmann
>
> We are using Akka's DeathWatch mechanism to detect failed components. However,
> the interval until an {{Instance}} is marked dead is currently very long.
> Especially in conjunction with the job restarting mechanism, we should either
> devise a mechanism that quickly detects dead {{Instance}}s or set the
> interval, pause and threshold values such that detection does not take
> longer than the Akka ask timeout value. Otherwise, all retries might be
> consumed before an {{Instance}} is recognized as dead.
> Further investigation of the correct failure behavior is necessary.
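
For illustration only, the constraint described in the issue (detection should not take longer than the Akka ask timeout) could be checked along the lines of the sketch below. Everything in it is an assumption: the class DeathWatchCheck and its methods are hypothetical and not part of Flink or Akka, the durations are made-up example values, and "heartbeat interval + acceptable pause" is only a rough estimate of the detection time that ignores the threshold value entirely.

```java
import java.time.Duration;

/** Hypothetical helper for illustration; not part of Flink or Akka. */
final class DeathWatchCheck {

    private DeathWatchCheck() {}

    /**
     * Rough estimate (an assumption, not Akka's exact behavior): up to one
     * heartbeat interval passes before a heartbeat is missed, then the
     * acceptable pause must elapse. The threshold value is ignored here.
     */
    static Duration estimatedDetectionTime(Duration heartbeatInterval,
                                           Duration acceptablePause) {
        return heartbeatInterval.plus(acceptablePause);
    }

    /** True if the estimated detection time does not exceed the ask timeout. */
    static boolean detectsWithinAskTimeout(Duration heartbeatInterval,
                                           Duration acceptablePause,
                                           Duration askTimeout) {
        return estimatedDetectionTime(heartbeatInterval, acceptablePause)
                .compareTo(askTimeout) <= 0;
    }

    public static void main(String[] args) {
        // Example values only; they are not Flink or Akka defaults.
        Duration askTimeout = Duration.ofSeconds(10);

        // Long interval and pause: retries may be used up before detection.
        System.out.println(detectsWithinAskTimeout(
                Duration.ofSeconds(10), Duration.ofSeconds(60), askTimeout));

        // Shorter values: estimated detection fits within the ask timeout.
        System.out.println(detectsWithinAskTimeout(
                Duration.ofSeconds(2), Duration.ofSeconds(6), askTimeout));
    }
}
```

A check of this kind would only flag obviously inconsistent settings; whether it matches the real failure behavior is part of the further investigation the issue asks for.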
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)