[
https://issues.apache.org/jira/browse/FLINK-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15533194#comment-15533194
]
ASF GitHub Bot commented on FLINK-4711:
---------------------------------------
Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2569#discussion_r81173599
--- Diff:
flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
---
@@ -873,8 +873,7 @@ class JobManager(
}
sender ! decorateMessage(
- PartitionState(
- taskExecutionId,
+ new org.apache.flink.runtime.io.network.PartitionState(
--- End diff --
Yes that's true. Will fix it.
> TaskManager can crash due to failing onPartitionStateUpdate call
> ----------------------------------------------------------------
>
> Key: FLINK-4711
> URL: https://issues.apache.org/jira/browse/FLINK-4711
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.2.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Fix For: 1.2.0
>
>
> The {{TaskManager}} can crash because it calls
> {{Task.onPartitionStateUpdate}} when it receives a {{PartitionState}}
> message. The {{onPartitionStateUpdate}} method can throw an {{IOException}}
> or an {{InterruptedException}}, neither of which is handled at the
> {{TaskManager}} level.
> Another problem is that the initial partition state request is triggered
> within the {{SingleInputGate}}. The request causes the {{JobManager}} to send
> a {{PartitionState}} message to the {{TaskManager}}, which forwards it to the
> {{Task}}. If a message gets lost at any of these points, it is not
> retried and the partition state remains unknown.
> In order to handle the exceptions, to make the data flow clearer, and to add
> automatic retries, I propose to let the {{Task}} send the partition state
> check requests. Furthermore, the {{JobManager}} should answer the {{Task}}
> directly by replying to an ask operation. That way the message does not have
> to be routed through the {{TaskManager}}.
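
The proposal in the issue description can be illustrated with a minimal Java sketch. It is not the code from the pull request: the class PartitionStateCheckSketch, the ProducerState enum, the StateUpdateHandler interface, and the askWithRetries and checkPartitionState helpers are all hypothetical names, and a plain CompletableFuture stands in for the ask operation. The sketch assumes that a lost message surfaces as a failed (for example timed-out) future, retries the ask a bounded number of times, and routes the checked exceptions from the update callback to a failure handler instead of letting them escape.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

class PartitionStateCheckSketch {

    /** Hypothetical stand-in for the producer's execution state. */
    enum ProducerState { RUNNING, FINISHED, FAILED }

    /** Hypothetical callback that may throw, like onPartitionStateUpdate. */
    interface StateUpdateHandler {
        void onPartitionStateUpdate(ProducerState state)
                throws IOException, InterruptedException;
    }

    /** Re-issues the ask until it succeeds or the retries are used up. */
    static <T> CompletableFuture<T> askWithRetries(
            Supplier<CompletableFuture<T>> ask, int retries) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(ask, retries, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> ask, int retriesLeft,
            CompletableFuture<T> result) {
        ask.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (retriesLeft > 0) {
                attempt(ask, retriesLeft - 1, result); // lost or failed: ask again
            } else {
                result.completeExceptionally(error);   // give up after last retry
            }
        });
    }

    /** The task asks for the state itself and handles every failure path. */
    static CompletableFuture<Void> checkPartitionState(
            Supplier<CompletableFuture<ProducerState>> ask, int maxRetries,
            StateUpdateHandler handler, Consumer<Throwable> failTask) {
        return askWithRetries(ask, maxRetries).<Void>handle((state, error) -> {
            if (error != null) {
                failTask.accept(error);
                return null;
            }
            try {
                handler.onPartitionStateUpdate(state);
            } catch (IOException e) {
                failTask.accept(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                failTask.accept(e);
            }
            return null;
        });
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated ask: the first two requests time out, the third is answered.
        Supplier<CompletableFuture<ProducerState>> flakyAsk = () -> {
            CompletableFuture<ProducerState> reply = new CompletableFuture<>();
            if (calls.incrementAndGet() < 3) {
                reply.completeExceptionally(new TimeoutException("no reply"));
            } else {
                reply.complete(ProducerState.RUNNING);
            }
            return reply;
        };
        checkPartitionState(flakyAsk, 3,
                state -> System.out.println("producer state: " + state),
                error -> System.out.println("failing task: " + error)).join();
    }
}
```

Because the requester owns both the retry loop and the exception handling, no intermediate component has to forward the reply, which is the data-flow simplification the description argues for.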