Ufuk Celebi created FLINK-3539:
----------------------------------
Summary: Sync running Execution and Task instances via heartbeats
Key: FLINK-3539
URL: https://issues.apache.org/jira/browse/FLINK-3539
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Reporter: Ufuk Celebi
[~StephanEwen] pointed out that the job manager and task manager state can get
out of sync. If, for example, a cancel message from the job manager to the task
manager is not delivered, the Execution is failed at the job manager, but the
task keeps running at the task manager.
A simple way to prevent such situations is the following:
- The task manager and job manager heartbeats carry information about the
currently running tasks/executions
- If the task manager reports a task that is not a running Execution at the job
manager, that task is cancelled
- If the job manager reports a running Execution that is not a running task at
the task manager, the Execution is failed
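
The reconciliation described above amounts to two set differences over the IDs
carried in the heartbeats. The following is only a minimal sketch of that idea,
not Flink's actual implementation: the class HeartbeatReconciler, its Result
type, and the use of plain String IDs are all hypothetical names chosen for
illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; not part of Flink. */
public class HeartbeatReconciler {

    /** Outcome of comparing the job manager view with the task manager view. */
    public static final class Result {
        public final Set<String> tasksToCancel;
        public final Set<String> executionsToFail;

        Result(Set<String> tasksToCancel, Set<String> executionsToFail) {
            this.tasksToCancel = tasksToCancel;
            this.executionsToFail = executionsToFail;
        }
    }

    /**
     * @param runningExecutions IDs the job manager considers running (assumed String IDs)
     * @param runningTasks      IDs the task manager reports as running (assumed String IDs)
     */
    public static Result reconcile(Set<String> runningExecutions, Set<String> runningTasks) {
        // Tasks the task manager runs but the job manager does not know as running:
        // these would be cancelled at the task manager.
        Set<String> tasksToCancel = new HashSet<>(runningTasks);
        tasksToCancel.removeAll(runningExecutions);

        // Executions the job manager considers running but the task manager does not run:
        // these would be failed at the job manager.
        Set<String> executionsToFail = new HashSet<>(runningExecutions);
        executionsToFail.removeAll(runningTasks);

        return new Result(tasksToCancel, executionsToFail);
    }
}
```

With executions {a, b} at the job manager and tasks {b, c} at the task manager,
this sketch would mark task c for cancellation and execution a for failure. A
real implementation would probably also have to handle executions that are
still being deployed when the heartbeat is sent; the issue does not address
that case.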