Till Rohrmann created FLINK-10505:
-------------------------------------
Summary: Treat fail signal as scheduling event
Key: FLINK-10505
URL: https://issues.apache.org/jira/browse/FLINK-10505
Project: Flink
Issue Type: Sub-task
Components: Distributed Coordination
Affects Versions: 1.7.0
Reporter: Till Rohrmann
Fix For: 1.7.0
Instead of simply calling into the {{RestartStrategy}}, which restarts the
existing {{ExecutionGraph}} with the same parallelism, the
{{ExecutionGraphDriver}} should treat a recovery like the initial
scheduling operation. First, it needs to decide on the new parallelism of the
{{ExecutionGraph}} (scale up or scale down) with respect to the available set
of resources. The potentially rescaled {{ExecutionGraph}} is restarted only if
the minimum configuration is fulfilled.
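
A rough sketch of the decision step described above, to make the intended flow concrete. Everything here is an assumption for illustration: {{RecoveryScheduling}} and {{decideParallelism}} are hypothetical names, not existing Flink classes or methods, and the sketch assumes that one available slot supports one unit of parallelism and that the "minimum configuration" can be expressed as a minimum parallelism.

```java
import java.util.OptionalInt;

/** Hypothetical helper for illustration only; not part of Flink's API. */
public final class RecoveryScheduling {

    private RecoveryScheduling() {}

    /**
     * Decides the parallelism to use when rescheduling after a failure.
     *
     * Assumption: each available slot can host one parallel instance, so the
     * number of available slots bounds the achievable parallelism.
     *
     * @return the parallelism to restart with, or empty if the minimum
     *         configuration cannot be fulfilled with the available slots
     */
    public static OptionalInt decideParallelism(
            int availableSlots, int minParallelism, int maxParallelism) {
        if (minParallelism < 1 || maxParallelism < minParallelism) {
            throw new IllegalArgumentException(
                    "Expected 1 <= minParallelism <= maxParallelism");
        }
        if (availableSlots < minParallelism) {
            // Minimum configuration not fulfilled: do not restart yet.
            return OptionalInt.empty();
        }
        // Scale up or down to whatever the resources allow, capped at the maximum.
        return OptionalInt.of(Math.min(availableSlots, maxParallelism));
    }
}
```

Under these assumptions, a fail signal would trigger the same call as the initial scheduling: with 3 available slots and bounds [2, 8] the graph would be restarted with parallelism 3, while with only 1 slot the result is empty and the restart would wait for more resources.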
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)