Stephan Ewen created FLINK-1953:
-----------------------------------

             Summary: Rework Checkpoint Coordinator
                 Key: FLINK-1953
                 URL: https://issues.apache.org/jira/browse/FLINK-1953
             Project: Flink
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 0.9
            Reporter: Stephan Ewen
            Assignee: Stephan Ewen
             Fix For: 0.9

The checkpoint coordinator currently has no tests and is vulnerable in a variety of situations. In particular, I propose to add:

  - Better configurability of which tasks receive the trigger-checkpoint messages, which tasks need to acknowledge the checkpoint, and which tasks need to receive confirmation messages.

  - Checkpoint timeouts, so that incomplete checkpoints are guaranteed to be cleaned up after a while, regardless of whether other checkpoints complete successfully.

  - Better sanity checking of messages and fields, to properly handle or ignore messages for old or expired checkpoints, and incorrectly routed messages.

  - Better handling of checkpoint attempts at points where the execution has just failed or is currently being canceled.

  - A good set of tests.
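
To make the timeout and sanity-checking items more concrete, here is a minimal Java sketch of the bookkeeping they imply. It is not the existing coordinator code and not a proposed final design: the class `PendingCheckpointTracker`, the `AckResult` enum, and all method names are hypothetical, tasks are identified by plain strings, and the current time is passed in explicitly so that the behavior is deterministic and easy to test.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of pending-checkpoint bookkeeping; all names are assumptions. */
class PendingCheckpointTracker {

    /** Outcome of processing one acknowledge message. */
    enum AckResult { ACCEPTED, COMPLETED, UNKNOWN_CHECKPOINT, UNEXPECTED_TASK, DUPLICATE }

    /** State kept for one checkpoint that has been triggered but not yet completed. */
    private static final class Pending {
        final long triggerTimeMillis;
        final Set<String> notYetAcknowledged;
        final Set<String> acknowledged = new HashSet<>();

        Pending(long triggerTimeMillis, Set<String> tasksToAcknowledge) {
            this.triggerTimeMillis = triggerTimeMillis;
            this.notYetAcknowledged = new HashSet<>(tasksToAcknowledge);
        }
    }

    private final Map<Long, Pending> pending = new HashMap<>();
    private final long timeoutMillis;

    PendingCheckpointTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Registers a new checkpoint and the tasks that must acknowledge it. */
    void trigger(long checkpointId, Set<String> tasksToAcknowledge, long nowMillis) {
        if (pending.containsKey(checkpointId)) {
            throw new IllegalArgumentException("Checkpoint already pending: " + checkpointId);
        }
        pending.put(checkpointId, new Pending(nowMillis, tasksToAcknowledge));
    }

    /** Validates an acknowledge message instead of trusting it blindly. */
    AckResult acknowledge(long checkpointId, String taskId) {
        Pending p = pending.get(checkpointId);
        if (p == null) {
            // Expired, already completed, or never triggered: ignore the message.
            return AckResult.UNKNOWN_CHECKPOINT;
        }
        if (p.acknowledged.contains(taskId)) {
            return AckResult.DUPLICATE;
        }
        if (!p.notYetAcknowledged.remove(taskId)) {
            // The task was never expected to acknowledge this checkpoint.
            return AckResult.UNEXPECTED_TASK;
        }
        p.acknowledged.add(taskId);
        if (p.notYetAcknowledged.isEmpty()) {
            pending.remove(checkpointId);
            return AckResult.COMPLETED;
        }
        return AckResult.ACCEPTED;
    }

    /** Drops every pending checkpoint older than the timeout; returns how many were dropped. */
    int expire(long nowMillis) {
        int removed = 0;
        Iterator<Pending> it = pending.values().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().triggerTimeMillis >= timeoutMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int numPending() {
        return pending.size();
    }
}
```

In this sketch, expiry depends only on a checkpoint's own age, so an incomplete checkpoint is cleaned up even if no later checkpoint ever succeeds. A late acknowledgement for an expired checkpoint then returns `UNKNOWN_CHECKPOINT` and can simply be ignored. A real coordinator would call something like `expire` from a timer and would also have to discard any state handles already collected for the dropped checkpoint.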