Stephan Ewen created FLINK-1953:
-----------------------------------

             Summary: Rework Checkpoint Coordinator
                 Key: FLINK-1953
                 URL: https://issues.apache.org/jira/browse/FLINK-1953
             Project: Flink
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 0.9
            Reporter: Stephan Ewen
            Assignee: Stephan Ewen
             Fix For: 0.9


The checkpoint coordinator currently has no tests and does not robustly handle 
a variety of failure situations. In particular, I propose to add:

 - Better configurability of which tasks receive the trigger checkpoint 
messages, which tasks need to acknowledge the checkpoint, and which tasks need 
to receive confirmation messages.

 - Checkpoint timeouts, such that incomplete checkpoints are guaranteed to be 
cleaned up after a while, regardless of whether later checkpoints succeed.

 - Better sanity checking of messages and fields, to properly handle/ignore 
messages for old/expired checkpoints, or invalidly routed messages.

 - Better handling of checkpoint attempts at points where the execution has 
just failed or is currently being canceled.

 - A good set of tests.
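
As a rough illustration of the timeout and message-checking items above, a 
minimal sketch might look like the following. The class and method names are 
hypothetical and chosen only for this example; this is not the actual Flink 
code, and time is passed in explicitly only to keep the sketch easy to test.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; not Flink's actual checkpoint coordinator. */
class CheckpointCoordinatorSketch {

    /** A checkpoint that was triggered but not yet acknowledged by all tasks. */
    private static final class Pending {
        final long triggerTime;
        final Set<String> awaitingAcks;

        Pending(long triggerTime, Set<String> tasksToAck) {
            this.triggerTime = triggerTime;
            this.awaitingAcks = new HashSet<>(tasksToAck);
        }
    }

    private final Map<Long, Pending> pending = new HashMap<>();
    private final Set<String> tasksToAck;   // tasks that must acknowledge
    private final long timeoutMillis;       // max age of a pending checkpoint
    private long nextId = 1;

    CheckpointCoordinatorSketch(Set<String> tasksToAck, long timeoutMillis) {
        this.tasksToAck = new HashSet<>(tasksToAck);
        this.timeoutMillis = timeoutMillis;
    }

    /** Registers a new pending checkpoint and returns its id. */
    synchronized long trigger(long now) {
        long id = nextId++;
        pending.put(id, new Pending(now, tasksToAck));
        return id;
    }

    /**
     * Returns true only if this acknowledgement completed the checkpoint.
     * Acks for unknown or expired checkpoints, duplicate acks, and acks
     * from tasks that were not expected to acknowledge are ignored.
     */
    synchronized boolean acknowledge(long checkpointId, String taskId) {
        Pending p = pending.get(checkpointId);
        if (p == null || !p.awaitingAcks.remove(taskId)) {
            return false;
        }
        if (p.awaitingAcks.isEmpty()) {
            pending.remove(checkpointId);
            return true;
        }
        return false;
    }

    /**
     * Discards every pending checkpoint whose age has reached the timeout,
     * independently of whether any other checkpoint has completed.
     * Returns the number of discarded checkpoints.
     */
    synchronized int expire(long now) {
        int removed = 0;
        Iterator<Pending> it = pending.values().iterator();
        while (it.hasNext()) {
            if (now - it.next().triggerTime >= timeoutMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    synchronized int numPending() {
        return pending.size();
    }
}
```

In a real coordinator, expire() would presumably be driven by a timer rather 
than called by hand, and an acknowledgement arriving after expiry would 
simply fall into the "unknown checkpoint" case and be dropped.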



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
