[ 
https://issues.apache.org/jira/browse/FLINK-26029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned FLINK-26029:
--------------------------------

    Assignee: Yunfeng Zhou  (was: Dong Lin)

> Generalize the checkpoint protocol of OperatorCoordinator.
> ----------------------------------------------------------
>
>                 Key: FLINK-26029
>                 URL: https://issues.apache.org/jira/browse/FLINK-26029
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.3
>            Reporter: Jiangjie Qin
>            Assignee: Yunfeng Zhou
>            Priority: Major
>              Labels: extensibility, stale-assigned
>             Fix For: 1.16.0
>
>
> Currently the JobManager (JM) opens all the event valves from the
> OperatorCoordinator (OC) to the subtasks after the checkpoint barriers are
> sent to the Source subtasks. While this works for the Source operators, it
> unnecessarily limits general usage of the OperatorCoordinator for other
> operators.
> To generalize the protocol, we can change the JM to open the event valve of
> each subtask individually, once that subtask has finished its local
> checkpoint. The protocol would then become the following:
>  # Let the OC finish processing all the incoming OperatorEvents before the 
> snapshot.
>  # Wait until all the outgoing OperatorEvents before the snapshot are sent 
> and acked.
>  # Shut the event valve so no outgoing events can be sent to the subtasks.
>  # Send checkpoint barriers to the Source operators.
>  # Open the corresponding event valve of a subtask when the
> AcknowledgeCheckpoint message from that subtask is received.
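
The valve handling in steps 3 and 5 of the proposed protocol can be sketched
roughly as below. This is only an illustration, not Flink's actual
implementation: the class SubtaskValveSketch and all of its methods are
hypothetical names, events are plain strings, and buffering events while a
valve is shut is an assumption made for the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical per-subtask event valve; not a class from Flink. */
class SubtaskValveSketch {
    // One open/shut flag and one buffer per subtask index.
    private final Map<Integer, Boolean> open = new HashMap<>();
    private final Map<Integer, Queue<String>> buffered = new HashMap<>();
    // Records events in the order they would reach the subtasks.
    private final List<String> delivered = new ArrayList<>();

    SubtaskValveSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            open.put(i, true);
            buffered.put(i, new ArrayDeque<>());
        }
    }

    /** Step 3: shut every valve before the checkpoint barriers are sent. */
    void shutAll() {
        open.replaceAll((subtask, isOpen) -> false);
    }

    /** Delivers the event if the subtask's valve is open, otherwise buffers it (assumed behavior). */
    void send(int subtask, String event) {
        if (open.get(subtask)) {
            delivered.add(subtask + ":" + event);
        } else {
            buffered.get(subtask).add(event);
        }
    }

    /** Step 5: on an acknowledgement from one subtask, reopen only that subtask's valve. */
    void onAcknowledgeCheckpoint(int subtask) {
        open.put(subtask, true);
        Queue<String> queue = buffered.get(subtask);
        while (!queue.isEmpty()) {
            delivered.add(subtask + ":" + queue.poll());
        }
    }

    List<String> delivered() {
        return delivered;
    }
}
```

With two subtasks, acknowledging subtask 1 releases only the events queued
for subtask 1, while events for subtask 0 stay held back until its own
acknowledgement arrives.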



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
