[ https://issues.apache.org/jira/browse/FLINK-23430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392260#comment-17392260 ]

Dawid Wysakowicz commented on FLINK-23430:
------------------------------------------

I agree it is kind of an optimization. We can just keep snapshotting the state of such coordinators.

> Also I would ask a question, why {{OperatorCoordinator}}s are still running
> if all of its operators have finished/closed?

This is a good question, and one I don't know the answer to. I guess it has simply not been implemented so far.

> Also I would ask another question, during recovery, do/should we even start
> an OperatorCoordinator if all of its operators have already finished long
> time ago?

I'd treat that as an optimization as well. Right now, we also start subtasks which finished a long time ago, but we immediately go to the closing/finishing phase for them.

> Do not take snapshot for operator coordinators which all tasks finished
> -----------------------------------------------------------------------
>
>                 Key: FLINK-23430
>                 URL: https://issues.apache.org/jira/browse/FLINK-23430
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Currently we trigger checkpoints for all operator coordinators, irrespective
> of whether their corresponding tasks have finished or not. This leads, e.g., to a
> precondition in
> {{org.apache.flink.runtime.checkpoint.PendingCheckpoint#fulfillFullyFinishedOperatorStates}}
> failing.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)