Piotr Nowojski created FLINK-13698:
--------------------------------------
Summary: Rework threading model of CheckpointCoordinator
Key: FLINK-13698
URL: https://issues.apache.org/jira/browse/FLINK-13698
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.10.0
Reporter: Piotr Nowojski
Currently, the {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is
executed by multiple different threads (mostly the {{ioExecutor}}, but not
exclusively). This causes multiple concurrency issues, for example:
https://issues.apache.org/jira/browse/FLINK-13497
A proper fix would be to rethink the threading model there. At first glance, it
doesn't seem that this code needs to be multi-threaded, except for the parts
doing the actual IO operations. It should therefore be possible to run
everything in the ExecutionGraph's single thread and run only the necessary IO
operations asynchronously, with some feedback loop ("mailbox style").
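A minimal sketch of the "mailbox style" idea, under stated assumptions: one
single-threaded executor stands in for the ExecutionGraph's thread and owns all
coordinator state. A separate pool does only blocking IO, and each IO result is
posted back to the single thread as a new task (the feedback loop). The class
and method names ({{MailboxCoordinatorSketch}}, {{triggerCheckpoint}},
{{persistMetadata}}) are hypothetical and are not Flink API. Only standard JDK
classes ({{CompletableFuture}}, {{ExecutorService}}) are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not Flink's actual CheckpointCoordinator. */
public class MailboxCoordinatorSketch {
    // Plays the role of the ExecutionGraph's single thread (the "mailbox").
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Used for blocking IO only; tasks here never touch coordinator state.
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(4);

    // Coordinator state: read and written only on mainThread, so no locks are needed.
    private final List<Long> completedCheckpoints = new ArrayList<>();
    private long nextCheckpointId = 1;

    /** May be called from any thread; state changes are enqueued to mainThread. */
    public CompletableFuture<Long> triggerCheckpoint() {
        return CompletableFuture
                .supplyAsync(() -> nextCheckpointId++, mainThread)         // state access on main thread
                .thenCompose(id -> CompletableFuture
                        .supplyAsync(() -> persistMetadata(id), ioExecutor) // IO off the main thread
                        .thenApplyAsync(persistedId -> {                    // feedback: back to main thread
                            completedCheckpoints.add(persistedId);
                            return persistedId;
                        }, mainThread));
    }

    /** Hypothetical stand-in for blocking IO (e.g. writing metadata); runs on ioExecutor. */
    private long persistMetadata(long checkpointId) {
        return checkpointId;
    }

    /** Reads state on mainThread and returns a copy; must not be called from mainThread. */
    public List<Long> completedCheckpoints() {
        return CompletableFuture
                .supplyAsync(() -> new ArrayList<>(completedCheckpoints), mainThread)
                .join();
    }

    public void shutdown() {
        ioExecutor.shutdown();
        mainThread.shutdown();
        try {
            ioExecutor.awaitTermination(5, TimeUnit.SECONDS);
            mainThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        MailboxCoordinatorSketch coordinator = new MailboxCoordinatorSketch();
        List<CompletableFuture<Long>> pending = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            pending.add(coordinator.triggerCheckpoint());
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        System.out.println(coordinator.completedCheckpoints().size()); // 10
        coordinator.shutdown();
    }
}
```

The design choice is that concurrency is confined to the IO pool. Checkpoint IDs
are assigned and results recorded only on the single thread, so races between
triggering, completing and failure handling cannot occur there. IO results may
arrive in any order, but they are applied one at a time.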
I would strongly recommend fixing this issue before adding new features to the
{{CheckpointCoordinator}} component.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)