Hi all,

Incremental checkpointing can help a lot in improving the efficiency of
fault tolerance and recovery in Flink. I wrote an initial design of
incremental checkpointing in Flink, and am looking forwards for your
comments.


https://docs.google.com/document/d/1VvvPp09gGdVb9D2wHx6NX99yQK0jSUMWHQPrQ_mn520/edit?usp=sharing


Some more issues, I think, are needed to be discussed in the introduction
of incremental checkpointing.


One is the implementation of savepoints. Savepoints are supposed to be full
and independent of backend implementation. Currently, the implementation of
Savepoints and Checkpoints are identical in backends. With the introduction
of incremental checkpointing, I think backends should take different
snapshots for them.


Regards,

Xiaogang

Reply via email to