[ 
https://issues.apache.org/jira/browse/FLINK-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992469#comment-15992469
 ] 

ASF GitHub Bot commented on FLINK-6364:
---------------------------------------

Github user shixiaogang commented on the issue:

    https://github.com/apache/flink/pull/3801
  
    Hi @gyfora, I am very happy to hear from you. Below are the answers to 
your questions. Please let me know what you think of them.
    
    1. Incremental checkpoints support rescaling. It's true that the 
implementation checkpoints files directly, and each file may hold data for 
multiple key groups. But when the degree of parallelism changes, each file is 
passed to every state backend that owns at least one of the key groups in the 
file. Each backend then iterates over all the key-value pairs in the file and 
picks up those that belong to its own key groups.
    
    2. When we restore from a full snapshot (which is formatted as key-value 
pairs), the next incremental checkpoint will contain all the files. This may 
seem a little inefficient; it follows from my intent to make each checkpoint 
self-contained. Given that full snapshots and incremental snapshots are in 
different formats, we have to take a "full" incremental snapshot as the base 
for the following checkpoints.
    
    3. That is a very good question. It would be more flexible to let users 
choose the checkpointing scheme (say, one full checkpoint after every n 
incremental checkpoints). But I think making every checkpoint incremental is 
acceptable because incremental checkpoints are more efficient in most cases. 
Backends that do not support incremental checkpointing can still take full 
snapshots.
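
The restore step described in point 1 can be sketched roughly as follows. This 
is only an illustration and not the code in the pull request: the class 
RescaleRestoreSketch, the Entry type, and both method names are hypothetical, 
and the sketch assumes every key-value pair is tagged with its key group and 
every backend owns a contiguous, inclusive range of key groups.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of restore after rescaling; none of these names are Flink APIs. */
final class RescaleRestoreSketch {

    /** One key-value pair from a checkpointed file, tagged with its key group (assumed layout). */
    static final class Entry {
        final int keyGroup;
        final String key;
        final String value;

        Entry(int keyGroup, String key, String value) {
            this.keyGroup = keyGroup;
            this.key = key;
            this.value = value;
        }
    }

    /** True if a file covering key groups [fileStart, fileEnd] must be passed to a backend owning [start, end]. */
    static boolean fileIsRelevant(int fileStart, int fileEnd, int start, int end) {
        return fileStart <= end && start <= fileEnd;
    }

    /** Iterates over all pairs in a file and keeps only those whose key group lies in [start, end]. */
    static List<Entry> pickOwnedEntries(List<Entry> fileEntries, int start, int end) {
        List<Entry> owned = new ArrayList<>();
        for (Entry e : fileEntries) {
            if (e.keyGroup >= start && e.keyGroup <= end) {
                owned.add(e);
            }
        }
        return owned;
    }
}
```

For example, if one file holds key groups 0 to 3 and the job is rescaled from 
one subtask to two, the file is relevant to both new backends; the backend 
owning key groups 0 to 1 keeps only the pairs tagged 0 or 1, and the other 
backend keeps the rest.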


> Implement incremental checkpointing in RocksDBStateBackend
> ----------------------------------------------------------
>
>                 Key: FLINK-6364
>                 URL: https://issues.apache.org/jira/browse/FLINK-6364
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
>            Assignee: Xiaogang Shi
>
> {{RocksDBStateBackend}} is well suited for incremental checkpointing because 
> RocksDB is based on LSM trees, which record updates in new sst files, and all 
> sst files are immutable. By materializing only those new sst files, we can 
> significantly improve the performance of checkpointing.
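
The idea in the issue description can be illustrated with a small sketch. It 
is not the actual implementation: IncrementalSstSketch and filesToUpload are 
hypothetical names, and the sketch assumes sst files can be identified by 
name and never change once written. Under those assumptions, the files to 
materialize for a checkpoint are the current sst files minus those already 
materialized by earlier checkpoints.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; not part of Flink or RocksDB. */
final class IncrementalSstSketch {

    /**
     * Returns the sst files that still need to be materialized: everything in
     * the current set that no earlier checkpoint has already materialized.
     * Because sst files are immutable, a file name seen before refers to the
     * same content and can be skipped.
     */
    static Set<String> filesToUpload(Set<String> currentSstFiles, Set<String> alreadyCheckpointed) {
        Set<String> fresh = new TreeSet<>(currentSstFiles);
        fresh.removeAll(alreadyCheckpointed);
        return fresh;
    }
}
```

When the set of already checkpointed files is empty, for instance right after 
restoring from a full snapshot, the result is the complete set of current 
files, which corresponds to the "full" incremental snapshot mentioned in the 
comment above.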



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
