Stefan Richter created FLINK-6485:
-------------------------------------

             Summary: Use buffering to avoid frequent memtable flushes for 
short intervals in RockdDB incremental checkpoints
                 Key: FLINK-6485
                 URL: https://issues.apache.org/jira/browse/FLINK-6485
             Project: Flink
          Issue Type: Improvement
          Components: State Backends, Checkpointing
            Reporter: Stefan Richter


The current implementation of incremental checkpoitns in RocksDB needs to flush 
the memtable to disk prior to a checkpoint and this will generate a SST file.

What is required for fast checkpoint intervals is an alternative mechanism to 
quickly determine a delta from the previous incremental checkpoint to avoid 
this frequent flushing. This could be implemented through custom buffering 
inside the backend, e.g. a changelog buffer that is maintain up to a certain 
size. 

The buffer's content becomes part of the private state in the incremental 
snapshot and the buffer is dropped i) after each checkpoint or ii) after 
exceeding a certain size that justifies flushing and writing a new SST file.

This mechanism should not be blocking, which we can achieve in the following 
way:

1) We have a clear upper limit to the buffer size (e.g. 64MB), once the limit 
of diffs is reached, we can drop the buffer because we can assume enough work 
was done to justify a new SST file

2) We write the buffer to a local FS, so we can expect this to be reasonable 
fast and that it will not suffer from the kind of blocking that we have in DFS. 
I mean technically, also flushing the SST file can block. Then, in the async 
part, we can transfer the locally written buffer file to DFS.

There might be other mechanisms in RocksDB that we could exploit for this, such 
as the write ahead log, but this could be already be a good solution.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to