Stefan Richter created FLINK-6485: ------------------------------------- Summary: Use buffering to avoid frequent memtable flushes for short intervals in RockdDB incremental checkpoints Key: FLINK-6485 URL: https://issues.apache.org/jira/browse/FLINK-6485 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Reporter: Stefan Richter
The current implementation of incremental checkpoitns in RocksDB needs to flush the memtable to disk prior to a checkpoint and this will generate a SST file. What is required for fast checkpoint intervals is an alternative mechanism to quickly determine a delta from the previous incremental checkpoint to avoid this frequent flushing. This could be implemented through custom buffering inside the backend, e.g. a changelog buffer that is maintain up to a certain size. The buffer's content becomes part of the private state in the incremental snapshot and the buffer is dropped i) after each checkpoint or ii) after exceeding a certain size that justifies flushing and writing a new SST file. This mechanism should not be blocking, which we can achieve in the following way: 1) We have a clear upper limit to the buffer size (e.g. 64MB), once the limit of diffs is reached, we can drop the buffer because we can assume enough work was done to justify a new SST file 2) We write the buffer to a local FS, so we can expect this to be reasonable fast and that it will not suffer from the kind of blocking that we have in DFS. I mean technically, also flushing the SST file can block. Then, in the async part, we can transfer the locally written buffer file to DFS. There might be other mechanisms in RocksDB that we could exploit for this, such as the write ahead log, but this could be already be a good solution. -- This message was sent by Atlassian JIRA (v6.3.15#6346)