[GitHub] flink pull request #5582: [FLINK-8790][State] Improve performance for recove...

sihuazhou Fri, 01 Jun 2018 02:20:31 -0700

Github user sihuazhou commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5582#discussion_r192341305
  
    --- Diff: flink-state-backends/flink-statebackend-rocksdb/src/test/java/org/apache/flink/contrib/streaming/state/RocksDBStateBackendTest.java ---
    @@ -547,4 +549,30 @@ public boolean accept(File file, String s) {
                        return true;
                }
        }
    +
    +   private static class TestRocksDBStateBackend extends RocksDBStateBackend {
    +
    +           public TestRocksDBStateBackend(AbstractStateBackend checkpointStreamBackend, boolean enableIncrementalCheckpointing) {
    +                   super(checkpointStreamBackend, enableIncrementalCheckpointing);
    +           }
    +
    +           @Override
    +           public <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
    +                   Environment env,
    +                   JobID jobID,
    +                   String operatorIdentifier,
    +                   TypeSerializer<K> keySerializer,
    +                   int numberOfKeyGroups,
    +                   KeyGroupRange keyGroupRange,
    +                   TaskKvStateRegistry kvStateRegistry) throws IOException {
    +
    +                   AbstractKeyedStateBackend<K> keyedStateBackend = super.createKeyedStateBackend(
    +                           env, jobID, operatorIdentifier, keySerializer, numberOfKeyGroups, keyGroupRange, kvStateRegistry);
    +
    +                   // We ignore the range deletions on production, but when we are running the tests we shouldn't ignore it.
    --- End diff --
    
    Sorry, I think I may have misunderstood "I am wondering if we should not
    prefer to apply normal deletes over range delete". Does that mean "I am
    wondering if we should prefer to apply normal deletes over range delete"? As
    far as I know, the keys are only physically removed when compaction occurs.
    `deleteRange()` only writes a special record (a range tombstone) into the db,
    which looks like `Deleted Range[beginKey, endKey)`; it does not remove any
    records from the db by itself.
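
To make the point about tombstones concrete, here is a toy model in plain Java. It is not RocksDB and none of the names below are RocksDB API; it only assumes the documented behaviour that a range delete covers keys from the begin key (inclusive) up to the end key (exclusive), and it ignores sequence numbers, so writes issued after a tombstone are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model only: NOT RocksDB, and every name here is hypothetical. */
final class ToyRangeTombstoneStore {
    // Records that are still physically stored.
    private final TreeMap<String, String> records = new TreeMap<>();
    // Range tombstones as {beginInclusive, endExclusive}.
    private final List<String[]> tombstones = new ArrayList<>();

    void put(String key, String value) {
        records.put(key, value);
    }

    /** Only appends a tombstone; no stored record is touched. */
    void deleteRange(String begin, String end) {
        tombstones.add(new String[] {begin, end});
    }

    /** Reads hide every record that is covered by a tombstone. */
    String get(String key) {
        return isCovered(key) ? null : records.get(key);
    }

    /** Number of records that are still physically stored. */
    int physicalSize() {
        return records.size();
    }

    /** Compaction is the step that drops covered records and the tombstones. */
    void compact() {
        records.keySet().removeIf(this::isCovered);
        tombstones.clear();
    }

    private boolean isCovered(String key) {
        for (String[] t : tombstones) {
            if (key.compareTo(t[0]) >= 0 && key.compareTo(t[1]) < 0) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch, after `deleteRange("a", "c")` a read of `"a"` returns nothing, but `physicalSize()` is unchanged until `compact()` runs, which is the behaviour described above.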
    
    And yes, I still have the same concerns about the negative side effects of
    `deleteRange()`, even though rescaling from a checkpoint is still an
    experimental feature. I think a safer way to improve the performance of
    recovery from an incremental checkpoint is to not clip the db at all, and to
    only choose an instance as the initial db when its key-group range is a
    subset of the target key-group range. What do you think?
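
A rough sketch of that selection rule, in plain Java so it stands on its own. `ToyKeyGroupRange` and `InitialDbChooser` are hypothetical names, not Flink classes, and the bounds are assumed to be inclusive. Picking the largest qualifying range when several qualify is my own assumption; the proposal only requires that the chosen range is a subset of the target.

```java
import java.util.List;

/** Hypothetical stand-in for a key-group range with inclusive bounds. */
final class ToyKeyGroupRange {
    final int start;
    final int end;

    ToyKeyGroupRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** True if every key group in this range also lies in the target range. */
    boolean isSubsetOf(ToyKeyGroupRange target) {
        return start >= target.start && end <= target.end;
    }

    int size() {
        return end - start + 1;
    }
}

final class InitialDbChooser {
    /**
     * Returns the index of the restored instance to reuse as the initial db,
     * or -1 if no restored range is a subset of the target (so nothing would
     * have to be clipped). Ties are resolved in favour of the earliest index.
     */
    static int chooseInitialDb(List<ToyKeyGroupRange> restored, ToyKeyGroupRange target) {
        int best = -1;
        for (int i = 0; i < restored.size(); i++) {
            ToyKeyGroupRange candidate = restored.get(i);
            if (!candidate.isSubsetOf(target)) {
                continue; // using it would require clipping, so skip it
            }
            if (best < 0 || candidate.size() > restored.get(best).size()) {
                best = i; // assumption: prefer the largest reusable range
            }
        }
        return best;
    }
}
```

With restored ranges 0-3 and 4-9 and a target of 0-9 (scaling in), the second instance would be chosen. With a single restored range 0-9 and a target of 0-4 (scaling out), the result is -1 and the caller would fall back to the slower restore path.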
