[GitHub] [flink] rkhachatryan commented on a change in pull request #15420: [FLINK-21356] Implement incremental checkpointing and recovery using state changelog

GitBox Thu, 24 Jun 2021 03:59:37 -0700


rkhachatryan commented on a change in pull request #15420:
URL: https://github.com/apache/flink/pull/15420#discussion_r657844591




##########
File path: 
flink-state-backends/flink-statebackend-changelog/src/main/java/org/apache/flink/state/changelog/ChangelogKeyedStateBackend.java
##########
@@ -369,7 +509,73 @@ public void notifyCheckpointAborted(long checkpointId) 
throws Exception {
     // Factory function interface
     private interface StateFactory {
         <K, N, SV, S extends State, IS extends S> IS create(
-                InternalKvState<K, N, SV> kvState, KvStateChangeLogger<SV, N> 
changeLogger)
+                InternalKvState<K, N, SV> kvState,
+                KvStateChangeLogger<SV, N> changeLogger,
+                InternalKeyContext<K> keyContext)
                 throws Exception;
     }
+
+    /**
+     * @param name state name
+     * @param type state type (the only supported type currently are: {@link
+     *     BackendStateType#KEY_VALUE key value}, {@link 
BackendStateType#PRIORITY_QUEUE priority
+     *     queue})
+     * @return an existing state, i.e. the one that was already created
+     * @throws NoSuchElementException if the state wasn't created
+     * @throws UnsupportedOperationException if state type is not supported
+     */
+    public ChangelogState getExistingState(String name, BackendStateType type)
+            throws NoSuchElementException, UnsupportedOperationException {
+        ChangelogState state;
+        switch (type) {
+            case KEY_VALUE:
+                state = (ChangelogState) keyValueStatesByName.get(name);

Review comment:
       To my understanding, the purpose of `keyValueStatesByName` is to have 
   1. exactly-once state table creation (in `createInternalState`)
   2. exactly-once metadata update and state migration (in 
`tryRegisterKvStateInformation`) 
   
   To implement (2), we'll probably introduce a special 
`changelogBackend.getOrCreateStateForRecovery()`. 
   And then both methods will provide (1) - with the help 
`keyValueStatesByName` map;
   and only in the current `getOrCreateState()` will provide (2) - by using 
some `delegatingSerializers` map for example.
   So `keyValueStatesByName` should be updated anyways.
   
   Regarding the approaches, lazy recovery (option 2) seems much more 
complicated. And because most snapshots I think will have the materialized part 
(which will be bigger), it's re-write will dominate changelog re-write. So I 
think it doesn't worth it.
   
   WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] rkhachatryan commented on a change in pull request #15420: [FLINK-21356] Implement incremental checkpointing and recovery using state changelog

Reply via email to