azagrebin commented on a change in pull request #7351: [FLINK-11008][State Backends, Checkpointing]SpeedUp upload state files using multithread URL: https://github.com/apache/flink/pull/7351#discussion_r246340345
########## File path: flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDbStateDataTransfer.java ########## @@ -61,6 +69,88 @@ static void transferAllStateDataToDirectory( downloadDataForAllStateHandles(miscFiles, dest, restoringThreadNum, closeableRegistry); } + public static void uploadFilesToCheckpointFs( + @Nonnull Map<StateHandleID, Path> files, + int numberOfSnapshottingThreads, + CheckpointStreamFactory checkpointStreamFactory, + CloseableRegistry closeableRegistry, + Map<StateHandleID, StreamStateHandle> hanldes) throws Exception { Review comment: Not sure here, in general I think it is not obvious that function implicitly returns something (here `hanldes`, typo btw, should be `handles`) in its arguments. At least I would mention this fact in the function doc comment. `transferAllStateDataToDirectory` is now public and should also have a doc comment. On the other hand, if function returns explicitly `handles`, we have to reiterate them later where the result is used to add it to the final result map (e.g. `sstFiles.putAll(uploadFilesToCheckpointFs(..))`). Though, I would not expect the size of map to be performance critical for reiteration. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services