[GitHub] [flink] rkhachatryan commented on a change in pull request #15322: [FLINK-21353][state] Add DFS-based StateChangelog (TM-owned state)

GitBox Tue, 20 Jul 2021 03:02:13 -0700


rkhachatryan commented on a change in pull request #15322:
URL: https://github.com/apache/flink/pull/15322#discussion_r672604163




##########
File path: 
flink-dstl/flink-dstl-dfs/src/main/java/org/apache/flink/changelog/fs/FsStateChangelogOptions.java
##########
@@ -57,4 +87,18 @@
                             "Number of threads to use to perform cleanup in 
case an upload is discarded "
                                     + "(and not cleaned up by JM). "
                                     + "If the cleanup doesn't keep up then 
task might be back-pressured.");
+
+    public static final ConfigOption<Integer> NUM_UPLOAD_THREADS =
+            ConfigOptions.key("dstl.dfs.upload.num-threads")
+                    .intType()
+                    .defaultValue(5)
+                    .withDescription("Number of threads to use for upload.");

Review comment:
       > This sounds like RocksDB issue 
   
   I guess it was a design decision to (easily) throttle the number of request 
to DFS (S3). This is yet another reason for a separate thread pool here.
   
   > I can see two reasons against using the same thread pool: ...2. having 
actions with different priority
   
   It's not only about the priority, but also about isolation, as I wrote 
above. If there is an issue with uploading (e.g. bug or request throttling) 
then it will consume all IO threads - regardless of the priority. This makes it 
harder to debug because it will manifest itself in many different ways, giving 
false signals.
   
   > I don't believe anyone will want to bother how much he should bump 
individual thread pools
   
   I'd say users shouldn't bother at all and the defaults should be enough. But 
once there **is** some issue it might be helpful to see what are threads in 
each pool doing - and maybe adjust accordingly.
   
   >  Besides I don't see this being an issue in this case either?
   
   I looked at the code, and it seems can block releasing partitions. Maybe 
there are or will be some other effects?
   
   > In that case it should be solved probably in a better way by still sharing 
thread pool, but with more sophisticated actions scheduler
   
   I think it's better only because of the other point - ease of configuration. 
Otherwise, separate thread pools are simpler in code and more robust.
   
   Besides that, async checkpointing phase (regardless of the backend) 
currently uses its own pool created in `StreamTask`, not the common IO pool.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] rkhachatryan commented on a change in pull request #15322: [FLINK-21353][state] Add DFS-based StateChangelog (TM-owned state)

Reply via email to