[ https://issues.apache.org/jira/browse/FLINK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526973#comment-17526973 ]
Yun Tang commented on FLINK-27155: ---------------------------------- > I think Yun Tang means waiting for all task of a TM to switch from > INITIALIZING to RUNNING. > I think it could be problematic, because the set of tasks is dynamic. >From my understanding, the task manager could be able to know that all tasks >has changed to RUNNING status. > I think task in RUNNING state not mean we can clean up the cache file, > because changelog download and applying to delegated backend is in RUNNING > state. Applying the changelogs happens during changelog restore phase, which is in the task INITIALIZATING state (not the RUNNING state), that's why I think we can safely discard downloaded files as the phase has completed. [~Feifan Wang] do you think you can provide a design doc to include all discussions here? > Reduce multiple reads to the same Changelog file in the same taskmanager > during restore > --------------------------------------------------------------------------------------- > > Key: FLINK-27155 > URL: https://issues.apache.org/jira/browse/FLINK-27155 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends > Reporter: Feifan Wang > Assignee: Feifan Wang > Priority: Major > Fix For: 1.16.0 > > > h3. Background > In the current implementation, State changes of different operators in the > same taskmanager may be written to the same changelog file, which effectively > reduces the number of files and requests to DFS. > But on the other hand, the current implementation also reads the same > changelog file multiple times on recovery. More specifically, the number of > times the same changelog file is accessed is related to the number of > ChangeSets contained in it. And since each read needs to skip the preceding > bytes, this network traffic is also wasted. > The result is a lot of unnecessary request to DFS when there are multiple > slots and keyed state in the same taskmanager. > h3. Proposal > We can reduce multiple reads to the same changelog file in the same > taskmanager during restore. > One possible approach is to read the changelog file all at once and cache it > in memory or local file for a period of time when reading the changelog file. > I think this could be a subtask of [v2 FLIP-158: Generalized incremental > checkpoints|https://issues.apache.org/jira/browse/FLINK-25842] . > Hi [~ym] , [~roman] how do you think about ? -- This message was sent by Atlassian Jira (v8.20.7#820007)