[ https://issues.apache.org/jira/browse/FLINK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526281#comment-17526281 ]
Feifan Wang edited comment on FLINK-27155 at 4/22/22 9:08 AM:
--------------------------------------------------------------

Thanks for your reply [~yunta]. I think we can serialize all accesses to the same file in a separate ticket once this ticket is resolved.

Also, I don't understand what the following sentence means. As far as I know, downloading and applying changelog files both happen in the RUNNING state, so at what point do you mean we could discard the local cache file?
{quote}For the time to cleanup, once all tasks starts to be RUNNING on the taskmanager, I think we could safely discard them then.
{quote}

> Reduce multiple reads to the same Changelog file in the same taskmanager during restore
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-27155
>                 URL: https://issues.apache.org/jira/browse/FLINK-27155
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / State Backends
>            Reporter: Feifan Wang
>            Assignee: Feifan Wang
>            Priority: Major
>             Fix For: 1.16.0
>
>
> h3. Background
> In the current implementation, state changes of different operators in the same taskmanager may be written to the same changelog file, which effectively reduces the number of files and requests to DFS.
> On the other hand, the current implementation also reads the same changelog file multiple times on recovery.
> More specifically, the number of times the same changelog file is accessed is related to the number of ChangeSets contained in it. And since each read needs to skip the preceding bytes, the network traffic spent on those bytes is also wasted.
> The result is a lot of unnecessary requests to DFS when there are multiple slots and keyed state in the same taskmanager.
> h3. Proposal
> We can reduce multiple reads to the same changelog file in the same taskmanager during restore.
> One possible approach is to read the changelog file all at once and cache it in memory or in a local file for a period of time.
> I think this could be a subtask of [v2 FLIP-158: Generalized incremental checkpoints|https://issues.apache.org/jira/browse/FLINK-25842].
> Hi [~ym], [~roman], what do you think?

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
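
To make the proposal above concrete, here is a minimal sketch of the "read once, cache in a local file" idea. It is not Flink code: `ChangelogFileCache`, `RemoteOpener`, `openAt` and `downloadCount` are hypothetical names invented for illustration, and the sketch assumes the remote changelog file can be opened as a plain `InputStream`. Each remote file is copied to a local temp file the first time it is requested, and every later read at a different offset seeks in the local copy instead of skipping bytes over the network.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-taskmanager cache of changelog files; not part of Flink's API. */
class ChangelogFileCache implements AutoCloseable {

    /** Hypothetical stand-in for opening a changelog file on DFS. */
    interface RemoteOpener {
        InputStream open(String remotePath) throws IOException;
    }

    private final RemoteOpener opener;
    private final ConcurrentHashMap<String, Path> localCopies = new ConcurrentHashMap<>();
    private final AtomicInteger downloads = new AtomicInteger();

    ChangelogFileCache(RemoteOpener opener) {
        this.opener = opener;
    }

    /** Returns a stream over the local copy, positioned at the given offset. */
    InputStream openAt(String remotePath, long offset) throws IOException {
        Path local;
        try {
            // computeIfAbsent applies the download at most once per remote path,
            // even when several tasks ask for the same file concurrently.
            local = localCopies.computeIfAbsent(remotePath, this::download);
        } catch (UncheckedIOException e) {
            throw e.getCause();
        }
        FileChannel channel = FileChannel.open(local, StandardOpenOption.READ);
        channel.position(offset); // local seek, no bytes skipped over the network
        return Channels.newInputStream(channel); // closing the stream closes the channel
    }

    /** Copies the whole remote file to a local temp file in a single pass. */
    private Path download(String remotePath) {
        try (InputStream in = opener.open(remotePath)) {
            Path tmp = Files.createTempFile("changelog-", ".cache");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            downloads.incrementAndGet();
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of remote reads performed so far (for illustration and testing). */
    int downloadCount() {
        return downloads.get();
    }

    /** Discards all local copies; when to call this is the open question in this thread. */
    @Override
    public void close() throws IOException {
        for (Path p : localCopies.values()) {
            Files.deleteIfExists(p);
        }
        localCopies.clear();
    }
}
```

With this sketch, two calls such as `cache.openAt(path, 3)` and `cache.openAt(path, 7)` trigger only one remote read. Two caveats: running a long download inside `computeIfAbsent` can block other updates to the map, so a real implementation would more likely store a future per file, and the sketch deliberately leaves the cleanup time to the caller of `close()`.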