[ https://issues.apache.org/jira/browse/FLINK-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438139#comment-17438139 ]
Aitozi commented on FLINK-24713:
--------------------------------

Thanks for your reply [~wangyang0918] [~trohrmann]. I will post a solution as you suggested.

> Postpone resourceManager serving after the recovery phase has finished
> ----------------------------------------------------------------------
>
>                 Key: FLINK-24713
>                 URL: https://issues.apache.org/jira/browse/FLINK-24713
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Aitozi
>            Priority: Major
>
> When the ResourceManager starts, the JobManager connects to it, which means the ResourceManager begins serving resource requests from the SlotManager.
> If the ResourceManager fails over, it tries to recover the pods / containers from the previous attempt, but new resource requirements may arrive before the old TaskManagers have registered with the SlotManager.
> In this case, the number of requested TaskManagers may double when the JobManager fails over. We may need a mechanism to postpone ResourceManager serving until the recovery phase has finished.
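A minimal sketch of the proposed mechanism, assuming a simplified model rather than Flink's actual classes: `RecoveringResourceManager`, `declare_required`, `on_worker_registered`, `on_recovery_timeout` and `defer_during_recovery` are all hypothetical names. While workers recovered from the previous attempt are still expected, resource requirements are recorded but no new workers are requested; once every recovered worker has registered (or an assumed recovery timeout fires), only the shortfall is requested.

```python
class RecoveringResourceManager:
    """Hypothetical model: defer new worker requests until recovery ends."""

    def __init__(self, recovered_worker_ids, defer_during_recovery=True):
        # Pods / containers found from the previous attempt (assumed known at startup).
        self._awaiting_recovered = set(recovered_worker_ids)
        self._registered = set()
        self._required = 0           # task managers the jobs currently need
        self._pending_new = 0        # new workers requested but not yet registered
        self.total_new_requested = 0 # counter kept only for illustration
        # Without deferral, requirements are served immediately (the behaviour the issue describes).
        self._recovering = defer_during_recovery and bool(self._awaiting_recovered)

    def declare_required(self, count):
        # SlotManager-side requirement; only acted upon outside the recovery phase.
        self._required = count
        if not self._recovering:
            self._request_missing()

    def on_worker_registered(self, worker_id):
        if worker_id in self._awaiting_recovered:
            self._awaiting_recovered.discard(worker_id)
        elif self._pending_new > 0:
            self._pending_new -= 1
        self._registered.add(worker_id)
        if self._recovering and not self._awaiting_recovered:
            self._finish_recovery()

    def on_recovery_timeout(self):
        # Assumed timeout: stop waiting for recovered workers that never came back.
        if self._recovering:
            self._awaiting_recovered.clear()
            self._finish_recovery()

    def _finish_recovery(self):
        self._recovering = False
        self._request_missing()

    def _request_missing(self):
        # Request only what registered and in-flight workers do not already cover.
        missing = self._required - (len(self._registered) + self._pending_new)
        if missing > 0:
            self._pending_new += missing
            self.total_new_requested += missing


recovered = {"tm-1", "tm-2", "tm-3"}

naive = RecoveringResourceManager(recovered, defer_during_recovery=False)
naive.declare_required(3)          # arrives before the old TaskManagers register
print(naive.total_new_requested)   # 3 new workers on top of the 3 recovered ones

deferred = RecoveringResourceManager(recovered)
deferred.declare_required(3)
for worker in sorted(recovered):
    deferred.on_worker_registered(worker)
print(deferred.total_new_requested)  # 0: the recovered workers cover the requirement
```

The timeout path matters for the design: if a recovered pod / container never re-registers, waiting indefinitely would stall the job, so the sketch falls back to requesting only the remaining shortfall once recovery is declared over.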