Till Rohrmann created FLINK-24038:
-------------------------------------

             Summary: DispatcherResourceManagerComponent fails to deregister application if no leading ResourceManager
                 Key: FLINK-24038
                 URL: https://issues.apache.org/jira/browse/FLINK-24038
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.14.0
            Reporter: Till Rohrmann
             Fix For: 1.14.0

With FLINK-21667 we introduced a change that can cause the {{DispatcherResourceManagerComponent}} to fail when trying to stop the application. The problem is that the {{DispatcherResourceManagerComponent}} needs a leading {{ResourceManager}} to successfully execute the stop/deregister application call. If it does not have one, the call fails fatally. With multiple standby JobManager processes, the leading {{ResourceManager}} may be running in a different process.

I see two possible solutions:

1. Run the leader election process for the whole JobManager process
2. Move the registration/deregistration of the application out of the {{ResourceManager}} so that it can be executed without a leader
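
To make the failure mode and the second option concrete, here is a minimal sketch in plain Java. It is not Flink code: {{DeregistrationSketch}}, {{ApplicationRegistry}}, {{ResourceManagerStub}}, {{stopViaResourceManager}} and {{stopViaRegistry}} are hypothetical names invented for illustration, and the stub only models "a ResourceManager that refuses the call unless it is the leader". It does not say how the issue was or should be fixed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class DeregistrationSketch {

    /** Hypothetical hook into the cluster framework; not a Flink interface. */
    public interface ApplicationRegistry {
        void deregister(String finalStatus);
    }

    /** Hypothetical ResourceManager stand-in that only serves the call while it is the leader. */
    public static final class ResourceManagerStub {
        private final boolean leader;
        private final ApplicationRegistry registry;

        public ResourceManagerStub(boolean leader, ApplicationRegistry registry) {
            this.leader = leader;
            this.registry = registry;
        }

        public CompletableFuture<Void> deregisterApplication(String finalStatus) {
            CompletableFuture<Void> result = new CompletableFuture<>();
            if (leader) {
                registry.deregister(finalStatus);
                result.complete(null);
            } else {
                // Models the reported problem: no leader in this process, so the call fails.
                result.completeExceptionally(
                        new IllegalStateException("no leading ResourceManager in this process"));
            }
            return result;
        }
    }

    /** Leader-bound shutdown: fails whenever the local ResourceManager is not the leader. */
    public static CompletableFuture<Void> stopViaResourceManager(
            ResourceManagerStub resourceManager, String finalStatus) {
        return resourceManager.deregisterApplication(finalStatus);
    }

    /** Option 2 (assumed shape): call the registry directly, so no leader is required. */
    public static CompletableFuture<Void> stopViaRegistry(
            ApplicationRegistry registry, String finalStatus) {
        registry.deregister(finalStatus);
        return CompletableFuture.completedFuture(null);
    }

    public static void main(String[] args) {
        List<String> deregistered = new ArrayList<>();
        ApplicationRegistry registry = deregistered::add;
        ResourceManagerStub standby = new ResourceManagerStub(false, registry);

        System.out.println("via ResourceManager failed: "
                + stopViaResourceManager(standby, "SUCCEEDED").isCompletedExceptionally());
        System.out.println("via registry failed: "
                + stopViaRegistry(registry, "SUCCEEDED").isCompletedExceptionally());
        System.out.println("deregistered: " + deregistered);
    }
}
```

The sketch ignores a question a real change would have to answer: if deregistration no longer depends on leadership, something else must ensure that only the process that is actually shutting down the application performs it.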