Niklas Semmler created FLINK-26522:
--------------------------------------
Summary: Refactoring code for multiple component leader election
Key: FLINK-26522
URL: https://issues.apache.org/jira/browse/FLINK-26522
Project: Flink
Issue Type: Improvement
Reporter: Niklas Semmler
The current implementation of the multiple component leader election has a
number of issues. These mostly stem from an attempt to make the multiple
component leader election work the same way as the single component leader
election.
The issues identified so far are:
* *Naming* MultipleComponentLeaderElectionService sounds similar to the
LeaderElectionService, but is in fact closer to the LeaderElectionDriver.
* *Similarity* The interfaces LeaderElectionService, LeaderElectionDriver and
MultipleComponentLeaderElectionDriver are very similar to each other.
* *Cyclic dependency* DefaultMultipleComponentLeaderElectionService holds a
reference to the ZooKeeperMultipleComponentLeaderElectionDriver
(MultipleComponentLeaderElectionDriver), which in turn holds a reference back
to the DefaultMultipleComponentLeaderElectionService (LeaderLatchListener).
* *Unclear contract* With single component leader election drivers such as
ZooKeeperLeaderElectionDriver, a call to LeaderElectionService#stop from
JobMasterServiceLeadershipRunner#closeAsync implies giving up the leadership of
the JobMaster. With the multiple component leader election this is no longer
the case: the leadership is held until the HighAvailabilityServices are shut
down. This logic may be difficult to understand from the perspective of a
single component (e.g., the Dispatcher).
* *Long call hierarchy*
DefaultLeaderElectionService -> MultipleComponentLeaderElectionDriverAdapter -> MultipleComponentLeaderElectionService -> ZooKeeperMultipleComponentLeaderElectionDriver
* *Adapter as primary implementation* All non-testing non-multiple-component
leadership drivers are deprecated. The primary implementation of
LeaderElectionDriver is the adapter
MultipleComponentLeaderElectionDriverAdapter.
* *Possible redundancy* We currently have similar methods (e.g., for granting
leadership) for the Dispatcher, ResourceManager, JobMaster and
WebMonitorEndpoint. As these methods are called at the same time under the
multiple component leader election, it may make sense to combine this logic
into a single object.
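To make the *Cyclic dependency* item more concrete, here is a minimal sketch of the shape of the problem. It is not the Flink code: SketchService, SketchDriver and SketchListener are hypothetical stand-ins for the service, the driver and the listener role named above, and the simulateGrant/simulateRevoke methods only imitate a leader latch callback.

```java
public class CyclicDependencySketch {

    /** Hypothetical stand-in for the listener role (LeaderLatchListener above). */
    interface SketchListener {
        void isLeader();

        void notLeader();
    }

    /** Hypothetical stand-in for the driver; not the real ZooKeeper-backed class. */
    static final class SketchDriver {
        // Back-reference: the driver notifies the service through this field.
        private final SketchListener listener;

        SketchDriver(SketchListener listener) {
            this.listener = listener;
        }

        // Imitate the underlying latch granting or revoking leadership.
        void simulateGrant() {
            listener.isLeader();
        }

        void simulateRevoke() {
            listener.notLeader();
        }
    }

    /** Hypothetical stand-in for the service; it is also the driver's listener. */
    static final class SketchService implements SketchListener {
        // Forward reference: the service owns the driver.
        private final SketchDriver driver;
        private boolean leader;

        SketchService() {
            // 'this' is handed out before construction finishes, closing the cycle.
            this.driver = new SketchDriver(this);
        }

        SketchDriver driver() {
            return driver;
        }

        boolean hasLeadership() {
            return leader;
        }

        @Override
        public void isLeader() {
            leader = true;
        }

        @Override
        public void notLeader() {
            leader = false;
        }
    }

    public static void main(String[] args) {
        SketchService service = new SketchService();
        service.driver().simulateGrant();
        System.out.println("leader after grant: " + service.hasLeadership());
        service.driver().simulateRevoke();
        System.out.println("leader after revoke: " + service.hasLeadership());
    }
}
```

In this sketch neither object can be constructed or tested without the other, and the service hands out a reference to itself from its own constructor. That is the kind of coupling the refactoring would aim to remove.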
--
This message was sent by Atlassian Jira
(v8.20.1#820001)