[ 
https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111858#comment-17111858
 ] 

GEORGE LI commented on KAFKA-8638:
----------------------------------

[~hai_lin]

Some of the recent activity on KIP-491 is in KAFKA-4084, where I posted a 
patch for version 2.4 (and 1.1) with an installation guide. 



> Preferred Leader Blacklist (deprioritized list)
> -----------------------------------------------
>
>                 Key: KAFKA-8638
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8638
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config, controller, core
>    Affects Versions: 1.1.1, 2.3.0, 2.2.1
>            Reporter: GEORGE LI
>            Assignee: GEORGE LI
>            Priority: Major
>
> Currently, Kafka preferred leader election walks the topic/partition 
> replica assignment in priority order and picks the first broker_id that is 
> in ISR. The preferred leader is the broker id in the first position of the 
> replica assignment. There are use cases where, even though the first broker 
> in the replica assignment is in ISR, it needs to be moved to the end of the 
> ordering (lowest priority) when deciding leadership during preferred leader 
> election.
> Let’s use topic/partition replica assignment (1,2,3) as an example. 1 is the 
> preferred leader. When preferred leader election is run, it picks 1 as the 
> leader if 1 is online and in ISR; if not, it picks 2; if 2 is not in ISR 
> either, it picks 3. With the blacklist, 1 would be tried last even though it 
> is in ISR. Below is a list of use cases:
>  * If broker_id 1 is a failed host that was swapped and brought up with only 
> the last segments or the latest offset, without historical data (there is a 
> separate effort on this), it is better for it not to serve leadership until 
> it has caught up.
>  * In a cross-data-center cluster, the AWS instances have less computing 
> power than the on-prem bare-metal machines. We could put the AWS broker_ids 
> in the Preferred Leader Blacklist so that on-prem brokers are elected 
> leaders, without changing the replica assignment ordering.
>  * If broker_id 1 keeps losing leadership after some time ("flapping"), we 
> would want to exclude 1 from being a leader unless all other brokers of this 
> topic/partition are offline. The “flapping” effect was seen in the past when 
> 2 or more brokers were bad: as they lost leadership constantly and quickly, 
> the partitions whose replica sets include them saw leadership constantly 
> changing. The ultimate solution is to swap these bad hosts. But for quick 
> mitigation, we can also put the bad hosts in the Preferred Leader Blacklist 
> to give them the lowest priority for being elected leader.
>  * If the controller is busy serving an extra load of metadata requests and 
> other tasks, we would like to move the controller's leaderships to other 
> brokers to lower its CPU load. Currently, bouncing a broker to make it lose 
> leadership does not work for the controller, because after the bounce the 
> controller role fails over to another broker.
>  * Avoid bouncing a broker in order to make it lose its leadership: it would 
> be good to have a way to specify which broker should be excluded from 
> serving traffic/leadership (without changing the replica assignment ordering 
> by reassignments, even though that's quick), and then run preferred leader 
> election. Bouncing a broker causes temporary URP, and sometimes other 
> issues. Also, bouncing a broker (e.g. broker_id 1) makes it temporarily lose 
> all its leadership, but if another broker (e.g. broker_id 2) fails or gets 
> bounced, some of its leaderships will likely fail over to broker_id 1 for 
> partitions with 3 replicas. If broker_id 1 is in the blacklist, then in such 
> a scenario, even with broker_id 2 offline, the 3rd broker can take 
> leadership.
> The current workaround for the above is to change the topic/partition's 
> replica assignment to move broker_id 1 from the first position to the last 
> position and run preferred leader election, e.g. (1, 2, 3) => (2, 3, 1). 
> This changes the replica assignment, and we need to keep track of the 
> original one and restore it if things change (e.g. the controller fails over 
> to another broker, or the swapped empty broker has caught up). That’s a 
> rather tedious task.
> KIP is located at 
> [KIP-491|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982]
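
For illustration, the election order described in the quoted issue can be sketched as below. This is a minimal sketch only, not the KIP-491 patch or Kafka controller code: the function name pick_preferred_leader and the parameters replicas, isr and deprioritized are hypothetical, and it assumes a deprioritized broker is tried last rather than excluded outright.

```python
from typing import Iterable, Optional, Sequence, Set


def pick_preferred_leader(
    replicas: Sequence[int],
    isr: Set[int],
    deprioritized: Iterable[int] = (),
) -> Optional[int]:
    """Return the first in-sync replica, trying deprioritized brokers last.

    Hypothetical helper for illustration; not part of Kafka.
    """
    low = set(deprioritized)
    # Stable partition: assignment order is preserved within each group,
    # so the stored replica assignment itself is never modified.
    ordered = [b for b in replicas if b not in low]
    ordered += [b for b in replicas if b in low]
    for broker in ordered:
        if broker in isr:
            return broker
    return None  # no in-sync replica is available


# Replica assignment (1, 2, 3) from the issue description.
print(pick_preferred_leader([1, 2, 3], {1, 2, 3}))       # 1
print(pick_preferred_leader([1, 2, 3], {1, 2, 3}, [1]))  # 2
print(pick_preferred_leader([1, 2, 3], {1, 3}, [1]))     # 3
print(pick_preferred_leader([1, 2, 3], {1}, [1]))        # 1
```

With broker 1 deprioritized, the effective order matches the manual workaround (1, 2, 3) => (2, 3, 1), but the original assignment does not have to be tracked and restored afterwards.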



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
