[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

Hai Lin (Jira) Wed, 20 May 2020 11:18:10 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112508#comment-17112508
 ]


Hai Lin commented on KAFKA-4084:
--------------------------------

Thanks [~sql_consulting] for pointing me to this ticket. [~junrao] I'd like to 
hear a bit more about why KIP-491 is not under consideration, based on your 
comment above.

{quote}

I am not sure if KIP-491 is necessarily the best approach to address this 
particular issue (in general, one probably shouldn't have any broker overloaded 
at any time). However, if there are other convincing use cases, we could 
consider it. 

{quote}

I feel it's a very useful feature for many operational cases:

 

1. High replication rate when a broker boots up:

In my experience, uneven partition sizes are very common in production. With a 
throttle, some big partitions take much longer to become fully replicated. 
Sometimes we just want a fully replicated broker (say, in 10 minutes rather 
than hours). A long period with an under-replicated broker in the system adds 
operational complexity. For example, we need to make sure that no other broker 
goes offline during the replication process.

 

2. Other situations, such as an outlier broker:

This happens pretty often if the cluster is big. Most of the time it's not 
easy (or at least it's time-consuming) to replace a broker, even with EBS. We 
would like to disable a broker as leader without taking it offline, so that the 
on-call engineer has time to investigate the problem without terminating the 
broker right away. With KIP-491 we can add a lot of automation to the system 
that handles cases like a network partition for a single broker without 
actually replacing it.

 

3. Potential:

If we can manipulate the view of leaders in a cluster, we can do a bit more, 
such as introducing different leaders for producers and consumers (consumers 
can now consume from replicas, but I think there are still ways we can control 
it). Then we can add priority at the client level and isolate clients so that 
they talk to only some of the brokers. 

 

This is more about KIP-491; we can certainly move it back to the original 
ticket if we feel it needs more discussion.

 

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4084
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4084
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Tom Crayford
>            Priority: Major
>              Labels: reliability
>             Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.
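
The fix proposed in the description above (cap how many leader rebalances run 
at once, and wait for them to finish before starting more) can be sketched 
roughly as follows. This is a minimal illustration, not the controller's actual 
implementation: `batched`, `rebalance_gradually`, `max_in_flight`, and the 
`elect_preferred_leaders` callback are all hypothetical names, and the callback 
is assumed to block until its batch of elections has completed.

```python
from typing import Callable, Iterable, Iterator, List


def batched(partitions: Iterable[str], max_in_flight: int) -> Iterator[List[str]]:
    """Yield partitions in groups of at most max_in_flight."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    batch: List[str] = []
    for partition in partitions:
        batch.append(partition)
        if len(batch) == max_in_flight:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, group.
        yield batch


def rebalance_gradually(
    imbalanced: Iterable[str],
    max_in_flight: int,
    elect_preferred_leaders: Callable[[List[str]], None],
) -> int:
    """Move leaders one batch at a time and return the number of batches.

    elect_preferred_leaders is a hypothetical callback that is assumed to
    block until every election in the batch has finished, so at most
    max_in_flight leader changes are ever in progress.
    """
    batches = 0
    for batch in batched(imbalanced, max_in_flight):
        elect_preferred_leaders(batch)
        batches += 1
    return batches
```

With five imbalanced partitions and `max_in_flight=2`, the callback would be 
invoked three times (two full batches and one batch of a single partition), so 
only a bounded number of replica fetchers restart at any moment.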



