[https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103028#comment-17103028]

GEORGE LI commented on KAFKA-4084:
----------------------------------

[~blodsbror]

Hmm... that's weird.

{{auto.leader.rebalance.enable}} seems to be working as intended. We need to
make sure the controller has it set correctly.

I wonder what the running config is. Could you run this:

{code}
~/confluent-kafka-go$ git remote -v
origin  https://github.com/confluentinc/confluent-kafka-go.git (fetch)
origin  https://github.com/confluentinc/confluent-kafka-go.git (push)

~/confluent-kafka-go$ go run examples/admin_describe_config/admin_describe_config.go <broker_host>:9092 broker <broker_id> | grep auto
    auto.leader.rebalance.enable = false    STATIC_BROKER_CONFIG Read-only:true Sensitive:false
       auto.create.topics.enable = true     STATIC_BROKER_CONFIG Read-only:true Sensitive:false
{code}

The {{auto.leader.rebalance.enable = false}} line above is the actual running
config. Some configs can be changed dynamically while the process is running,
so I just want to make sure. Please do this for all brokers.
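Checking every broker by eye is tedious, so here is a small Go sketch that scans the text printed by the example program (one run per broker) for the rebalance setting. It assumes each output line has the shape {{name = value SOURCE Read-only:... Sensitive:...}}, as in the sample output above; the helper names {{rebalanceSetting}} and {{allDisabled}} are hypothetical and are not part of confluent-kafka-go.

```go
package main

import (
	"bufio"
	"sort"
	"strings"
)

// rebalanceSetting scans the output of the admin_describe_config example for
// one broker and returns the value and source of auto.leader.rebalance.enable.
// The "name = value SOURCE ..." layout is an assumption based on the sample
// output; ok is false if the setting is not found.
func rebalanceSetting(output string) (value, source string, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		name, rest, found := strings.Cut(sc.Text(), " = ")
		if !found || strings.TrimSpace(name) != "auto.leader.rebalance.enable" {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) < 2 {
			continue
		}
		return fields[0], fields[1], true
	}
	return "", "", false
}

// allDisabled reports whether every broker's output shows the setting as
// "false", and returns the sorted ids of brokers where it is missing or not
// "false". Map keys are broker ids; values are that broker's output.
func allDisabled(outputs map[int]string) (bool, []int) {
	var offenders []int
	for id, out := range outputs {
		v, _, ok := rebalanceSetting(out)
		if !ok || v != "false" {
			offenders = append(offenders, id)
		}
	}
	sort.Ints(offenders)
	return len(offenders) == 0, offenders
}
```

Printing the source alongside the value also shows whether a broker picked the setting up from its static config or from a dynamic override.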

Another cause might be cluster management software (like Cruise Control) that
is running preferred leader election (PLE) periodically. That would make the
current leader the first replica whenever the first replica is in the ISR.
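The PLE criterion from the previous paragraph can be written down as a Go sketch: leadership moves only when the first (preferred) replica is not already the leader and is in the ISR. The {{Partition}} type and its fields are a hypothetical, minimal model for illustration, not a Kafka or Cruise Control API.

```go
package main

// Partition is a hypothetical, minimal model of one partition's state.
type Partition struct {
	Topic    string
	ID       int
	Replicas []int // assigned replicas; Replicas[0] is the preferred replica
	Leader   int
	ISR      []int
}

// contains reports whether x is in xs.
func contains(xs []int, x int) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// pleTarget returns the broker a preferred leader election would move
// leadership to, or false if it would change nothing: the leader is already
// the preferred replica, or the preferred replica is not in the ISR.
func pleTarget(p Partition) (int, bool) {
	if len(p.Replicas) == 0 {
		return 0, false
	}
	preferred := p.Replicas[0]
	if p.Leader == preferred || !contains(p.ISR, preferred) {
		return 0, false
	}
	return preferred, true
}

// imbalanced lists the partitions a PLE would touch. Electing all of them in
// one pass is what the issue description says stalls replication on large
// clusters.
func imbalanced(ps []Partition) []Partition {
	var out []Partition
	for _, p := range ps {
		if _, ok := pleTarget(p); ok {
			out = append(out, p)
		}
	}
	return out
}
```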



> automated leader rebalance causes replication downtime for clusters with too many partitions
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4084
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4084
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Tom Crayford
>            Priority: Major
>              Labels: reliability
>             Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> {{UnderReplicatedPartitions}} to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high
> partition counts it causes the severe issues described above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)