[ https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076167#comment-17076167 ]
Evan Williams commented on KAFKA-4084:
--------------------------------------
[~sql_consulting]
Many thanks for the detailed info. It sounds like some great work, and some very
handy features.
On another, somewhat related note (and I can open a bug ticket for this if need
be): I've noticed that on a cluster (5.4) with 6000 topics, manual leader
election times out. Is there a way to increase the timeout? If not, then our
only option is auto.leader.rebalance.enable=true. I guess it's important that
preferred leader election (PLE) works reliably for all of this functionality to
behave properly.
*kafka-leader-election --bootstrap-server $(grep advertised.listeners= /etc/kafka/server.properties |cut -d: -f4 |cut -d/ -f3):9092 --all-topic-partitions --election-type preferred*
Timeout waiting for election results
Exception in thread "main" kafka.common.AdminCommandFailedException: Timeout waiting for election results
	at kafka.admin.LeaderElectionCommand$.electLeaders(LeaderElectionCommand.scala:133)
	at kafka.admin.LeaderElectionCommand$.run(LeaderElectionCommand.scala:88)
	at kafka.admin.LeaderElectionCommand$.main(LeaderElectionCommand.scala:41)
	at kafka.admin.LeaderElectionCommand.main(LeaderElectionCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout.
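One possible workaround for the timeout, sketched under assumptions rather than offered as a tested fix: instead of a single {{--all-topic-partitions}} request covering every partition, split the partitions into smaller batches and run one {{--path-to-json-file}} election per batch, so that each admin request has less work to finish and may be less likely to hit the timeout. The helper names ({{make_election_batches}}, {{run_election_batches}}) and the input file of {{topic partition}} pairs are hypothetical; the JSON written for each batch follows the {{partitions}} list format that {{--path-to-json-file}} expects.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: run a preferred leader election in smaller batches
# instead of one --all-topic-partitions request.

# make_election_batches BATCH_SIZE OUT_DIR < pairs.txt
# Reads "topic partition" pairs (one per line) from stdin and writes
# OUT_DIR/batch-0001.json, OUT_DIR/batch-0002.json, ... each holding at most
# BATCH_SIZE partitions in the JSON layout used by --path-to-json-file.
make_election_batches() {
  local size="$1" out="$2"
  mkdir -p "$out"
  awk -v size="$size" -v out="$out" '
    # Write the current batch to its own file and reset the buffer.
    function flush() {
      if (n == 0) return
      file = sprintf("%s/batch-%04d.json", out, ++b)
      printf "{\"partitions\":[%s]}\n", body > file
      close(file)
      n = 0
      body = ""
    }
    NF >= 2 {
      entry = sprintf("{\"topic\":\"%s\",\"partition\":%d}", $1, $2)
      body = (n == 0) ? entry : body "," entry
      if (++n == size) flush()
    }
    END { flush() }
  '
}

# run_election_batches BOOTSTRAP OUT_DIR
# Runs one preferred election per batch file, sequentially, so only one
# batch is in flight at a time; stops at the first batch the tool fails on.
run_election_batches() {
  local bootstrap="$1" out="$2" f
  for f in "$out"/batch-*.json; do
    echo "Electing preferred leaders for $f"
    kafka-leader-election --bootstrap-server "$bootstrap" \
      --election-type preferred --path-to-json-file "$f" || return 1
  done
}
```

For example, with placeholder values: {{make_election_batches 200 /tmp/ple < pairs.txt}} followed by {{run_election_batches broker1:9092 /tmp/ple}}. Running the batches one after another is deliberate: it keeps the number of leader changes in flight small, which is the same idea the issue proposes for the automatic rebalance.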
> automated leader rebalance causes replication downtime for clusters with too many partitions
> --------------------------------------------------------------------------------------------
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
> Reporter: Tom Crayford
> Priority: Major
> Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default) and
> you have a cluster with many partitions, there is severe replication
> downtime following a restart. This causes
> {{UnderReplicatedPartitions}} to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes
> leaders for *all* imbalanced partitions at once, instead of doing it
> gradually. This effectively stops all replica fetchers in the cluster
> (assuming there are enough imbalanced partitions), and restarts them. This
> can take minutes on busy clusters, during which no replication is happening
> and user data is at risk. Clients with {{acks=-1}} also see issues at this
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election
> manually. There is also a broker configuration “auto.leader.rebalance.enable”
> which you can set to have the broker automatically perform the PLE when
> needed. DO NOT USE THIS OPTION. There are serious performance issues when
> doing so, especially on larger clusters. It needs some development work that
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high
> partition counts it causes the severe issues described above.
> One potential fix would be a new configuration for the number of partitions
> that automated leader rebalancing handles at once: the controller would *stop*
> once that many leader rebalances are in flight and wait until they're done.
> There may be better mechanisms, and I'd love to hear if anybody has any ideas.
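Until something like that exists in the controller, the same idea can be approximated by hand: find only the partitions whose current leader is not the preferred (first-listed) replica, and elect them a limited number at a time. This is a minimal sketch, assuming the tab-separated {{Topic:}} / {{Partition:}} / {{Leader:}} / {{Replicas:}} fields that {{kafka-topics --describe}} prints per partition (the exact layout varies between versions, so check it against your own output); the function name {{list_imbalanced_partitions}} and its optional cap argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list partitions whose leader is not the preferred
# (first-listed) replica, optionally capped at MAX entries so they can be
# elected a slice at a time instead of all at once.
#
# list_imbalanced_partitions [MAX] < describe-output.txt
# Input: output of `kafka-topics --describe`, assumed to use tab-separated
# "Key: value" fields (this layout varies across versions).
# Output: one "topic partition" pair per line.
list_imbalanced_partitions() {
  local max="${1:-0}"   # 0 means no cap
  awk -F'\t' -v max="$max" '
    # Per-partition lines carry both "Partition: " and "Leader: " fields;
    # per-topic header lines ("PartitionCount: ...") do not match.
    /Partition: / && /Leader: / {
      topic = part = leader = replicas = ""
      for (i = 1; i <= NF; i++) {
        split($i, kv, ": ")
        if (kv[1] == "Topic")     topic = kv[2]
        if (kv[1] == "Partition") part = kv[2]
        if (kv[1] == "Leader")    leader = kv[2]
        if (kv[1] == "Replicas")  replicas = kv[2]
      }
      split(replicas, r, ",")
      # Skip unparsable lines, leaderless partitions (reported as "none"),
      # and partitions already led by their preferred replica.
      if (topic == "" || leader == "none" || leader == r[1]) next
      print topic, part
      if (max > 0 && ++count >= max) exit
    }
  '
}
```

For example, {{kafka-topics --bootstrap-server broker1:9092 --describe | list_imbalanced_partitions 500}} (placeholder broker and cap) would print at most 500 imbalanced partitions; electing those and then re-running gives a gradual rebalance in the spirit of the proposal above, rather than changing every imbalanced leader at once.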
--
This message was sent by Atlassian Jira
(v8.3.4#803005)