[ https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030379#comment-17030379 ]

Evan Williams commented on KAFKA-4084:
--------------------------------------

Thanks a lot, guys. [~sql_consulting] I have actually engineered an automatic 
EBS re-attach solution in AWS that attaches the EBS volume to the replacement 
instance after the original instance is terminated, and this does indeed solve 
the issue. However, we have decided to move over to the i3 instance type 
(local SSD), where data is lost on termination/stop, because you get much more 
disk performance for the price.
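
In case it is useful to anyone else, the core of that solution is roughly the 
following (the volume tag, broker id, device name and instance id variable are 
all placeholders; the real thing runs from the replacement instance's startup 
hook):

{code}
# Find the detached data volume for this broker id (tag name is a placeholder)
VOL_ID=$(aws ec2 describe-volumes \
  --filters Name=tag:kafka-broker-id,Values=1001 Name=status,Values=available \
  --query 'Volumes[0].VolumeId' --output text)

# Attach it to the replacement instance and wait until it is in use
aws ec2 attach-volume --volume-id "$VOL_ID" --instance-id "$INSTANCE_ID" --device /dev/xvdf
aws ec2 wait volume-in-use --volume-ids "$VOL_ID"
{code}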

[~junrao] 
 # Will that exact command set a throttle on all topics in one shot, or must 
you script something up, providing the topic name as well? I'm assuming 
something needs to be scripted (via cron) to apply that config on a regular 
basis so that new topics are picked up as well? (See the sketch after this 
list for what I have in mind.)
 # Just to clarify: if leader.replication.throttled.rate and 
follower.replication.throttled.rate are set *only* on the new broker, will its 
*total* incoming and outgoing replication bandwidth be throttled to those 
exact limits, even as it gradually rejoins the ISR, and regardless of whether 
the other brokers have that config or not? Can those configs be set before the 
service on the new broker has started? (Also covered in the sketch below.)
 # I totally agree that a clean, dynamic, cluster-wide way to enforce that a 
broker does not become a leader *before* it comes online would be very handy 
in this situation (if it can only be done after the service starts, that 
window may be enough for clients to run into issues). Depending on 
producer/consumer config, they may still time out, even with throttling set on 
the broker. More importantly, depending on the incoming messages per second, 
the new broker may struggle to get back into the ISR, or at least take a long 
time to do so. Removing all leadership from it would help dramatically, I 
suspect.
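
To make questions 1 and 2 concrete, this is roughly what I am picturing (the 
ZooKeeper address, broker id 1001 and the 10 MB/s rate are placeholders, and I 
may well be misreading how these configs are meant to be applied):

{code}
# Broker-level throttle rates, set only on the new broker.
# Going via ZooKeeper, this could presumably be written before the broker
# process is started, since dynamic broker configs are stored in ZK and
# picked up at startup.
bin/kafka-configs.sh --zookeeper zk1:2181 \
  --entity-type brokers --entity-name 1001 --alter \
  --add-config leader.replication.throttled.rate=10485760,follower.replication.throttled.rate=10485760

# Per-topic throttled-replica lists; the wildcard '*' covers every replica of
# a topic, but as far as I can tell it still has to be applied topic by topic
# (hence the cron question for newly created topics).
for t in $(bin/kafka-topics.sh --zookeeper zk1:2181 --list); do
  bin/kafka-configs.sh --zookeeper zk1:2181 \
    --entity-type topics --entity-name "$t" --alter \
    --add-config 'leader.replication.throttled.replicas=*,follower.replication.throttled.replicas=*'
done
{code}

If there is a single command that covers all topics (including future ones) in 
one shot, that would obviously be much nicer than the loop.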

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4084
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4084
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Tom Crayford
>            Priority: Major
>              Labels: reliability
>             Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> {{UnderReplicatedPartitions}} to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
