[jira] [Comment Edited] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

GEORGE LI (Jira) Sun, 19 Apr 2020 00:29:21 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086772#comment-17086772
 ]


GEORGE LI edited comment on KAFKA-4084 at 4/19/20, 7:28 AM:
------------------------------------------------------------

[~blodsbror]

I am not very familiar with the 5.4 setup.

Do you have the error message from the crash in the log? Is it missing the
zkclient jar, like the one listed below?

{code}
$ ls -l zk*.jar
-rw-r--r-- 1 georgeli engineering 74589 Nov 18 18:21 zkclient-0.11.jar
$ jar tvf zkclient-0.11.jar 
     0 Mon Nov 18 18:11:58 UTC 2019 META-INF/
  1135 Mon Nov 18 18:11:58 UTC 2019 META-INF/MANIFEST.MF
     0 Mon Nov 18 18:11:58 UTC 2019 org/
     0 Mon Nov 18 18:11:58 UTC 2019 org/I0Itec/
     0 Mon Nov 18 18:11:58 UTC 2019 org/I0Itec/zkclient/
  3486 Mon Nov 18 18:11:58 UTC 2019 org/I0Itec/zkclient/ContentWatcher.class
   263 Mon Nov 18 18:11:58 UTC 2019 org/I0Itec/zkclient/DataUpdater.class
{code}
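
A quick way to run the same check against the broker's lib directory, as a minimal sketch. The function name `has_zkclient_jar` is made up for illustration, and the default directory `/usr/share/kafka/lib` is an assumption that may not match your install:

```shell
# Report whether a zkclient jar is present in a Kafka lib directory.
# $1: lib directory to inspect (assumed default: /usr/share/kafka/lib)
has_zkclient_jar() {
  zk_lib_dir="${1:-/usr/share/kafka/lib}"
  for zk_jar in "$zk_lib_dir"/zkclient-*.jar; do
    # If the glob matches nothing, the literal pattern is tested and -f fails.
    if [ -f "$zk_jar" ]; then
      echo "found: $zk_jar"
      return 0
    fi
  done
  echo "missing: no zkclient-*.jar in $zk_lib_dir"
  return 1
}
```

Call it as `has_zkclient_jar /path/to/kafka/lib`; a non-zero exit status means no zkclient jar was found there.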

If this jar file was there before, please copy it back. I need to find out
why it was missing after the build; it may be some dependency setup in Gradle. I
have also updated the [install doc
|https://docs.google.com/document/d/14vlPkbaog_5Xdd-HB4vMRaQQ7Fq4SlxddULsvc3PlbY/edit]
 to use `./gradlew clean build -x test`.

Also, make sure the startup script for Kafka is not hard-coding the 5.4 jars but
takes the jars from the lib classpath, e.g.:

{code}
/usr/lib/jvm/java-8-openjdk-amd64/bin/java 
-Dlog4j.configuration=file:/etc/kafka/log4j.xml -Xms22G -Xmx22G -XX:+UseG1GC 
-XX:MaxGCPauseMillis=20 -XX:NewSize=16G -XX:MaxNewSize=16G 
-XX:InitiatingHeapOccupancyPercent=3 -XX:G1MixedGCCountTarget=1 
-XX:G1HeapWastePercent=1 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+PrintGCDateStamps -verbose:gc -Xloggc:/var/log/kafka/gc-kafka.log -server 
-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.port=29010 
-Djava.rmi.server.hostname=kafka12345-dca4 -cp '.:/usr/share/kafka/lib/*' 
kafka.Kafka /etc/kafka/server.properties
{code}
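
To spot hard-coded jars in a startup command like the one above, something along these lines may help. It is only a sketch: the helper names `classpath_of` and `hardcoded_jars` are hypothetical, and it assumes the command line is available as a single string (for example copied from the startup script) with no spaces inside the classpath value:

```shell
# Print the value passed to -cp or -classpath in a java command line.
# $1: the full command line as one string
classpath_of() {
  printf '%s\n' "$1" | tr -d "'\"" | tr -s ' ' '\n' |
    awk 'prev == "-cp" || prev == "-classpath" { print; exit } { prev = $0 }'
}

# Print classpath entries that name a specific jar file instead of a
# wildcard directory such as /usr/share/kafka/lib/*.
# Exits non-zero when no such entry exists.
hardcoded_jars() {
  classpath_of "$1" | tr ':' '\n' | grep '\.jar$'
}
```

If `hardcoded_jars "$cmd"` prints nothing, every classpath entry is a directory or a wildcard, so the jars from the new build should be picked up.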


If you give us more details, we can help more. 

Thanks


Actually, I just patched the gradle build and added back the zkclient libs.
Please "git clone https://github.com/sql888/kafka.git" (or git pull) and try
to build again. I suspect that was the issue. Otherwise, we need to see the
errors from the crash in the Kafka logs.
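
The clone-or-pull and rebuild steps can be sketched as a small script. The function names, the `DRY_RUN` switch, and the default checkout directory `kafka` are assumptions for illustration; the repository URL and the build flags are the ones mentioned in this comment:

```shell
# Echo the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clone the fork on first use, pull on later runs, then rebuild without tests.
# $1: checkout directory (assumed default: kafka)
rebuild_fork() {
  repo_url="https://github.com/sql888/kafka.git"
  checkout_dir="${1:-kafka}"
  if [ -d "$checkout_dir/.git" ]; then
    run git -C "$checkout_dir" pull || return 1
  else
    run git clone "$repo_url" "$checkout_dir" || return 1
  fi
  # -p points Gradle at the checkout so the build works from any directory.
  run "$checkout_dir/gradlew" -p "$checkout_dir" clean build -x test
}
```

Running `DRY_RUN=1 rebuild_fork` first prints the commands without executing them, which is a cheap way to confirm the paths before starting a long build.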





> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4084
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4084
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Tom Crayford
>            Priority: Major
>              Labels: reliability
>             Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
