[jira] [Created] (KAFKA-10134) High CPU issue during rebalance in Kafka consumer after upgrading to 2.5

Sean Guo (Jira) Tue, 09 Jun 2020 17:08:48 -0700

Sean Guo created KAFKA-10134:
--------------------------------

             Summary: High CPU issue during rebalance in Kafka consumer after 
upgrading to 2.5
                 Key: KAFKA-10134
                 URL: https://issues.apache.org/jira/browse/KAFKA-10134
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 2.5.0
            Reporter: Sean Guo



We want to use the new rebalance protocol to mitigate the stop-the-world 
effect during rebalances, because our tasks are long-running.
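
For context, the Java consumer selects the rebalance protocol through its `partition.assignment.strategy` setting, and `CooperativeStickyAssignor` (available since 2.4) is the assignor that enables the incremental cooperative protocol. The following is only a minimal sketch of such a configuration, not our actual setup: the class name `CooperativeConsumerConfig`, the broker address and the group id are placeholders, and plain string keys are used so the snippet compiles without the Kafka client on the classpath.

```java
import java.util.Properties;

public class CooperativeConsumerConfig {

    /** Builds consumer properties that opt in to cooperative rebalancing. */
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Opting in to the incremental cooperative protocol; leaving this
        // unset keeps the older eager ("stop-the-world") protocol.
        props.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, for illustration only.
        Properties props = build("localhost:9092", "demo-group");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting `Properties` object would be passed to the `KafkaConsumer` constructor. Removing the `partition.assignment.strategy` entry is how the old protocol can be tested on the same client version.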

But after the upgrade, when we kill an instance while there is some load 
(long-running tasks > 30 s), CPU usage goes sky-high. Our metrics show ~700%, 
so several threads are probably stuck in a tight loop.
{noformat}
"executor.kafka-consumer-executor-4" #124 daemon prio=5 os_prio=0 cpu=76853.07ms elapsed=841.16s tid=0x00007fe11f044000 nid=0x1f4 runnable  [0x00007fe119aab000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:467)
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1275)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1241)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
    at
{noformat}
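
The per-thread `cpu=` figure in a dump like the one above can also be sampled from inside the JVM, which helps confirm which threads are spinning. The following is a rough sketch using the standard `java.lang.management` API; the class name `HotThreads` and the helper `snapshot()` are hypothetical, and the code only sees threads of the JVM it runs in, so it would have to be invoked from within the consumer application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HotThreads {

    /** Thread name plus the CPU time it has consumed so far, in milliseconds. */
    static final class Usage {
        final String name;
        final long cpuMillis;

        Usage(String name, long cpuMillis) {
            this.name = name;
            this.cpuMillis = cpuMillis;
        }
    }

    /** Returns live threads of this JVM, ordered by CPU time, highest first. */
    static List<Usage> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Usage> out = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return out; // this JVM cannot measure per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id);   // -1 if unavailable
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread is gone
            if (nanos < 0 || info == null) {
                continue;
            }
            out.add(new Usage(info.getThreadName(), nanos / 1_000_000));
        }
        out.sort(Comparator.comparingLong((Usage u) -> u.cpuMillis).reversed());
        return out;
    }

    public static void main(String[] args) {
        // Print the five threads that have used the most CPU so far.
        snapshot().stream()
                .limit(5)
                .forEach(u -> System.out.println(u.cpuMillis + " ms  " + u.name));
    }
}
```

Taking two snapshots a few seconds apart and comparing the values gives a per-thread CPU rate. A thread in a tight loop would gain close to the full wall-clock interval between the two snapshots.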
From debugging the code, it looks like the clients are stuck in a loop while 
trying to find the coordinator.

I also tried the old rebalance protocol on the new version. The issue still 
exists, but CPU usage returns to normal once the rebalance is done.

I also tried the same on 2.4.1, which doesn't seem to have this issue. So it 
seems related to something that changed between 2.4.1 and 2.5.0.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
