[ 
https://issues.apache.org/jira/browse/KAFKA-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805561#comment-16805561
 ] 

Guozhang Wang commented on KAFKA-6745:
--------------------------------------

I think the root cause is that when you bounce a consumer instance, its old 
member.id has not yet been kicked out of the group by the time the instance is 
restarted, so the restarted instance re-joins as a new member. In this case the 
old member will never send a re-join group request, and the coordinator will 
always have to wait until the rebalance timeout (5 min) has elapsed to kick 
that member out.
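
To make the reasoning above concrete, here is a rough sketch of the waiting 
rule as I described it. It is a simplified model, not the broker's actual 
code: the class name, method name, member ids and rejoin delays are all made 
up for illustration. The join phase ends when every member the coordinator 
still knows about has re-joined, or when the rebalance timeout elapses, 
whichever comes first.

```java
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the coordinator's join phase.
// Not Kafka code: names and numbers are illustrative assumptions.
public class RebalanceWaitSketch {

    /**
     * Returns how long (ms) the join phase lasts in this model.
     *
     * @param knownMembers       member ids registered in the group when the rebalance starts
     * @param rejoinDelayMs      member id -> delay before its re-join request arrives;
     *                           a member with no entry never re-joins
     * @param rebalanceTimeoutMs upper bound on how long the coordinator waits
     */
    public static long joinPhaseDurationMs(Set<String> knownMembers,
                                           Map<String, Long> rejoinDelayMs,
                                           long rebalanceTimeoutMs) {
        long latest = 0L;
        for (String member : knownMembers) {
            Long delay = rejoinDelayMs.get(member);
            if (delay == null || delay >= rebalanceTimeoutMs) {
                // A silent (or too slow) member forces the full timeout.
                return rebalanceTimeoutMs;
            }
            latest = Math.max(latest, delay);
        }
        // Everyone re-joined in time: done as soon as the slowest one arrives.
        return latest;
    }

    public static void main(String[] args) {
        long timeoutMs = 5 * 60 * 1000L; // 5 min, as in the report
        Map<String, Long> rejoins = Map.of("c1-new", 1_000L, "c2", 3_000L);

        // Old member id already removed before the restart.
        long clean = joinPhaseDurationMs(Set.of("c1-new", "c2"), rejoins, timeoutMs);

        // Stale id "c1-old" is still registered and never re-joins.
        long stale = joinPhaseDurationMs(Set.of("c1-old", "c1-new", "c2"), rejoins, timeoutMs);

        System.out.println("clean restart: " + clean + " ms");
        System.out.println("stale member:  " + stale + " ms");
    }
}
```

In this model the only difference between the two cases is whether the stale 
member id is still in the group, which is why I am asking about how the 
instances were shut down.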

Could you describe how you rebalanced the consumers? Did you gracefully shut 
down each instance and then restart it?

> kafka consumer rebalancing takes long time (from 3 secs to 5 minutes)
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-6745
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6745
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, core
>    Affects Versions: 0.11.0.0
>            Reporter: Ramkumar
>            Priority: Major
>
> Hi, we had an HTTP service (3 nodes) around Kafka 0.8. This HTTP service acts 
> as a REST API so that publishers and consumers can use the middleware instead 
> of using the Kafka client API directly. There, consumer rebalancing was not a 
> major issue.
> We wanted to upgrade to Kafka 0.11, so we updated our HTTP service (3-node 
> cluster) to use the new Kafka consumer API, but rebalancing of consumers 
> (multiple consumers under the same group) now takes anywhere from 3 seconds 
> to 5 minutes (max.poll.interval.ms). Because of this delay our HTTP clients 
> time out and fail over. This rebalancing time is a major issue. It is not 
> clear from the documentation whether the rebalance for the group takes place 
> after max.poll.interval.ms, or starts after 3 seconds and completes any time 
> within 5 minutes. We tried reducing max.poll.interval.ms to 15 seconds, but 
> this also triggers rebalances internally.
> Below are the other parameters we have set in our service:
> max.poll.interval.ms = 30 seconds
> heartbeat.interval.ms = 1 minute
> session.timeout.ms = 4 minutes
> consumer.cache.timeout = 2 min
>  
>  
> Below is the log:
> 2018-03-26 12:53:23,009 [qtp1404928347-11556] INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator - 
> (Re-)joining group firstnetportal_001
> 2018-03-26 12:57:52,793 [qtp1404928347-11556] INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator - 
> Successfully joined group firstnetportal_001 with generation 7475
> Please let me know if any other applications/clients use an HTTP interface 
> on 3 nodes without having this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
