Joseph Aliase created KAFKA-7457:
------------------------------------

             Summary: AbstractCoordinator struck in discover
                 Key: KAFKA-7457
                 URL: https://issues.apache.org/jira/browse/KAFKA-7457
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 0.10.1.1
         Environment: Linux
            Reporter: Joseph Aliase


AbstractCoordinator in kafka-client is stuck in discover and never rejoins the 
group. Post restart application is able to join the consumer group and consume 
from the topic.

We see below logs every 10 minute. The sequence of events are:

a) NetworkClient complains that connection is idle and closes the connection.

b) Consumer client tries to determine co-ordinator by sending request to Node 2.

c) Node 2 responds by saying Node 3 is group co-ordinator.

d) Consumer client connects to group co-ordinator.

e) There is radio silence for 10 minutes and the sequence gets repeated. 

 

2018-09-28 16:35:59.771 TRACE org.apache.kafka.common.network.Selector 
[pool-4-thread-50] [active] [wc] About to close the idle connection from 
2147483644 due to being idle for 540140 millis
2018-09-28 16:35:59.771 DEBUG org.apache.kafka.clients.NetworkClient 
[pool-4-thread-50] [active] [wc] Node 2147483644 disconnected.
2018-09-28 16:35:59.771 INFO 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator 
[pool-4-thread-50] [active] [wc] Marking the coordinator kafka03-wc.net:9092 
(id: 2147483644 rack: null) dead for group test
2018-09-28 16:35:59.771 DEBUG 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator 
[pool-4-thread-50] [active] [wc] Sending coordinator request for group test to 
broker kafka02-wc.net:9092 (id: 2 rack: null)
2018-09-28 16:35:59.771 DEBUG org.apache.kafka.clients.NetworkClient 
[pool-4-thread-50] [active] [wc] Sending metadata request \{topics=[address]} 
to node 2
2018-09-28 16:35:59.796 DEBUG org.apache.kafka.clients.Metadata 
[pool-4-thread-50] [active] [wc] Updated cluster metadata version 401 to 
Cluster(id = oC0BPXfOT42WqN7-v6b5Gw, nodes = [kafka02-wc.net:9092 (id: 2 rack: 
null), kafka03-wc.net:9092 (id: 3 rack: null), kafka05-wc.net:9092 (id: 5 rack: 
null), kafka01-wc.net:9092 (id: 1 rack: null), kafka04-wc.net:9092 (id: 4 rack: 
null)], partitions = [Partition(topic = address, partition = 2, leader = 1, 
replicas = [1,5,4,], isr = [5,1,4,]), Partition(topic = address, partition = 1, 
leader = 5, replicas = [5,3,4,], isr = [5,3,4,]), Partition(topic = address, 
partition = 0, leader = 4, replicas = [2,3,4,], isr = [4,3,2,]), 
Partition(topic = address, partition = 6, leader = 5, replicas = [1,5,4,], isr 
= [5,1,4,]), Partition(topic = address, partition = 5, leader = 4, replicas = 
[5,3,4,], isr = [5,4,3,]), Partition(topic = address, partition = 4, leader = 
3, replicas = [1,2,3,], isr = [1,3,2,]), Partition(topic = address, partition = 
3, leader = 2, replicas = [1,5,2,], isr = [5,1,2,]), Partition(topic = address, 
partition = 16, leader = 5, replicas = [5,2,3,], isr = [5,3,2,]), 
Partition(topic = address, partition = 15, leader = 4, replicas = [1,2,4,], isr 
= [1,4,2,]), Partition(topic = address, partition = 10, leader = 4, replicas = 
[1,5,4,], isr = [5,1,4,]), Partition(topic = address, partition = 9, leader = 
3, replicas = [2,3,4,], isr = [3,4,2,]), Partition(topic = address, partition = 
8, leader = 2, replicas = [1,2,3,], isr = [1,3,2,]), Partition(topic = address, 
partition = 7, leader = 1, replicas = [1,5,2,], isr = [5,1,2,]), 
Partition(topic = address, partition = 14, leader = 3, replicas = [5,3,4,], isr 
= [5,4,3,]), Partition(topic = address, partition = 13, leader = 2, replicas = 
[2,3,4,], isr = [3,4,2,]), Partition(topic = address, partition = 12, leader = 
1, replicas = [1,2,3,], isr = [1,3,2,]), Partition(topic = address, partition = 
11, leader = 5, replicas = [1,5,2,], isr = [5,1,2,])])
2018-09-28 16:35:59.797 DEBUG 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator 
[pool-4-thread-50] [active] [wc] Received group coordinator response 
ClientResponse(receivedTimeMs=1538152559797, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@2531ff5f,
 
request=RequestSend(header=\{api_key=10,api_version=0,correlation_id=8354,client_id=active-wc-1-256},
 body=\{group_id=test}), createdTimeMs=1538152559771, 
sendTimeMs=1538152559771), 
responseBody=\{error_code=0,coordinator={node_id=3,host=kafka03-wc.net,port=9092}})
2018-09-28 16:35:59.797 INFO 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator 
[pool-4-thread-50] [active] [wc] Discovered coordinator kafka03-wc.net:9092 
(id: 2147483644 rack: null) for group test.
2018-09-28 16:35:59.797 INFO 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator 
[pool-4-thread-50] [active] [wc] Marking the coordinator kafka03-wc.net:9092 
(id: 2147483644 rack: null) dead for group test
2018-09-28 16:36:09.797 DEBUG 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator 
[pool-4-thread-50] [active] [wc] Sending coordinator request for group test to 
broker kafka02-wc.net:9092 (id: 2 rack: null)
2018-09-28 16:36:09.822 DEBUG 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator 
[pool-4-thread-50] [active] [wc] Received group coordinator response 
ClientResponse(receivedTimeMs=1538152569822, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@773b5447,
 
request=RequestSend(header=\{api_key=10,api_version=0,correlation_id=8356,client_id=active-wc-1-256},
 body=\{group_id=test}), createdTimeMs=1538152569797, 
sendTimeMs=1538152569797), 
responseBody=\{error_code=0,coordinator={node_id=3,host=kafka03-wc.net,port=9092}})
2018-09-28 16:36:09.822 INFO 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator 
[pool-4-thread-50] [active] [wc] Discovered coordinator kafka03-wc.net:9092 
(id: 2147483644 rack: null) for group test.
2018-09-28 16:36:09.822 DEBUG org.apache.kafka.clients.NetworkClient 
[pool-4-thread-50] [active] [wc] Initiating connection to node 2147483644 at 
kafka03-wc.net:9092.
2018-09-28 16:36:09.846 DEBUG org.apache.kafka.common.network.Selector 
[pool-4-thread-50] [active] [wc] Created socket with SO_RCVBUF = 65536, 
SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node 2147483644
2018-09-28 16:36:09.846 DEBUG org.apache.kafka.clients.NetworkClient 
[pool-4-thread-50] [active] [wc] Completed connection to node 2147483644



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to