[ https://issues.apache.org/jira/browse/KAFKA-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph Aliase updated KAFKA-7457: --------------------------------- Summary: AbstractCoordinator struck in Discover (was: AbstractCoordinator struck in discover) > AbstractCoordinator struck in Discover > -------------------------------------- > > Key: KAFKA-7457 > URL: https://issues.apache.org/jira/browse/KAFKA-7457 > Project: Kafka > Issue Type: Bug > Components: clients > Affects Versions: 0.10.1.1 > Environment: Linux > Reporter: Joseph Aliase > Priority: Minor > > AbstractCoordinator in kafka-client is stuck in discover and never rejoins > the group. Post restart application is able to join the consumer group and > consume from the topic. > We see below logs every 10 minute. The sequence of events are: > a) NetworkClient complains that connection is idle and closes the connection. > b) Consumer client tries to determine co-ordinator by sending request to Node > 2. > c) Node 2 responds by saying Node 3 is group co-ordinator. > d) Consumer client connects to group co-ordinator. > e) There is radio silence for 10 minutes and the sequence gets repeated. > > 2018-09-28 16:35:59.771 TRACE org.apache.kafka.common.network.Selector > [pool-4-thread-50] [active] [wc] About to close the idle connection from > 2147483644 due to being idle for 540140 millis > 2018-09-28 16:35:59.771 DEBUG org.apache.kafka.clients.NetworkClient > [pool-4-thread-50] [active] [wc] Node 2147483644 disconnected. > 2018-09-28 16:35:59.771 INFO > org.apache.kafka.clients.consumer.internals.AbstractCoordinator > [pool-4-thread-50] [active] [wc] Marking the coordinator kafka03-wc.net:9092 > (id: 2147483644 rack: null) dead for group test > 2018-09-28 16:35:59.771 DEBUG > org.apache.kafka.clients.consumer.internals.AbstractCoordinator > [pool-4-thread-50] [active] [wc] Sending coordinator request for group test > to broker kafka02-wc.net:9092 (id: 2 rack: null) > 2018-09-28 16:35:59.771 DEBUG org.apache.kafka.clients.NetworkClient > [pool-4-thread-50] [active] [wc] Sending metadata request \{topics=[address]} > to node 2 > 2018-09-28 16:35:59.796 DEBUG org.apache.kafka.clients.Metadata > [pool-4-thread-50] [active] [wc] Updated cluster metadata version 401 to > Cluster(id = oC0BPXfOT42WqN7-v6b5Gw, nodes = [kafka02-wc.net:9092 (id: 2 > rack: null), kafka03-wc.net:9092 (id: 3 rack: null), kafka05-wc.net:9092 (id: > 5 rack: null), kafka01-wc.net:9092 (id: 1 rack: null), kafka04-wc.net:9092 > (id: 4 rack: null)], partitions = [Partition(topic = address, partition = 2, > leader = 1, replicas = [1,5,4,], isr = [5,1,4,]), Partition(topic = address, > partition = 1, leader = 5, replicas = [5,3,4,], isr = [5,3,4,]), > Partition(topic = address, partition = 0, leader = 4, replicas = [2,3,4,], > isr = [4,3,2,]), Partition(topic = address, partition = 6, leader = 5, > replicas = [1,5,4,], isr = [5,1,4,]), Partition(topic = address, partition = > 5, leader = 4, replicas = [5,3,4,], isr = [5,4,3,]), Partition(topic = > address, partition = 4, leader = 3, replicas = [1,2,3,], isr = [1,3,2,]), > Partition(topic = address, partition = 3, leader = 2, replicas = [1,5,2,], > isr = [5,1,2,]), Partition(topic = address, partition = 16, leader = 5, > replicas = [5,2,3,], isr = [5,3,2,]), Partition(topic = address, partition = > 15, leader = 4, replicas = [1,2,4,], isr = [1,4,2,]), Partition(topic = > address, partition = 10, leader = 4, replicas = [1,5,4,], isr = [5,1,4,]), > Partition(topic = address, partition = 9, leader = 3, replicas = [2,3,4,], > isr = [3,4,2,]), Partition(topic = address, partition = 8, leader = 2, > replicas = [1,2,3,], isr = [1,3,2,]), Partition(topic = address, partition = > 7, leader = 1, replicas = [1,5,2,], isr = [5,1,2,]), Partition(topic = > address, partition = 14, leader = 3, replicas = [5,3,4,], isr = [5,4,3,]), > Partition(topic = address, partition = 13, leader = 2, replicas = [2,3,4,], > isr = [3,4,2,]), Partition(topic = address, partition = 12, leader = 1, > replicas = [1,2,3,], isr = [1,3,2,]), Partition(topic = address, partition = > 11, leader = 5, replicas = [1,5,2,], isr = [5,1,2,])]) > 2018-09-28 16:35:59.797 DEBUG > org.apache.kafka.clients.consumer.internals.AbstractCoordinator > [pool-4-thread-50] [active] [wc] Received group coordinator response > ClientResponse(receivedTimeMs=1538152559797, disconnected=false, > request=ClientRequest(expectResponse=true, > callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@2531ff5f, > > request=RequestSend(header=\{api_key=10,api_version=0,correlation_id=8354,client_id=active-wc-1-256}, > body=\{group_id=test}), createdTimeMs=1538152559771, > sendTimeMs=1538152559771), > responseBody=\{error_code=0,coordinator={node_id=3,host=kafka03-wc.net,port=9092}}) > 2018-09-28 16:35:59.797 INFO > org.apache.kafka.clients.consumer.internals.AbstractCoordinator > [pool-4-thread-50] [active] [wc] Discovered coordinator kafka03-wc.net:9092 > (id: 2147483644 rack: null) for group test. > 2018-09-28 16:35:59.797 INFO > org.apache.kafka.clients.consumer.internals.AbstractCoordinator > [pool-4-thread-50] [active] [wc] Marking the coordinator kafka03-wc.net:9092 > (id: 2147483644 rack: null) dead for group test > 2018-09-28 16:36:09.797 DEBUG > org.apache.kafka.clients.consumer.internals.AbstractCoordinator > [pool-4-thread-50] [active] [wc] Sending coordinator request for group test > to broker kafka02-wc.net:9092 (id: 2 rack: null) > 2018-09-28 16:36:09.822 DEBUG > org.apache.kafka.clients.consumer.internals.AbstractCoordinator > [pool-4-thread-50] [active] [wc] Received group coordinator response > ClientResponse(receivedTimeMs=1538152569822, disconnected=false, > request=ClientRequest(expectResponse=true, > callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@773b5447, > > request=RequestSend(header=\{api_key=10,api_version=0,correlation_id=8356,client_id=active-wc-1-256}, > body=\{group_id=test}), createdTimeMs=1538152569797, > sendTimeMs=1538152569797), > responseBody=\{error_code=0,coordinator={node_id=3,host=kafka03-wc.net,port=9092}}) > 2018-09-28 16:36:09.822 INFO > org.apache.kafka.clients.consumer.internals.AbstractCoordinator > [pool-4-thread-50] [active] [wc] Discovered coordinator kafka03-wc.net:9092 > (id: 2147483644 rack: null) for group test. > 2018-09-28 16:36:09.822 DEBUG org.apache.kafka.clients.NetworkClient > [pool-4-thread-50] [active] [wc] Initiating connection to node 2147483644 at > kafka03-wc.net:9092. > 2018-09-28 16:36:09.846 DEBUG org.apache.kafka.common.network.Selector > [pool-4-thread-50] [active] [wc] Created socket with SO_RCVBUF = 65536, > SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node 2147483644 > 2018-09-28 16:36:09.846 DEBUG org.apache.kafka.clients.NetworkClient > [pool-4-thread-50] [active] [wc] Completed connection to node 2147483644 -- This message was sent by Atlassian JIRA (v7.6.3#76005)