Sampath Kumar created KAFKA-7017:
------------------------------------
Summary: GroupCoordinator response error: Broker: Group coordinator not available
Key: KAFKA-7017
URL: https://issues.apache.org/jira/browse/KAFKA-7017
Project: Kafka
Issue Type: Bug
Components: consumer, controller, core, offset manager
Affects Versions: 1.1.0
Environment: Our setup details are as follows:
Confluent Kafka image: confluentinc/cp-enterprise-kafka:4.1.0
In our test setup, we use a single broker deployed in a K8S cluster.
We newly deployed our application, including the broker, in the K8S cluster and observed the following issue for the first time; as a result, our applications failed to come up.
Reporter: Sampath Kumar
Fix For: 1.1.0
1. Most of the consumers got stuck while reading data from the Kafka topic; the stack trace of the stuck thread is given below. After a certain timeout, the application restarted and tried to connect with the same consumer group, but it got stuck again at the same point:
"main" #1 prio=5 os_prio=0 tid=0x0000000001811800 nid=0x194 runnable [0x00007ffe513bd000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
    at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:122)
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
    at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:557)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228)
    - locked <0x00000000ae7acf08> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205)
    - locked <0x00000000ae7acf08> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.fetchCommittedOffsets(ConsumerCoordinator.java:465)
    at org.apache.kafka.clients.consumer.KafkaConsumer.committed(KafkaConsumer.java:1461)
2. To debug further, we installed kafkacat and tried to consume the data, first using the same consumer group that was getting stuck and then using a new consumer group. With the stuck consumer group we were not able to consume any data, whereas with the new consumer group we were. The following error is seen for the stuck consumer group:
%7|1528304675.172|COMMIT|rdkafka#consumer-1| OffsetCommit for -1 partition(s) returned: Local: No offset stored
%7|1528304675.172|UNASSIGN|rdkafka#consumer-1| Group "agent.defaultagent": unassign done in state wait-broker (join state init): without new assignment: OffsetCommit done (__NO_OFFSET)
%7|1528304675.223|CGRPQUERY|rdkafka#consumer-1| broker:9092/bootstrap: Group "agent.defaultagent": querying for coordinator: intervaled in state wait-broker
%7|1528304675.244|SEND|rdkafka#consumer-1| broker:9092/bootstrap: Sent GroupCoordinatorRequest (v0, 41 bytes @ 0, CorrId 25)
%7|1528304675.255|RECV|rdkafka#consumer-1| broker:9092/bootstrap: Received GroupCoordinatorResponse (v0, 12 bytes, CorrId 25, rtt 10.91ms)
%7|1528304675.326|CGRPCOORD|rdkafka#consumer-1| broker:9092/bootstrap: Group "agent.defaultagent" GroupCoordinator response error: Broker: Group coordinator not available
%7|1528304676.226|CGRPQUERY|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Group "agent.defaultagent": querying for coordinator: intervaled in state wait-broker
%7|1528304676.330|SEND|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Sent GroupCoordinatorRequest (v0, 41 bytes @ 0, CorrId 33)
%7|1528304676.350|RECV|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Received GroupCoordinatorResponse (v0, 12 bytes, CorrId 33, rtt 19.93ms)
*%7|1528304676.430|CGRPCOORD|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Group "agent.defaultagent" GroupCoordinator response error: Broker: Group coordinator not available*
%7|1528304677.226|CGRPQUERY|rdkafka#consumer-1| broker:9092/bootstrap: Group "agent.defaultagent": querying for coordinator: intervaled in state wait-broker
3. We tried to delete the stuck consumer group; however, the deletion fails with the same highlighted error:
Error: Deletion of some consumer groups failed:
* Group 'agent.defaultagent' could not be deleted due to: COORDINATOR_NOT_AVAILABLE
4. According to [http://home.apache.org/~ewencp/kafka-0.10.2.0-rc1/javadoc/org/apache/kafka/common/errors/GroupCoordinatorNotAvailableException.html], this is a transient error that should resolve once the offsets topic has been created. In our case, however, it has not recovered, even though consumption from the same topic with a different consumer group works.
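The observation that a new consumer group works while "agent.defaultagent" does not is consistent with how Kafka picks a group coordinator: each group ID maps to one partition of the internal __consumer_offsets topic, and the broker leading that partition acts as that group's coordinator. The minimal Java sketch below reproduces that mapping (mirroring `Utils.abs(groupId.hashCode) % partitionCount` as used by the broker's group metadata manager), assuming the default `offsets.topic.num.partitions` of 50; the class and method names are illustrative and are not part of Kafka's API.

```java
public class CoordinatorPartition {

    /** Same idea as Kafka's Utils.abs: Math.abs(Integer.MIN_VALUE) stays negative, so map it to 0. */
    public static int abs(int n) {
        return (n == Integer.MIN_VALUE) ? 0 : Math.abs(n);
    }

    /** Index of the __consumer_offsets partition that stores this group's metadata and offsets. */
    public static int partitionFor(String groupId, int offsetsTopicPartitions) {
        return abs(groupId.hashCode()) % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        // Assumption: the broker uses the default offsets.topic.num.partitions = 50.
        int partitions = 50;
        String[] groups = args.length > 0 ? args : new String[] {"agent.defaultagent"};
        for (String group : groups) {
            System.out.printf("group %s -> __consumer_offsets partition %d%n",
                    group, partitionFor(group, partitions));
        }
    }
}
```

Run with no arguments, it prints the partition for "agent.defaultagent"; other group IDs can be passed as arguments for comparison. If the partition for the stuck group has no available leader (the leaders can be inspected with the `kafka-topics` tool's `--describe` option), coordinator lookups for that group would plausibly fail with COORDINATOR_NOT_AVAILABLE while groups mapped to other partitions keep working. This only narrows down which partition to inspect; it is not a confirmed root cause.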
Can you let me know how to recover the system without restarting the broker or ZooKeeper? How can we avoid this race condition? Also, is this a bug in Kafka?
Please let me know if any other details are required.
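On the hang in item 1: in the 1.1.0 client, `KafkaConsumer.committed(TopicPartition)` takes no timeout, and the stack trace shows it looping in `AbstractCoordinator.ensureCoordinatorReady`, which in this version appears to retry the coordinator lookup with an effectively unbounded deadline. The stdlib-only Java sketch below contrasts that with a deadline-bounded retry loop. The `lookup` supplier is a hypothetical stand-in for one coordinator lookup round trip that returns empty on a retriable COORDINATOR_NOT_AVAILABLE, and the class name is illustrative. Clients from 2.0 onward added timeout-taking overloads such as `committed(TopicPartition, Duration)`, which provide this behavior in `KafkaConsumer` itself.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class BoundedCoordinatorWait {

    /**
     * Retries a coordinator lookup until it succeeds or timeoutMs elapses.
     * The (hypothetical) lookup returns the coordinator's broker id, or
     * Optional.empty() when the broker answers COORDINATOR_NOT_AVAILABLE.
     */
    public static Optional<Integer> awaitCoordinator(Supplier<Optional<Integer>> lookup,
                                                     long timeoutMs, long backoffMs) {
        long deadlineNanos = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            Optional<Integer> coordinator = lookup.get();
            if (coordinator.isPresent()) {
                return coordinator;                      // coordinator found
            }
            long remainingMs = (deadlineNanos - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return Optional.empty();                 // give up instead of blocking forever
            }
            try {
                Thread.sleep(Math.min(backoffMs, remainingMs)); // back off before the next lookup
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();      // preserve the interrupt for the caller
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Simulates the stuck group: every lookup fails, so the wait ends at the deadline.
        Optional<Integer> stuck = awaitCoordinator(Optional::empty, 300, 100);
        System.out.println("stuck group coordinator: " + stuck);

        // Simulates a healthy group: the third lookup returns broker id 0.
        int[] attempts = {0};
        Optional<Integer> healthy = awaitCoordinator(
                () -> ++attempts[0] >= 3 ? Optional.of(0) : Optional.<Integer>empty(), 5_000, 10);
        System.out.println("healthy group coordinator: " + healthy
                + " after " + attempts[0] + " attempts");
    }
}
```

Making the give-up point explicit lets the caller log, alert, or fail fast rather than sit in `committed()` until an external restart.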
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)