Hello, we ran into a memory issue on a Kafka 0.10.0.1 broker we are running 
that required a system restart. Since bringing Kafka back up, the consumers 
seem to be having trouble finding their group coordinators. Here are some of 
the errors we've seen in the server logs since the restart:

[2017-01-12 19:02:10,178] ERROR [Group Metadata Manager on Broker 0]: Error in loading offsets from [__consumer_offsets,40] (kafka.coordinator.GroupMetadataManager)
java.nio.channels.ClosedChannelException
        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:678)
        at kafka.log.FileMessageSet.searchFor(FileMessageSet.scala:135)
        at kafka.log.LogSegment.translateOffset(LogSegment.scala:106)
        at kafka.log.LogSegment.read(LogSegment.scala:127)
        at kafka.log.Log.read(Log.scala:532)
        at kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply$mcV$sp(GroupMetadataManager.scala:380)
        at kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply(GroupMetadataManager.scala:374)
        at kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply(GroupMetadataManager.scala:374)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
        at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
        at kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1(GroupMetadataManager.scala:374)
        at kafka.coordinator.GroupMetadataManager$$anonfun$loadGroupsForPartition$1.apply$mcV$sp(GroupMetadataManager.scala:353)
        at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[2017-01-12 19:03:56,468] ERROR [KafkaApi-0] Error when handling request {topics=[__consumer_offsets]} (kafka.server.KafkaApis)
kafka.admin.AdminOperationException: replication factor: 1 larger than available brokers: 0
        at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:117)
        at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:403)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$createTopic(KafkaApis.scala:629)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$createGroupMetadataTopic(KafkaApis.scala:651)
        at kafka.server.KafkaApis$$anonfun$29.apply(KafkaApis.scala:668)
        at kafka.server.KafkaApis$$anonfun$29.apply(KafkaApis.scala:666)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
        at scala.collection.SetLike$class.map(SetLike.scala:92)
        at scala.collection.AbstractSet.map(Set.scala:47)
        at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:666)
        at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:727)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:79)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:744)

Also, running kafka-consumer-groups.sh against one of our consumer groups 
returns the following:

Error while executing consumer group command This is not the correct coordinator for this group.
org.apache.kafka.common.errors.NotCoordinatorForGroupException: This is not the correct coordinator for this group.
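
For reference, the invocation is essentially of this form (the placeholders 
stand in for our actual broker host and group name):

    kafka-consumer-groups.sh --new-consumer --bootstrap-server <broker-host>:9092 --describe --group <group-name>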

We also see the following logs when trying to restart a Kafka connector:

[2017-01-12 17:44:07,941] INFO Discovered coordinator lxskfkdal501.nanigans.com:9092 (id: 2147483647 rack: null) for group connect-paid_events_s3. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:505)
[2017-01-12 17:44:07,941] INFO (Re-)joining group connect-paid_events_s3 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:326)
[2017-01-12 17:44:07,941] INFO Marking the coordinator lxskfkdal501.nanigans.com:9092 (id: 2147483647 rack: null) dead for group connect-paid_events_s3 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:542)

Does anyone have recommendations for what we can do to recover from this issue?

Thanks,
Dave
