Hello,

We ran into a memory issue on a Kafka 0.10.0.1 broker we are running, and it required a system restart. Since bringing Kafka back up, the consumers seem unable to find their coordinators. Here are some errors we’ve seen in our server logs after restarting:

[2017-01-12 19:02:10,178] ERROR [Group Metadata Manager on Broker 0]: Error in loading offsets from [__consumer_offsets,40] (kafka.coordinator.GroupMetadataManager)
java.nio.channels.ClosedChannelException
    at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:678)
    at kafka.log.FileMessageSet.searchFor(FileMessageSet.scala:135)
    at kafka.log.LogSegment.translateOffset(LogSegment.scala:106)
    at kafka.log.LogSegment.read(LogSegment.scala:127)
    at kafka.log.Log.read(Log.scala:532)
    at kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply$mcV$sp(GroupMetadataManager.scala:380)
    at kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply(GroupMetadataManager.scala:374)
    at kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply(GroupMetadataManager.scala:374)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
    at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
    at kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1(GroupMetadataManager.scala:374)
    at kafka.coordinator.GroupMetadataManager$$anonfun$loadGroupsForPartition$1.apply$mcV$sp(GroupMetadataManager.scala:353)
    at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

[2017-01-12 19:03:56,468] ERROR [KafkaApi-0] Error when handling request {topics=[__consumer_offsets]} (kafka.server.KafkaApis)
kafka.admin.AdminOperationException: replication factor: 1 larger than available brokers: 0
    at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:117)
    at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:403)
    at kafka.server.KafkaApis.kafka$server$KafkaApis$$createTopic(KafkaApis.scala:629)
    at kafka.server.KafkaApis.kafka$server$KafkaApis$$createGroupMetadataTopic(KafkaApis.scala:651)
    at kafka.server.KafkaApis$$anonfun$29.apply(KafkaApis.scala:668)
    at kafka.server.KafkaApis$$anonfun$29.apply(KafkaApis.scala:666)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
    at scala.collection.SetLike$class.map(SetLike.scala:92)
    at scala.collection.AbstractSet.map(Set.scala:47)
    at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:666)
    at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:727)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:79)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
    at java.lang.Thread.run(Thread.java:744)

Also running the kafka-consumer-groups.sh on a consumer group returns the following:

Error while executing consumer group command This is not the correct coordinator for this group.
org.apache.kafka.common.errors.NotCoordinatorForGroupException: This is not the correct coordinator for this group.

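
For context, a group's coordinator is the broker that leads one particular partition of __consumer_offsets, and that partition is picked by hashing the group id. Below is a minimal Python sketch of that mapping, which may help check whether a given group lands on partition 40, the one that failed to load in the first error above. It assumes the broker uses the absolute value of Java's String.hashCode of the group id modulo offsets.topic.num.partitions, and that the topic has the default 50 partitions. The function names are hypothetical and this is not Kafka's own code.

```python
def java_string_hash(s: str) -> int:
    """Replicate Java's String.hashCode() as a signed 32-bit integer."""
    data = s.encode("utf-16-be")  # Java strings hash over UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like a 32-bit int
    # Convert the unsigned 32-bit value to Java's signed int range.
    return h - 0x100000000 if h >= 0x80000000 else h


def coordinator_partition(group_id: str, num_partitions: int = 50) -> int:
    """Guess the __consumer_offsets partition for a group id.

    Assumption: the broker computes abs(hashCode) % partition count,
    treating Integer.MIN_VALUE as 0. num_partitions=50 assumes the
    default offsets.topic.num.partitions.
    """
    h = java_string_hash(group_id)
    non_negative = 0 if h == -(2 ** 31) else abs(h)
    return non_negative % num_partitions


if __name__ == "__main__":
    # Group id taken from the connector logs in this thread.
    print(coordinator_partition("connect-paid_events_s3"))
```

If the printed partition matches one that logged "Error in loading offsets", that would be consistent with the group being unable to find a working coordinator. If offsets.topic.num.partitions was changed from the default, pass the actual value as num_partitions.
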
We also see the following logs when trying to restart a Kafka connector:

[2017-01-12 17:44:07,941] INFO Discovered coordinator lxskfkdal501.nanigans.com:9092 (id: 2147483647 rack: null) for group connect-paid_events_s3. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:505)
[2017-01-12 17:44:07,941] INFO (Re-)joining group connect-paid_events_s3 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:326)
[2017-01-12 17:44:07,941] INFO Marking the coordinator lxskfkdal501.nanigans.com:9092 (id: 2147483647 rack: null) dead for group connect-paid_events_s3 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:542)

Does anyone have recommendations for what we can do to recover from this issue?

Thanks,
Dave