[ https://issues.apache.org/jira/browse/KAFKA-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553212#comment-16553212 ]
Manikumar commented on KAFKA-6438: ---------------------------------- This might have been fixed in KAFKA-6630 cleanups. > NSEE while concurrently creating and deleting a topic > ----------------------------------------------------- > > Key: KAFKA-6438 > URL: https://issues.apache.org/jira/browse/KAFKA-6438 > Project: Kafka > Issue Type: Bug > Components: controller > Affects Versions: 1.0.0 > Environment: kafka_2.11-1.0.0.jar > OpenJDK Runtime Environment (build 1.8.0_102-b14), OpenJDK 64-Bit Server VM > (build 25.102-b14, mixed mode) > CentOS Linux release 7.3.1611 (Core) > Reporter: Adam Kotwasinski > Priority: Major > Labels: reliability > Fix For: 2.1.0 > > > It appears that deleting a topic and creating it at the same time can cause > NSEE, what later results in a forced controller shutdown. > Most probably topics are being created because consumers/producers are still > active (yes, this means the deletion is happening blindly). > The main problem here (for me) is the controller switch, the data loss and > following unclean election is acceptable (as we admit to deleting blindly). > Environment description: > 20 kafka brokers > 80k partitions (20k topics 4partitions each) > 3 node ZK > Incident: > {code:java} > [2018-01-09 11:19:05,912] INFO [Topic Deletion Manager 6], Partition deletion > callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 > (kafka.controller.TopicDeletionManager) > [2018-01-09 11:19:06,237] INFO [Controller id=6] New leader and ISR for > partition mytopic-0 is {"leader":-1,"leader_epoch":1,"isr":[]} > (kafka.controller.KafkaController) > [2018-01-09 11:19:06,412] INFO [Topic Deletion Manager 6], Deletion for > replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of > topic mytopic in progress (kafka.controller.TopicDeletionManager) > [2018-01-09 11:19:07,218] INFO [Topic Deletion Manager 6], Deletion for > replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of > topic mytopic in progress (kafka.controller.TopicDeletionManager) > [2018-01-09 11:19:07,304] INFO [Topic Deletion Manager 6], Deletion for > replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of > topic mytopic in progress (kafka.controller.TopicDeletionManager) > [2018-01-09 11:19:07,383] INFO [Topic Deletion Manager 6], Deletion for > replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of > topic mytopic in progress (kafka.controller.TopicDeletionManager) > [2018-01-09 11:19:07,510] INFO [Topic Deletion Manager 6], Deletion for > replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of > topic mytopic in progress (kafka.controller.TopicDeletionManager) > [2018-01-09 11:19:07,661] INFO [Topic Deletion Manager 6], Deletion for > replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of > topic mytopic in progress (kafka.controller.TopicDeletionManager) > [2018-01-09 11:19:07,728] INFO [Topic Deletion Manager 6], Deletion for > replicas 9,10,11 for partition mytopic-0,mytopic-1,mytopic-2 of topic mytopic > in progress (kafka.controller.TopicDeletionManager) > [2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] > Invoking state change to OfflinePartition for partitions > mytopic-2,mytopic-0,mytopic-1,mytopic-3 > (kafka.controller.PartitionStateMachine) > [2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] > Invoking state change to NonExistentPartition for partitions > mytopic-2,mytopic-0,mytopic-1,mytopic-3 > (kafka.controller.PartitionStateMachine) > [2018-01-09 11:19:08,592] INFO [Controller id=6] New topics: [Set(mytopic, > other, other2)], deleted topics: [Set()], new partition replica assignment > [Map(other-0 -> Vector(8), mytopic-2 -> Vector(6), mytopic-0 -> Vector(4), > other-2 -> Vector(10), mytopic-1 -> Vector(5), mytopic-3 -> Vector(7), > other-1 -> Vector(9), other-3 -> Vector(11))] > (kafka.controller.KafkaController) > [2018-01-09 11:19:08,593] INFO [Controller id=6] New topic creation callback > for other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 > (kafka.controller.KafkaController) > [2018-01-09 11:19:08,596] INFO [Controller id=6] New partition creation > callback for > other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 > (kafka.controller.KafkaController) > [2018-01-09 11:19:08,596] INFO [PartitionStateMachine controllerId=6] > Invoking state change to NewPartition for partitions > other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 > (kafka.controller.PartitionStateMachine) > [2018-01-09 11:19:08,642] INFO [PartitionStateMachine controllerId=6] > Invoking state change to OnlinePartition for partitions > other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 > (kafka.controller.PartitionStateMachine) > [2018-01-09 11:19:08,828] INFO [Topic Deletion Manager 6], Partition deletion > callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 > (kafka.controller.TopicDeletionManager) > [2018-01-09 11:19:09,127] INFO [Controller id=6] New leader and ISR for > partition mytopic-0 is {"leader":-1,"leader_epoch":1,"isr":[]} > (kafka.controller.KafkaController) > [2018-01-09 11:19:09,607] ERROR [controller-event-thread]: Error processing > event TopicDeletion(Set(mytopic, other)) (kafka.controller.Contr > ollerEventManager$ControllerEventThread) > java.util.NoSuchElementException: key not found: mytopic-0 > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.mutable.HashMap.apply(HashMap.scala:65) > at > kafka.controller.ControllerBrokerRequestBatch.kafka$controller$ControllerBrokerRequestBatch$$updateMetadataRequestPartitionInfo$1(ControllerChannelManager.scala:358) > at > kafka.controller.ControllerBrokerRequestBatch$$anonfun$addUpdateMetadataRequestForBrokers$3.apply(ControllerChannelManager.scala:394) > at > kafka.controller.ControllerBrokerRequestBatch$$anonfun$addUpdateMetadataRequestForBrokers$3.apply(ControllerChannelManager.scala:394) > at scala.collection.mutable.HashSet.foreach(HashSet.scala:78) > at > kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers(ControllerChannelManager.scala:394) > at > kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1039) > at > kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:266) > at > kafka.controller.TopicDeletionManager$$anonfun$resumeDeletions$2.apply(TopicDeletionManager.scala:361) > at > kafka.controller.TopicDeletionManager$$anonfun$resumeDeletions$2.apply(TopicDeletionManager.scala:333) > at scala.collection.immutable.Set$Set3.foreach(Set.scala:163) > at > kafka.controller.TopicDeletionManager.resumeDeletions(TopicDeletionManager.scala:333) > at > kafka.controller.TopicDeletionManager.enqueueTopicsForDeletion(TopicDeletionManager.scala:113) > at > kafka.controller.KafkaController$TopicDeletion.process(KafkaController.scala:1299) > at > kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:53) > at > kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53) > at > kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53) > at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31) > at > kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:52) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64) > [2018-01-09 11:19:09,657] INFO [Controller id=6] New topics: [Set(mytopic)], > deleted topics: [Set()], new partition replica assignment [Map(mytopic-2 -> > Vector(6), mytopic-0 -> Vector(4), mytopic-1 -> Vector(5), mytopic-3 -> > Vector(7))] (kafka.controller.KafkaController) > [2018-01-09 11:19:09,658] INFO [Controller id=6] New topic creation callback > for mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.KafkaController) > [2018-01-09 11:19:09,658] INFO [Controller id=6] New partition creation > callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 > (kafka.controller.KafkaController) > [2018-01-09 11:19:09,658] INFO [PartitionStateMachine controllerId=6] > Invoking state change to NewPartition for partitions > mytopic-2,mytopic-0,mytopic-1,mytopic-3 > (kafka.controller.PartitionStateMachine) > [2018-01-09 11:19:09,659] INFO [PartitionStateMachine controllerId=6] > Invoking state change to OnlinePartition for partitions > mytopic-2,mytopic-0,mytopic-1,mytopic-3 > (kafka.controller.PartitionStateMachine) > [2018-01-09 11:19:09,664] WARN [OfflinePartitionLeaderSelector]: No broker in > ISR is alive for mytopic-0. Elect leader 4 from live brokers 4. There's > potential data loss. (kafka.controller.OfflinePartitionLeaderSelector) > [2018-01-09 11:19:09,664] INFO [OfflinePartitionLeaderSelector]: Selected new > leader and ISR {"leader":4,"leader_epoch":2,"isr":[4]} for offline partition > mytopic-0 (kafka.controller.OfflinePartitionLeaderSelector) > [2018-01-09 11:19:09,769] ERROR [Controller id=6] Forcing the controller to > resign (kafka.controller.KafkaController) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)