[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330093#comment-14330093 ]
Sriharsha Chintalapani edited comment on KAFKA-1887 at 2/21/15 5:53 PM:
------------------------------------------------------------------------

[~nehanarkhede] I moved KafkaController.shutdown(), followed by KafkaHealthCheck.shutdown(), above SocketServer.shutdown().

1) Moving kafkaHealthCheck below the controller shutdown doesn't trigger ReplicaStateMachine.BrokerChangeListener().
2) Because of 1, controllerContext.controllerChannelManager.removeBroker doesn't get called for the current brokerId, so that broker continues to exist in controllerContext.controllerChannelManager.brokerStateInfo.
3) When kafkaController.shutdown() is called, it calls controllerChannelManager.shutdown(), which goes through removeExistingBroker for the brokerId whose SocketServer is already shut down. This causes the brokerStateInfo(brokerId).channel.disconnect() call in removeExistingBroker() to throw an exception.

Because of this exception, KafkaBroker.shutdown() slows down.

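
The ordering problem in points 1) to 3) can be sketched with a toy model. This is only an illustration under stated assumptions, not Kafka's code: ToySocketServer, ToyChannel and shutdown(controllerFirst) are hypothetical names, and the IllegalStateException stands in for whatever exception the real disconnect() raises.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownOrderSketch {

    /** Toy stand-in for the broker's socket server (hypothetical, not Kafka's SocketServer). */
    static final class ToySocketServer {
        private boolean running = true;
        void shutdown() { running = false; }
        boolean isRunning() { return running; }
    }

    /** Toy stand-in for the controller's channel to a broker (hypothetical). */
    static final class ToyChannel {
        private final ToySocketServer target;
        ToyChannel(ToySocketServer target) { this.target = target; }

        /** Assumption for this sketch: disconnecting fails once the peer is already down. */
        void disconnect() {
            if (!target.isRunning()) {
                throw new IllegalStateException("peer socket server already shut down");
            }
        }
    }

    /** Runs the two shutdown steps in the chosen order and returns the errors collected. */
    static List<String> shutdown(boolean controllerFirst) {
        ToySocketServer socketServer = new ToySocketServer();
        ToyChannel channelToSelf = new ToyChannel(socketServer);
        List<String> errors = new ArrayList<>();

        // Controller shutdown disconnects the channel for every broker it still tracks.
        Runnable controllerShutdown = () -> {
            try {
                channelToSelf.disconnect();
            } catch (IllegalStateException e) {
                errors.add(e.getMessage());
            }
        };

        if (controllerFirst) {
            controllerShutdown.run();   // order used in the patch
            socketServer.shutdown();
        } else {
            socketServer.shutdown();    // order before the patch
            controllerShutdown.run();
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("controller first:    " + shutdown(true));
        System.out.println("socket server first: " + shutdown(false));
    }
}
```

In this toy model, shutting the controller down first collects no errors, while shutting the socket server down first records one failed disconnect.
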
In the above patch, KafkaController.shutdown() and KafkaHealthCheck.shutdown() are moved above SocketServer.shutdown().

> controller error message on shutting the last broker
> ----------------------------------------------------
>
>                 Key: KAFKA-1887
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1887
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Sriharsha Chintalapani
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch
>
>
> We always see the following error in the state-change log when shutting down the last broker.
> [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger)
> kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
>         at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
>         at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
>         at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
>         at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
>         at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>         at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>         at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
>         at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
>         at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
>         at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>         at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>         at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>         at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>         at kafka.utils.Utils$.inLock(Utils.scala:535)
>         at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
>         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
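
The NoReplicaOnlineException in the quoted log follows from the two sets it prints: assigned replicas List(0) and live brokers Set(). A minimal sketch of that check is below, assuming a simplified rule of "pick the first assigned replica on a live broker". The selectLeader helper here is hypothetical; Kafka's actual selector is more involved than this.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class LeaderSelectionSketch {

    /**
     * Simplified, hypothetical rule: return the first assigned replica that is
     * hosted on a live broker, or empty if no assigned replica is alive.
     */
    static Optional<Integer> selectLeader(List<Integer> assignedReplicas, Set<Integer> liveBrokers) {
        return assignedReplicas.stream()
                .filter(liveBrokers::contains)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Integer> assigned = Arrays.asList(0);

        // Values from the log line: the last broker is gone, so the live set is empty.
        System.out.println(selectLeader(assigned, Collections.<Integer>emptySet()));

        // For contrast: broker 0 is still alive, so it can be selected.
        System.out.println(selectLeader(assigned, Collections.singleton(0)));
    }
}
```

With the last broker shut down there is nothing to select, which matches the error being logged on every shutdown of the final broker.
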