[jira] [Comment Edited] (KAFKA-1887) controller error message on shutting the last broker

2015-02-21 Thread Sriharsha Chintalapani (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330093#comment-14330093 ]

Sriharsha Chintalapani edited comment on KAFKA-1887 at 2/21/15 5:53 PM:


[~nehanarkhede] I moved KafkaController.shutdown(), followed by
KafkaHealthCheck.shutdown(), above SocketServer.shutdown().

1) Moving kafkaHealthCheck below the controller shutdown means
ReplicaStateMachine.BrokerChangeListener() is not triggered.
2) Because of 1, controllerContext.controllerChannelManager.removeBroker doesn't
get called for the current brokerId, and it continues to exist in
controllerContext.controllerChannelManager.brokerStateInfo.
3) When kafkaController.shutdown() gets called, it calls
controllerChannelManager.shutdown(), which goes through removeExistingBroker
for the brokerId whose SocketServer is shut down, causing
removeExistingBroker().brokerStateInfo(brokerId).channel.disconnect()
to throw an exception. Because of this exception, KafkaBroker.shutdown()
slows down.

In the above patch, I moved KafkaController.shutdown and
KafkaHealthCheck.shutdown above SocketServer.shutdown().
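
To make step 3 concrete, here is a toy model of the ordering problem. It is
only a sketch under stated assumptions: ToySocketServer, ToyChannel and
ToyChannelManager are hypothetical stand-ins, not Kafka's classes, and the rule
that disconnect() fails once the socket server is down is an assumption taken
from the description above. KafkaHealthCheck is left out to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the shutdown-order problem; none of these are Kafka's real classes. */
public class ShutdownOrderSketch {

    /** Stand-in for the broker's SocketServer: only tracks whether it is still up. */
    static class ToySocketServer {
        private boolean running = true;
        boolean isRunning() { return running; }
        void shutdown() { running = false; }
    }

    /** Stand-in for the controller's channel to one broker. */
    static class ToyChannel {
        private final ToySocketServer target;
        ToyChannel(ToySocketServer target) { this.target = target; }

        /** Assumption: disconnecting from a broker whose socket server is down fails. */
        void disconnect() {
            if (!target.isRunning()) {
                throw new IllegalStateException("socket server already shut down");
            }
        }
    }

    /** Stand-in for the controller channel manager and its brokerStateInfo map. */
    static class ToyChannelManager {
        final Map<Integer, ToyChannel> brokerStateInfo = new HashMap<>();

        void addBroker(int brokerId, ToySocketServer server) {
            brokerStateInfo.put(brokerId, new ToyChannel(server));
        }

        /** Disconnects every broker still registered, then clears the map. */
        void shutdown() {
            for (ToyChannel channel : brokerStateInfo.values()) {
                channel.disconnect();
            }
            brokerStateInfo.clear();
        }
    }

    /** Returns true if the controller-side shutdown completes without an exception. */
    public static boolean controllerShutsDownCleanly(boolean controllerFirst) {
        ToySocketServer socketServer = new ToySocketServer();
        ToyChannelManager channelManager = new ToyChannelManager();
        channelManager.addBroker(0, socketServer); // broker 0 is still in brokerStateInfo

        try {
            if (controllerFirst) {
                channelManager.shutdown(); // order used in the patch
                socketServer.shutdown();
            } else {
                socketServer.shutdown();   // socket server goes down first
                channelManager.shutdown(); // disconnect() now throws
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("controller first: " + controllerShutsDownCleanly(true));
        System.out.println("socket server first: " + controllerShutsDownCleanly(false));
    }
}
```

In this model the controller-first order returns true and the
socket-server-first order returns false, which is the reason for moving the
controller shutdown above SocketServer.shutdown().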


> controller error message on shutting the last broker
> 
>
> Key: KAFKA-1887
> URL: https://issues.apache.org/jira/browse/KAFKA-1887
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Jun Rao
> Assignee: Sriharsha Chintalapani
> Priority: Minor
> Fix For: 0.8.3
>
> Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch
>
>
> We always see the following error in state-change log on shutting down the 
> last broker.
> [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger)
> kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
> at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
> at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
> at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
> at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
> at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$ano

[jira] [Comment Edited] (KAFKA-1887) controller error message on shutting the last broker

2015-02-18 Thread Helena Edelson (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326455#comment-14326455 ]

Helena Edelson edited comment on KAFKA-1887 at 2/18/15 7:57 PM:


I see this consistently on shutdown in version 0.8.2.0. As a workaround,
shutting the controller down first works.


> controller error message on shutting the last broker
> 
>
> Key: KAFKA-1887
> URL: https://issues.apache.org/jira/browse/KAFKA-1887
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Jun Rao
> Assignee: Sriharsha Chintalapani
> Priority: Minor
> Fix For: 0.8.3
>
>
> We always see the following error in state-change log on shutting down the 
> last broker.
> [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger)
> kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
> at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
> at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
> at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
> at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
> at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at kafka.utils.Utils$.inLock(Utils.scala:535)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
> at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)


