[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329122#comment-14329122
 ] 

Sriharsha Chintalapani commented on KAFKA-1887:
-----------------------------------------------

[~nehanarkhede] Another problem with this fix that I forgot to add in my 
comment above: moving kafkaHealthCheck below the controller shutdown causes 
the broker's controlled shutdown to fail. Here is the log from a 2-node 
cluster with a topic that has 2 partitions and a replication factor of 2.
This happens after I shut down the second broker and then shut down the 
controller. I'll get more details on this.
[2015-02-20 08:24:34,548] INFO [Kafka Server 0], Starting controlled shutdown 
(kafka.server.KafkaServer)
[2015-02-20 08:24:34,569] INFO [Kafka Server 0], Remaining partitions to move: 
[test3,0],[test3,1] (kafka.server.KafkaServer)
[2015-02-20 08:24:34,570] INFO [Kafka Server 0], Error code from controller: 0 
(kafka.server.KafkaServer)
[2015-02-20 08:24:39,575] WARN [Kafka Server 0], Retrying controlled shutdown 
after the previous attempt failed... (kafka.server.KafkaServer)
[2015-02-20 08:24:39,596] INFO [Kafka Server 0], Remaining partitions to move: 
[test3,0],[test3,1] (kafka.server.KafkaServer)
[2015-02-20 08:24:39,596] INFO [Kafka Server 0], Error code from controller: 0 
(kafka.server.KafkaServer)
^C[2015-02-20 08:24:44,598] WARN [Kafka Server 0], Retrying controlled shutdown 
after the previous attempt failed... (kafka.server.KafkaServer)
[2015-02-20 08:24:44,617] INFO [Kafka Server 0], Remaining partitions to move: 
[test3,0],[test3,1] (kafka.server.KafkaServer)
[2015-02-20 08:24:44,617] INFO [Kafka Server 0], Error code from controller: 0 
(kafka.server.KafkaServer)
[2015-02-20 08:24:49,620] WARN [Kafka Server 0], Retrying controlled shutdown 
after the previous attempt failed... (kafka.server.KafkaServer)
[2015-02-20 08:24:49,621] INFO Closing socket connection to /192.168.202.1. 
(kafka.network.Processor)
[2015-02-20 08:24:49,621] WARN [Kafka Server 0], Proceeding to do an unclean 
shutdown as all the controlled shutdown attempts failed 
(kafka.server.KafkaServer)
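
For anyone reading the log above, here is a minimal sketch of the retry 
behavior it shows. It is not the KafkaServer code. The names 
controlled_shutdown and request_shutdown are hypothetical, and the defaults 
(3 attempts, 5 seconds apart) are inferred from the timestamps in the log. The 
sketch also assumes an attempt counts as successful only when no partitions 
remain to be moved, which would explain why the broker keeps retrying even 
though the controller returns error code 0.

```python
import time
from typing import Callable, List


def controlled_shutdown(
    request_shutdown: Callable[[], List[str]],
    max_retries: int = 3,       # assumption: inferred from the three attempts in the log
    backoff_s: float = 5.0,     # assumption: inferred from the 5 s gaps in the log
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True on a clean shutdown, False if the caller must shut down uncleanly."""
    for _ in range(max_retries):
        # Hypothetical call: ask the controller to move leadership away and
        # report the partitions it could not move.
        remaining = request_shutdown()
        print(f"Remaining partitions to move: {','.join(remaining)}")
        if not remaining:
            return True
        sleep(backoff_s)
        print("Retrying controlled shutdown after the previous attempt failed...")
    print("Proceeding to do an unclean shutdown as all the controlled "
          "shutdown attempts failed")
    return False
```

With a controller that never manages to move the two partitions, for example 
controlled_shutdown(lambda: ["[test3,0]", "[test3,1]"], sleep=lambda s: None), 
every attempt reports the same remaining partitions and the function falls 
through to the unclean-shutdown path, as in the log.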


> controller error message on shutting the last broker
> ----------------------------------------------------
>
>                 Key: KAFKA-1887
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1887
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Sriharsha Chintalapani
>            Priority: Minor
>             Fix For: 0.8.3
>
>
> We always see the following error in state-change log on shutting down the 
> last broker.
> [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
> for partition [test,0] from OfflinePartition to OnlinePartition failed 
> (state.change.logger)
> kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
> alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
>         at 
> kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
>         at 
> kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
>         at 
> kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
>         at 
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
>         at 
> kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
>         at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>         at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>         at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>         at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>         at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>         at 
> kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
>         at 
> kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>         at kafka.utils.Utils$.inLock(Utils.scala:535)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
>         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
