[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-04-26 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513159#comment-14513159
 ] 

Neha Narkhede commented on KAFKA-1887:
--

[~sriharsha] The patch is waiting for a comment. If you can squeeze that in, 
I'll help you merge this change. 

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-21 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330090#comment-14330090
 ] 

Sriharsha Chintalapani commented on KAFKA-1887:
---

Created reviewboard https://reviews.apache.org/r/31256/diff/
 against branch origin/trunk

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-1887.patch


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-21 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330094#comment-14330094
 ] 

Sriharsha Chintalapani commented on KAFKA-1887:
---

Updated reviewboard https://reviews.apache.org/r/31256/diff/
 against branch origin/trunk

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-21 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330093#comment-14330093
 ] 

Sriharsha Chintalapani commented on KAFKA-1887:
---

[~nehanarkhede]  I moved  KafkaController.shutdown() followed by 
KafkaHealthCheck.shutdown() above SocketServer.shutdown().

1) Moving kafkaHealthCheck below controller shutdown doesn't trigger  
ReplicaStateMachine.BrokerChangeListener() 
2) because of 1 controllerContext.controllerChannelManager.removeBroker for the 
current brokerId  and it continues exist in 
controllerContext.controllerChannelManager.brokerStateInfo.
3) when kafkaController.shutdown() gets called it calls 
controllerChannelManager.shutdown() and it will go through removeExistingBroker 
for the brokerId whose SocketServer is shutdown causing 
removeExistingBroker().brokerStateInfo(brokerId).channel.disconnect() 
throw an exception . Because of this exception KafkaBroker.shutdown() is 
slowing down.

In the above patch moved KafkaController.shutdown and KafkaHealthCheck.shutdown 
above SocketServer.shutdown()

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-1887.patch


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-20 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329610#comment-14329610
 ] 

Gwen Shapira commented on KAFKA-1887:
-

Regarding ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown:

1. Feel free to change the test to pass with either exception - they are both 
correct behaviors for this test.

2. Since I didn't expect the shutdown order change to have any effect on the 
test, I'll take a look at what happened.

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-20 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329102#comment-14329102
 ] 

Neha Narkhede commented on KAFKA-1887:
--

I think this fix exposed a bug in the new consumer - KAFKA-1969. The fix is 
simple though.

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-20 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329122#comment-14329122
 ] 

Sriharsha Chintalapani commented on KAFKA-1887:
---

[~nehanarkhede] Another problem I forgot add in my comment above with this fix 
i.e moving kafkaHealthCheck below the controller shutdown causing broker 
controlled shutdown to fail . Here is the log on a 2 node cluster with 2 
partitions, 2 replication factor topic.
This happens after I shutdown second broker and next shutdown the controller
I'll get more details on this. 
[2015-02-20 08:24:34,548] INFO [Kafka Server 0], Starting controlled shutdown 
(kafka.server.KafkaServer)
[2015-02-20 08:24:34,569] INFO [Kafka Server 0], Remaining partitions to move: 
[test3,0],[test3,1] (kafka.server.KafkaServer)
[2015-02-20 08:24:34,570] INFO [Kafka Server 0], Error code from controller: 0 
(kafka.server.KafkaServer)
[2015-02-20 08:24:39,575] WARN [Kafka Server 0], Retrying controlled shutdown 
after the previous attempt failed... (kafka.server.KafkaServer)
[2015-02-20 08:24:39,596] INFO [Kafka Server 0], Remaining partitions to move: 
[test3,0],[test3,1] (kafka.server.KafkaServer)
[2015-02-20 08:24:39,596] INFO [Kafka Server 0], Error code from controller: 0 
(kafka.server.KafkaServer)
^C[2015-02-20 08:24:44,598] WARN [Kafka Server 0], Retrying controlled shutdown 
after the previous attempt failed... (kafka.server.KafkaServer)
[2015-02-20 08:24:44,617] INFO [Kafka Server 0], Remaining partitions to move: 
[test3,0],[test3,1] (kafka.server.KafkaServer)
[2015-02-20 08:24:44,617] INFO [Kafka Server 0], Error code from controller: 0 
(kafka.server.KafkaServer)
[2015-02-20 08:24:49,620] WARN [Kafka Server 0], Retrying controlled shutdown 
after the previous attempt failed... (kafka.server.KafkaServer)
[2015-02-20 08:24:49,621] INFO Closing socket connection to /192.168.202.1. 
(kafka.network.Processor)
[2015-02-20 08:24:49,621] WARN [Kafka Server 0], Proceeding to do an unclean 
shutdown as all the controlled shutdown attempts failed 
(kafka.server.KafkaServer)


 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 

[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-20 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329318#comment-14329318
 ] 

Sriharsha Chintalapani commented on KAFKA-1887:
---

This can be fixed by moving KafkaHealthcheck.shutdown() after 
controller.shutdown() as suggested by [~nehanarkhede] [~gwenshap] . We also 
need to move kafkaHealthCheck.start() before controller.start() otherwise we 
will see the same error in state-change.log after broker started.  
The below patch causing the unit tests run time to go from 9min 30 sec to 
15mins on my machine and also causing intermittent test failure in 
ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown  as 
producer.send.get gets an NotEnoughReplicasAfterAppendException instead of 
NotEnoughReplicasException (probably not related to this patch). I am looking 
into the test slowness with the patch but if you have any idea/fix for this 
please go ahead and take the jira I don't want to hold up 0.8.2.1 release. I'll 
update the jira as soon as I've a fix. 
{code}
+/* tell everyone we are alive */
+kafkaHealthcheck = new KafkaHealthcheck(config.brokerId, 
config.advertisedHostName, config.advertisedPort, config.zkSessionTimeoutMs, 
zkClient)
+kafkaHealthcheck.startup()
+
 /* start kafka controller */
 kafkaController = new KafkaController(config, zkClient, brokerState)
 kafkaController.startup()
@@ -152,10 +156,6 @@ class KafkaServer(val config: KafkaConfig, time: Time = 
SystemTime) extends Logg
 topicConfigManager = new TopicConfigManager(zkClient, logManager)
 topicConfigManager.startup()
 
-/* tell everyone we are alive */
-kafkaHealthcheck = new KafkaHealthcheck(config.brokerId, 
config.advertisedHostName, config.advertisedPort, config.zkSessionTimeoutMs, 
zkClient)
-kafkaHealthcheck.startup()
-
 /* register broker metrics */
 registerStats()
 
@@ -310,8 +310,6 @@ class KafkaServer(val config: KafkaConfig, time: Time = 
SystemTime) extends Logg
   if (canShutdown) {
 Utils.swallow(controlledShutdown())
 brokerState.newState(BrokerShuttingDown)
-if(kafkaHealthcheck != null)
-  Utils.swallow(kafkaHealthcheck.shutdown())
 if(socketServer != null)
   Utils.swallow(socketServer.shutdown())
 if(requestHandlerPool != null)
@@ -329,6 +327,8 @@ class KafkaServer(val config: KafkaConfig, time: Time = 
SystemTime) extends Logg
   Utils.swallow(consumerCoordinator.shutdown())
 if(kafkaController != null)
   Utils.swallow(kafkaController.shutdown())
+if(kafkaHealthcheck != null)
+  Utils.swallow(kafkaHealthcheck.shutdown())
 if(zkClient != null)
   Utils.swallow(zkClient.close())

{code}

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)

[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-19 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328572#comment-14328572
 ] 

Neha Narkhede commented on KAFKA-1887:
--

Agree this error is really annoying. It will be great to squeeze this in 0.8.2.1
[~sriharsha] This might help-
{code}
   if (canShutdown) {
 Utils.swallow(controlledShutdown())
 brokerState.newState(BrokerShuttingDown)
-if(kafkaHealthcheck != null)
-  Utils.swallow(kafkaHealthcheck.shutdown())
 if(socketServer != null)
   Utils.swallow(socketServer.shutdown())
 if(requestHandlerPool != null)
@@ -329,6 +327,8 @@ class KafkaServer(val config: KafkaConfig, time: Time = 
SystemTime) extends Logg
   Utils.swallow(consumerCoordinator.shutdown())
 if(kafkaController != null)
   Utils.swallow(kafkaController.shutdown())
+if(kafkaHealthcheck != null)
+  Utils.swallow(kafkaHealthcheck.shutdown())
 if(zkClient != null)
   Utils.swallow(zkClient.close())
{code}

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-18 Thread Helena Edelson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326455#comment-14326455
 ] 

Helena Edelson commented on KAFKA-1887:
---

I see this consistently on shutdown in version 0.8.2.0

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-12 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318312#comment-14318312
 ] 

Sriharsha Chintalapani commented on KAFKA-1887:
---

[~gwenshap] I am worried about moving kafkaHealthCheck below as it won't be 
able to trigger onBrokerFailure in KafkaController. In a cluster this probably 
fine as there will be another controller gets elected. Also this change causes 
ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown to fail as 
producer.send.get gets an NotEnoughReplicasAfterAppendException instead of 
NotEnoughReplicasException

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-11 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317695#comment-14317695
 ] 

Gwen Shapira commented on KAFKA-1887:
-

[~harsha_ch] I think the idea was simply to move kafkaHealthCheck.shutdown() to 
after kafkaController.shutdown() in the broker shutdown sequence. This happens 
after the ControlShutdownRequest, so it should be ok, right?


 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker

2015-02-11 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317678#comment-14317678
 ] 

Sriharsha Chintalapani commented on KAFKA-1887:
---

[~junrao] 
Shutting down the controller before shutting down broker is not a good idea as 
broker does a controlledShutdown which sends ControllShutdownRequest and there 
won't be any controller handling these requests. More over if we don't shutdown 
broker first and if it happens to be the last broker in the cluster it won't go 
through onBrokerFailover which will move the partitions to offlinestate.
I am not sure if there is good way to fix this as it does go through the 
expected steps. Please let me know if you have any ideas. 

 controller error message on shutting the last broker
 

 Key: KAFKA-1887
 URL: https://issues.apache.org/jira/browse/KAFKA-1887
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Priority: Minor
 Fix For: 0.8.3


 We always see the following error in state-change log on shutting down the 
 last broker.
 [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change 
 for partition [test,0] from OfflinePartition to OnlinePartition failed 
 (state.change.logger)
 kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is 
 alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
 at 
 kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
 at 
 kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357)
 at 
 kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
 at 
 kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
 at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
 at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
 at 
 kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)