[ https://issues.apache.org/jira/browse/KAFKA-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Olson updated KAFKA-3228:
--------------------------------

Description:

After adding about 20 new brokers to double the size of an existing production Kafka deployment, we were initially unable to reassign any partitions to 5 of the 20 when attempting to rebalance partitions. There was no problem with the other 15. The controller broker logged error messages like:

{noformat}
ERROR kafka.controller.KafkaController: [Controller 19]: Error completing reassignment of partition [TOPIC-NAME,2]
kafka.common.KafkaException: Only 4,33 replicas out of the new set of replicas 4,34,33 for partition [TOPIC-NAME,2] to be reassigned are alive. Failing partition reassignment
        at kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:611)
        at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply$mcV$sp(KafkaController.scala:1203)
        at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
        at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4$$anonfun$apply$6.apply(KafkaController.scala:1197)
        at kafka.utils.Utils$.inLock(Utils.scala:535)
        at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1196)
        at kafka.controller.PartitionsReassignedListener$$anonfun$handleDataChange$4.apply(KafkaController.scala:1195)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at kafka.controller.PartitionsReassignedListener.handleDataChange(KafkaController.scala:1195)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:751)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
{noformat}

We reattempted the reassignment to one of these new brokers, with the same result.
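The "alive" check in the error above reflects the controller's view of broker registrations under /brokers/ids in ZooKeeper. For reference, a quick way to see which broker IDs are actually registered (the ZooKeeper host shown is illustrative, not taken from this report):

{noformat}
# Open the ZooKeeper shell bundled with Kafka
bin/zookeeper-shell.sh zk-host:2181

# At the prompt, list the registered broker IDs and inspect one of the new brokers
ls /brokers/ids
get /brokers/ids/34
{noformat}

The broker registration znodes are ephemeral, so an entry disappears if the broker's ZooKeeper session is lost.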
We also saw these messages in the controller's log. There was a "Broken pipe" error for each of the new brokers.

{noformat}
2016-02-09 12:13:22,082 WARN kafka.controller.RequestSendThread: [Controller-19-to-broker-34-send-thread], Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest...
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
        at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
        at sun.nio.ch.IOUtil.write(IOUtil.java:148)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504)
        at java.nio.channels.SocketChannel.write(SocketChannel.java:502)
        at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
        at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
        at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:103)
        at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{noformat}

{noformat}
WARN kafka.controller.RequestSendThread: [Controller-19-to-broker-34-send-thread], Controller 19 epoch 28 fails to send request Name:UpdateMetadataRequest... to broker id:34... Reconnecting to broker.
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
        at kafka.utils.Utils$.read(Utils.scala:381)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
        at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
        at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
        at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{noformat}

{noformat}
INFO kafka.controller.RequestSendThread: [Controller-19-to-broker-34-send-thread], Controller 19 connected to id:34... for sending state change requests
{noformat}

There were no error messages in the new broker log files, just the normal startup logs. A jstack did not reveal anything unusual in the threads, and the network connections looked normal in netstat.
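For reference, checks of that kind look roughly like this (the broker PID and listener port are illustrative; the actual values are not included in this report):

{noformat}
# Thread dump of one of the problematic brokers
jstack 12345 > broker-34-threads.txt

# TCP connections involving the broker listener port (9092 is the Kafka default)
netstat -tn | grep 9092
{noformat}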
We're running version 0.8.2.1. The new brokers were started simultaneously using a broadcast-style command; however, we also hit the same issue on a different Kafka cluster after starting its new brokers individually, about 30 seconds apart.

After stopping and restarting the 5 problematic new brokers, the reassignment succeeded, and they are now functioning normally.
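For reference, partition reassignments of this kind are driven with the kafka-reassign-partitions.sh tool, which writes to the /admin/reassign_partitions path watched by the controller's PartitionsReassignedListener seen in the first stack trace. A minimal sketch of such an invocation (the ZooKeeper host and file name are illustrative; the replica list mirrors the error above):

{noformat}
# reassignment.json (illustrative contents)
{"version":1,
 "partitions":[{"topic":"TOPIC-NAME","partition":2,"replicas":[4,34,33]}]}

# Start the reassignment
bin/kafka-reassign-partitions.sh --zookeeper zk-host:2181 \
  --reassignment-json-file reassignment.json --execute

# Confirm completion after restarting the problematic brokers
bin/kafka-reassign-partitions.sh --zookeeper zk-host:2181 \
  --reassignment-json-file reassignment.json --verify
{noformat}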
> Partition reassignment failure for brokers freshly added to cluster
> -------------------------------------------------------------------
>
>                 Key: KAFKA-3228
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3228
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.1
>            Reporter: Andrew Olson
>            Assignee: Neha Narkhede
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)