Re: owner info in zk is not correct
Hi Guozhang, sorry for raising a somewhat unrelated problem here. We found a consumer that stopped fetching data during a network upgrade: if the consumer has a connection problem with one broker (but is fine with ZooKeeper and the other brokers), the FetcherRunnable stops, and there is no chance for the thread to be restarted (since there is no ZooKeeper session timeout, no rebalance is triggered). The Kafka version is 0.7.2 and we are using the high-level consumer.

I did a simulation test using iptables and the result was the same: the FetcherRunnable never gets a chance to restart. I have read the code, and it seems the exception is not handled anywhere outside the fetcher. Am I wrong about this? Thanks.

On Sat, May 17, 2014 at 12:19 AM, Guozhang Wang wrote:
> Hi Yonghui,
>
> Could you check if consumer2's fetcher thread is still alive? Also we have
> an entry in the FAQ wiki page about "consumer stopped consuming"; Apache
> currently has some issues with the wiki page, but you may want to check it
> out once the Apache page resumes.
>
> Guozhang
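Since the 0.7 FetcherRunnable does not restart itself after an unhandled socket error, one possible workaround on the application side is a watchdog that notices the dead fetcher thread and rebuilds the consumer. A minimal sketch (hypothetical, not part of Kafka; a plain Java thread stands in for the real connector and fetcher, and the restart counter is only there to make the behavior visible):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FetcherWatchdog {
    static final AtomicInteger restarts = new AtomicInteger();

    // Stand-in for creating the connector + streams; the real FetcherRunnable
    // would run the multifetch loop here and can die on an unhandled socket error.
    static Thread startFetcher() {
        Thread t = new Thread(() -> { /* simulated: fetcher dies immediately */ },
                              "FetchRunnable-0");
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread fetcher = startFetcher();
        for (int i = 0; i < 3; i++) {        // in practice this loop runs forever
            Thread.sleep(100);
            if (!fetcher.isAlive()) {        // 0.7 never restarts the thread itself
                restarts.incrementAndGet();  // here: shut down + recreate the connector
                fetcher = startFetcher();
            }
        }
        System.out.println("restarts=" + restarts.get());
    }
}
```

Recreating the whole connector (rather than just the thread) also forces a fresh rebalance, which is what releases and re-acquires the ZK ownership.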
Re: owner info in zk is not correct
Hi Yonghui,

Could you check if consumer2's fetcher thread is still alive? Also we have an entry in the FAQ wiki page about "consumer stopped consuming"; Apache currently has some issues with the wiki page, but you may want to check it out once the Apache page resumes.

Guozhang

On Wed, May 14, 2014 at 8:59 PM, Yonghui Zhao wrote:
> Thanks Guozhang.
>
> After the last accident, we stopped all consumers and then restarted them
> one by one, and then it was OK.
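To check whether the fetcher thread is still alive, `jstack <pid>` from the shell works; from inside the consumer JVM you can also enumerate live threads and look for the `FetchRunnable-*` names that appear in the logs above. A small sketch (the demo thread is simulated; `fetcherAlive` is a hypothetical helper, not a Kafka API):

```java
public class ThreadProbe {
    // True if any live JVM thread's name starts with the given prefix.
    static boolean fetcherAlive(String prefix) {
        for (Thread t : Thread.getAllStackTraces().keySet())
            if (t.getName().startsWith(prefix) && t.isAlive()) return true;
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread demo = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) {}
        }, "FetchRunnable-0");
        demo.start();
        System.out.println("alive=" + fetcherAlive("FetchRunnable"));
        demo.interrupt();
        demo.join();
        System.out.println("alive=" + fetcherAlive("FetchRunnable"));
    }
}
```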
Re: owner info in zk is not correct
Thanks Guozhang.

After the last accident, we stopped all consumers and then restarted them one by one, and then it was OK.

Setup: 2 brokers, 10 partitions per broker, 3 consumers, each creating 10 streams. So consumer1 consumes 10 partitions, consumer2 consumes the other 10 partitions, and consumer3 is idle.

Today we found some exceptions on consumer2; after these exceptions consumer2 no longer works and no messages are consumed. But in ZK I found that the ownership didn't change: consumer1 still owns 10 partitions and consumer2 owns the other 10. How can we avoid this happening again?

From the log we see the error sequence: exception during commitOffsets, "Reconnect in multifetch due to socket error", rebalance (twice), then "error in FetcherRunnable". The last error in FetcherRunnable seems to be fatal: after it no messages are consumed, but the ZK ownership is not released. Here is all the Kafka-related log from that time:

[WARN 2014-05-13 16:19:05.020] kafka.utils.Logging$class.warn(Logging.scala:79)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c exception during commitOffsets]
    at kafka.utils.ZkUtils$.updatePersistentPath(ZkUtils.scala:103)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:251)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:248)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3.apply(ZookeeperConsumerConnector.scala:248)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3.apply(ZookeeperConsumerConnector.scala:246)
    at kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:246)
    at kafka.consumer.ZookeeperConsumerConnector.autoCommit(ZookeeperConsumerConnector.scala:232)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:126)
    at kafka.utils.Utils$$anon$2.run(Utils.scala:58)
[INFO 2014-05-13 16:19:08.991] kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c ZK expired; release old broker parition ownership; re-register consumer RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c]
[INFO 2014-05-13 16:19:08.991] kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c begin registering consumer RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c in ZK]
[WARN 2014-05-13 16:19:09.001] kafka.utils.Logging$class.warn(Logging.scala:79)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c exception during commitOffsets]
    at kafka.utils.ZkUtils$.updatePersistentPath(ZkUtils.scala:103)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:251)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:248)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3.apply(ZookeeperConsumerConnector.scala:248)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3.apply(ZookeeperConsumerConnector.scala:246)
    at kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:246)
    at kafka.consumer.ZookeeperConsumerConnector.autoCommit(ZookeeperConsumerConnector.scala:232)
    at kafka.consumer.ZookeeperConsumerConnector$$anonfun$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:126)
    at kafka.utils.Utils$$anon$2.run(Utils.scala:58)
[INFO 2014-05-13 16:19:09.002] kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c end registering consumer RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c in ZK]
[INFO 2014-05-13 16:19:09.003] kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c begin rebalancing consumer RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c try #0]
[INFO 2014-05-13 16:19:09.063] kafka.utils.Logging$class.info(Logging.scala:69)
[Reconnect in multifetch due to socket error: ]
    at kafka.utils.Utils$.read(Utils.scala:538)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:55)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.consumer.SimpleConsumer.getResponse(SimpleConsumer.scala:177)
    at kafka.consumer.SimpleConsumer.liftedTree2$1(SimpleConsumer.scala:117)
    at kafka.consumer.SimpleConsumer.multifetch(SimpleConsumer.scala:115)
    at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:60)
[INFO 2014-05-13 16:19:09.068] kafka.utils.Logging$class.info(Logging.scala:61)
[FecherRunnable Thread[FetchRunnable-0,5,main] interrupted]
[INFO 2014-05-
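The commitOffsets failures above coincide with a ZK session expiry, so writes against the old session fail until the new session is established. A common mitigation is to retry the commit with a backoff; a generic sketch (`withRetries` and the failing callable are hypothetical stand-ins for `ZookeeperConsumerConnector.commitOffsets`, not Kafka APIs):

```java
import java.util.concurrent.Callable;

public class RetryCommit {
    // Retry an operation with a fixed backoff, rethrowing the last
    // failure if all attempts are exhausted.
    static <T> T withRetries(Callable<T> op, int maxTries, long backoffMs) throws Exception {
        Exception last = null;
        for (int i = 0; i < maxTries; i++) {
            try { return op.call(); }
            catch (Exception e) { last = e; Thread.sleep(backoffMs); }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice (stand-in for writes against the expired ZK session),
        // then succeeds once the new session is up.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("ZK session expired");
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " tries");
    }
}
```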
Re: owner info in zk is not correct
Hello Yonghui,

In 0.7 the consumer rebalance logic is distributed, and in some corner cases (such as consecutive rebalances caused by soft failures) one consumer may consider the rebalance complete while others are still going through the rebalance process. You can check the GC logs on your consumers to verify whether that is the case:

https://issues.apache.org/jira/browse/KAFKA-242

If you bounce the consumers to trigger another rebalance, this issue will likely be resolved. To solve it for good, in 0.9 we are moving group management (such as load rebalancing) from the ZK-based distributed logic into a centralized coordinator. Details can be found here:

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design

Guozhang

On Mon, May 12, 2014 at 12:48 AM, Yonghui Zhao wrote:
> Hi,
>
> We are using kafka 0.7.
>
> 2 brokers, each broker has 10 partitions for one topic.
> 3 consumers in one consumer group, each consumer creates 10 streams.
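The "conflict in /consumers/.../owners" lines in the earlier logs come from this distributed design: each consumer computes its own assignment independently and then tries to create the per-partition owner node, backing off and retrying when another consumer's claim is still stored. A sketch of that claim/conflict/retry cycle (illustrative only; a map stands in for the ZK owner path, and the consumer-thread names are shortened):

```java
import java.util.concurrent.ConcurrentHashMap;

public class OwnerClaim {
    static final ConcurrentHashMap<String, String> owners = new ConcurrentHashMap<>();

    // Returns null on success, or the conflicting owner already stored
    // (what the "conflict in ... stored data:" log line reports).
    static String claim(String partition, String consumerThread) {
        String prev = owners.putIfAbsent(partition, consumerThread);
        return (prev == null || prev.equals(consumerThread)) ? null : prev;
    }

    public static void main(String[] args) {
        System.out.println(claim("1-2", "relation03-1"));  // claim succeeds
        String conflict = claim("1-2", "relation01-2");    // stale owner still stored
        System.out.println("conflict with " + conflict);   // consumer waits for deletion
        owners.remove("1-2");                              // old owner's node removed
        System.out.println(claim("1-2", "relation01-2"));  // retry succeeds
    }
}
```

If the old owner never deletes its node (because it wrongly believes the rebalance is complete), the retries are exhausted and the ConsumerRebalanceFailedException above is thrown.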
owner info in zk is not correct
Hi,

We are using Kafka 0.7.

2 brokers, each with 10 partitions for one topic.
3 consumers in one consumer group, each consumer creating 10 streams.

Today we wanted to roll out a new service. After we restarted one consumer, we found exceptions and warnings:

kafka.common.ConsumerRebalanceFailedException: RecommendEvent_sd-sns-relation01.bj-1399630465426-53d3aefc can't rebalance after 4 retries

[INFO 2014-05-12 15:17:47.364] kafka.utils.Logging$class.info(Logging.scala:61)
[conflict in /consumers/RecommendEvent/owners/sensei/1-2 data: RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e-2 stored data: RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1]
[INFO 2014-05-12 15:17:47.366] kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e waiting for the partition ownership to be deleted: 1-2]
[INFO 2014-05-12 15:17:47.375] kafka.utils.Logging$class.info(Logging.scala:61)
[conflict in /consumers/RecommendEvent/owners/sensei/1-3 data: RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e-3 stored data: RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1]
[INFO 2014-05-12 15:17:47.375] kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e waiting for the partition ownership to be deleted: 1-3]
[INFO 2014-05-12 15:17:47.385] kafka.utils.Logging$class.info(Logging.scala:61)
[conflict in /consumers/RecommendEvent/owners/sensei/1-5 data: RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e-5 stored data: RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2]
[INFO 2014-05-12 15:17:47.386] kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e waiting for the partition ownership to be deleted: 1-5]

And I opened a ZK viewer.

In ZK, we found 2 consumers under ConsumerGroup/ids:

RecommendEvent_sd-sns-relation02.bj-1399635256619-5d8123c6
RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3

And under owners/topic/ we found all partitions assigned to sd-sns-relation03.bj. Here is the owner info:

1:0 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-0
1:1 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-0
1:2 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1
1:3 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1
1:4 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2
1:5 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2
1:6 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-3
1:7 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-3
1:8 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-4
1:9 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-4

2:0 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-0
2:1 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1
2:2 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2
2:3 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-3
2:4 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-4
2:5 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-5
2:6 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-6
2:7 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-7
2:8 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-8
2:9 RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-9

So all partitions are assigned to sd-sns-relation03.bj, but from the logs and our counters we are sure sd-sns-relation02.bj is receiving input too.

My questions are:

1. Why did the rebalance fail?
2. Why is the owner info wrong? (BTW: zkclient is 0.2.)
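This kind of inconsistency can be spotted automatically rather than by eyeballing the ZK viewer: every entry under owners/&lt;topic&gt; should name a consumer that is still registered under &lt;group&gt;/ids. A hypothetical audit helper (maps stand in for the two ZK subtrees; the ids are shortened for readability):

```java
import java.util.*;

public class OwnerAudit {
    // Owner values have the form "<consumerId>-<threadId>"; strip the thread suffix.
    static String consumerIdOf(String owner) {
        return owner.substring(0, owner.lastIndexOf('-'));
    }

    // Partitions whose stored owner no longer corresponds to a live consumer id.
    static List<String> orphanedPartitions(Map<String, String> owners, Set<String> liveIds) {
        List<String> orphans = new ArrayList<>();
        for (Map.Entry<String, String> e : owners.entrySet())
            if (!liveIds.contains(consumerIdOf(e.getValue())))
                orphans.add(e.getKey());
        Collections.sort(orphans);
        return orphans;
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>(Arrays.asList(
            "relation02-5d8123c6", "relation03-487bdbb3"));
        Map<String, String> owners = new HashMap<>();
        owners.put("1-0", "relation03-487bdbb3-0");
        owners.put("1-1", "relation01-53d3aefc-0");  // stale claim from a departed consumer
        System.out.println(orphanedPartitions(owners, ids));
    }
}
```

In the situation above the audit would come back clean (relation03 is registered and owns everything), which narrows the problem down to relation02 consuming without holding any owner nodes.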