Re: owner info in zk is not correct

2014-05-22 Thread Helin Xiang
Hi Guozhang,

Sorry for asking a somewhat unrelated question here.

We found that a consumer stopped fetching data during a network upgrade:
if the consumer has a connection problem with one broker (but is fine with
ZooKeeper and the other brokers), the FetcherRunnable stops, and there is no
chance to restart the thread again, since there is no ZooKeeper timeout and
therefore no rebalance is triggered.

The Kafka version is 0.7.2 and we are using the high-level consumer.

I ran a simulation test using iptables and the result is the same (no chance
to restart the FetcherRunnable). I have read the code, and it seems the
exception is not handled further up. Am I wrong about this?
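
To make the pattern concrete, here is a minimal sketch of the failure mode as
I understand it (only an illustration, not the actual Kafka source): if an
exception escapes the fetch loop's run(), the thread dies, and since the
ZooKeeper session is still healthy nothing ever recreates it.

// Simplified sketch of the failure mode, NOT the real FetcherRunnable code.
class FetchLoop(fetchOnce: () => Unit) extends Thread("fetch-loop") {
  @volatile var stopped = false

  override def run(): Unit = {
    while (!stopped) {
      // If fetchOnce() throws (e.g. the broker is unreachable), the exception
      // propagates out of run() and this thread terminates. Nothing restarts
      // it, and since the ZK session is still fine, no rebalance is triggered
      // either, so the partitions stay owned but are never fetched again.
      fetchOnce()
    }
  }
}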

THANKS.






Re: owner info in zk is not correct

2014-05-16 Thread Guozhang Wang
Hi Yonghui,

Could you check whether consumer2's fetcher thread is still alive? Also, we
have an entry on the FAQ wiki page about "consumer stopped consuming". Apache
is currently having some issues with the wiki page, but you may want to check
it out once the Apache page is back up.
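
One quick way to check is a jstack thread dump, or, roughly, a sketch like
the one below run from inside the consumer JVM (it assumes the fetcher
threads follow the "FetchRunnable-*" naming seen in your logs):

import scala.collection.JavaConverters._

// Rough sketch: list the Kafka fetcher threads still alive in this JVM.
// Assumes the "FetchRunnable" naming pattern from the logs above.
object FetcherThreadCheck {
  def listFetcherThreads(): Unit = {
    val fetchers = Thread.getAllStackTraces.keySet.asScala
      .filter(_.getName.contains("FetchRunnable"))
    if (fetchers.isEmpty)
      println("no fetcher threads found - they may have died")
    else
      fetchers.foreach(t => println(t.getName + ": state=" + t.getState))
  }
}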

Guozhang



Re: owner info in zk is not correct

2014-05-15 Thread Yonghui Zhao
Thanks Guozhang.

After the last incident, we stopped all the consumers and then restarted them
one by one, and then it was OK.

2 brokers, 10 partitions per broker, 3 consumers, each creating 10 streams.

So consumer1 consumes 10 partitions, consumer2 consumes the other 10
partitions, and consumer3 is idle.

Today we found some exceptions in consumer2; after these exceptions consumer2
doesn't work and no messages are consumed.

But in ZK I found the ownership doesn't change: consumer1 owns 10 partitions
and consumer2 owns the other 10.

How can we avoid this happening again?


  From the log we see the error sequence:




exception during commitOffsets, Reconnect in multifetch due to socket error,
rebalance 2 times, error in FetcherRunnable. The last error in FetcherRunnable
seems to be fatal: after this error no messages are consumed, but the ZK
ownership is not released. Here is all the Kafka-related log from that time:

[WARN  2014-05-13 16:19:05.020]
kafka.utils.Logging$class.warn(Logging.scala:79)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c exception
during commitOffsets]
at kafka.utils.ZkUtils$.updatePersistentPath(ZkUtils.scala:103)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:251)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:248)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3.apply(ZookeeperConsumerConnector.scala:248)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3.apply(ZookeeperConsumerConnector.scala:246)
at
kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:246)
at
kafka.consumer.ZookeeperConsumerConnector.autoCommit(ZookeeperConsumerConnector.scala:232)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:126)
at kafka.utils.Utils$$anon$2.run(Utils.scala:58)
[INFO  2014-05-13 16:19:08.991]
kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c ZK expired;
release old broker parition ownership; re-register consumer
RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c]
[INFO  2014-05-13 16:19:08.991]
kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c begin
registering consumer
RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c in ZK]
[WARN  2014-05-13 16:19:09.001]
kafka.utils.Logging$class.warn(Logging.scala:79)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c exception
during commitOffsets]
at kafka.utils.ZkUtils$.updatePersistentPath(ZkUtils.scala:103)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:251)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:248)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3.apply(ZookeeperConsumerConnector.scala:248)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$3.apply(ZookeeperConsumerConnector.scala:246)
at
kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:246)
at
kafka.consumer.ZookeeperConsumerConnector.autoCommit(ZookeeperConsumerConnector.scala:232)
at
kafka.consumer.ZookeeperConsumerConnector$$anonfun$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:126)
at kafka.utils.Utils$$anon$2.run(Utils.scala:58)
[INFO  2014-05-13 16:19:09.002]
kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c end registering
consumer RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c in ZK]
[INFO  2014-05-13 16:19:09.003]
kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c begin
rebalancing consumer
RecommendEvent_sd-sns-relation01.bj-1399968348749-4bc8451c try #0]
[INFO  2014-05-13 16:19:09.063]
kafka.utils.Logging$class.info(Logging.scala:69)
[Reconnect in multifetch due to socket error: ]
at kafka.utils.Utils$.read(Utils.scala:538)
at
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:55)
at
kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.consumer.SimpleConsumer.getResponse(SimpleConsumer.scala:177)
at kafka.consumer.SimpleConsumer.liftedTree2$1(SimpleConsumer.scala:117)
at kafka.consumer.SimpleConsumer.multifetch(SimpleConsumer.scala:115)
at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:60)
[INFO  2014-05-13 16:19:09.068]
kafka.utils.Logging$class.info(Logging.scala:61)
[FecherRunnable Thread[FetchRunnable-0,5,main] interrupted]
[INFO  2014-05-

Re: owner info in zk is not correct

2014-05-12 Thread Guozhang Wang
Hello Yonghui,

In 0.7 the consumer rebalance logic is distributed, and in some corner cases,
such as consecutive rebalances caused by soft failures, some consumers may
consider the rebalance complete while others are still going through the
rebalance process. You can check the GC logs on your consumers to verify
whether that is the case:

https://issues.apache.org/jira/browse/KAFKA-242
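
As a rough cross-check (just a sketch with a placeholder connect string and
timeout, not something we ship): a second ZooKeeper handle created inside the
consumer JVM will also see its session drop during a long GC pause, so logging
its state changes and lining them up with the GC log timestamps can help
confirm the soft failure:

import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}
import org.apache.zookeeper.Watcher.Event.KeeperState

// Sketch: log ZK session-state changes from inside the consumer JVM so that
// Disconnected/Expired events can be correlated with pauses in the GC log.
object SessionStateLogger {
  def start(connect: String, sessionTimeoutMs: Int): ZooKeeper =
    new ZooKeeper(connect, sessionTimeoutMs, new Watcher {
      def process(event: WatchedEvent): Unit = {
        val state = event.getState
        if (state == KeeperState.Disconnected || state == KeeperState.Expired)
          println(System.currentTimeMillis() + " ZK session state: " + state)
      }
    })
}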

If you bounce the consumers to trigger another rebalance, this issue will
likely be resolved.

To solve this issue, in 0.9 we are moving group management (such as load
rebalancing) from the ZK-based distributed logic into a centralized
coordinator. Details can be found here:

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design

Guozhang



-- 
-- Guozhang


owner info in zk is not correct

2014-05-12 Thread Yonghui Zhao
Hi,

We are using Kafka 0.7.

2 brokers, each broker has 10 partitions for the topic.
3 consumers in one consumer group, each consumer creates 10 streams.
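
For reference, the consumers are created roughly like this (a simplified
sketch of the setup; the 0.7 property names and API calls are from memory,
so please double-check them, and the ZK hosts are placeholders):

import java.util.Properties
import kafka.consumer.{Consumer, ConsumerConfig}

// Sketch of the setup: one high-level 0.7 consumer with 10 streams on topic
// "sensei" in group "RecommendEvent". Property names/API are from memory.
object RecommendEventConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("zk.connect", "zk1:2181,zk2:2181,zk3:2181") // placeholder hosts
    props.put("groupid", "RecommendEvent")

    val connector = Consumer.create(new ConsumerConfig(props))
    // 10 streams per consumer; with 3 consumers that is 30 streams competing
    // for 2 brokers x 10 partitions = 20 partitions, so one consumer is idle.
    val streams = connector.createMessageStreams(Map("sensei" -> 10))

    for ((_, streamList) <- streams; stream <- streamList)
      new Thread(new Runnable {
        def run(): Unit = for (message <- stream) { /* handle message */ }
      }).start()
  }
}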


Today we wanted to roll out a new service.
After we restarted one consumer, we found exceptions and warnings.

kafka.common.ConsumerRebalanceFailedException:
RecommendEvent_sd-sns-relation01.bj-1399630465426-53d3aefc can't rebalance
after 4 retries


[INFO  2014-05-12 15:17:47.364]
kafka.utils.Logging$class.info(Logging.scala:61)
[conflict in /consumers/RecommendEvent/owners/sensei/1-2 data:
RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e-2 stored data:
RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1]
[INFO  2014-05-12 15:17:47.366]
kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e waiting for the
partition ownership to be deleted: 1-2]
[INFO  2014-05-12 15:17:47.375]
kafka.utils.Logging$class.info(Logging.scala:61)
[conflict in /consumers/RecommendEvent/owners/sensei/1-3 data:
RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e-3 stored data:
RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1]
[INFO  2014-05-12 15:17:47.375]
kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e waiting for the
partition ownership to be deleted: 1-3]
[INFO  2014-05-12 15:17:47.385]
kafka.utils.Logging$class.info(Logging.scala:61)
[conflict in /consumers/RecommendEvent/owners/sensei/1-5 data:
RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e-5 stored data:
RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2]
[INFO  2014-05-12 15:17:47.386]
kafka.utils.Logging$class.info(Logging.scala:61)
[RecommendEvent_sd-sns-relation01.bj-1399879066480-5426fb5e waiting for the
partition ownership to be deleted: 1-5]



And I opened a ZK viewer.

In ZK, we found only 2 consumers under the consumer group's ids path:

RecommendEvent_sd-sns-relation02.bj-1399635256619-5d8123c6
RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3


And under the owners path for the topic we found that all partitions are
assigned to sd-sns-relation03.bj:

Here is the owner info:
1:0  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-0
1:1  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-0
1:2  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1
1:3  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1
1:4  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2
1:5  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2
1:6  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-3
1:7  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-3
1:8  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-4
1:9  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-4

2:0  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-0
2:1  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-1
2:2  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-2
2:3  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-3
2:4  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-4
2:5  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-5
2:6  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-6
2:7  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-7
2:8  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-8
2:9  RecommendEvent_sd-sns-relation03.bj-1399635121250-487bdbb3-9


So all partitions are assigned to sd-sns-relation03.bj, but from the logs and
counters we are sure sd-sns-relation02.bj is receiving messages too.
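
The same owner info can also be dumped programmatically, e.g. with a sketch
like the following using the plain ZooKeeper client (the connect string is a
placeholder; the path is the one from the logs above):

import org.apache.zookeeper.ZooKeeper
import scala.collection.JavaConverters._

// Sketch: print each partition's owner under the group's owners path.
object DumpOwners {
  def main(args: Array[String]): Unit = {
    val zk = new ZooKeeper("zk1:2181", 30000, null) // placeholder connect string
    val ownersPath = "/consumers/RecommendEvent/owners/sensei"
    for (child <- zk.getChildren(ownersPath, false).asScala.sorted) {
      val data = zk.getData(ownersPath + "/" + child, false, null)
      println(child + " -> " + new String(data, "UTF-8"))
    }
    zk.close()
  }
}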


My questions are:

1. Why did the rebalance fail?
2. Why is the owner info wrong? (BTW: the zkclient version is 0.2.)