Re: kafka consumer fail over

2014-08-03 Thread Daniel Compton
Hi Weide

The consumer rebalancing algorithm is deterministic. In your failure scenario, 
when A comes back up again, the consumer threads will rebalance. This will give 
you the initial consumer configuration at the start of the test. 

I'm unsure whether the partitions are balanced round robin, or whether they will all 
go to A first, with the overflow going to B. 
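For reference, my understanding is that the 0.8 high-level consumer uses a range-style assignment on rebalance: partitions and consumer thread ids are both sorted, and each thread gets a contiguous range. A minimal sketch of that logic (reconstructed from the documentation, not Kafka's actual source, so treat it as an approximation):

```python
# Hedged sketch of range-style partition assignment as I understand the 0.8
# high-level consumer rebalance. Partitions and thread ids are sorted, then
# each thread gets a contiguous slice, so the result is deterministic but
# NOT round robin.
def range_assign(partitions, threads):
    partitions = sorted(partitions)
    threads = sorted(threads)
    n_per = len(partitions) // len(threads)   # base partitions per thread
    extra = len(partitions) % len(threads)    # first `extra` threads get one more
    assignment = {}
    for i, t in enumerate(threads):
        start = i * n_per + min(i, extra)
        count = n_per + (1 if i < extra else 0)
        assignment[t] = partitions[start:start + count]
    return assignment
```

With 4 partitions and threads "A-0", "A-1", "B-0", "B-1", this yields one partition per thread, i.e. the load is split between A and B rather than all going to A.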

If all of the messages need to be processed by a single machine, an alternative 
architecture would be to have a standby server that waits until master A fails 
and then connects as a consumer. This could be accomplished by watching 
Zookeeper and getting a notification when A's ephemeral node is removed. 
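The failover part of that standby design is simple enough to sketch. In the real version you would register a ZooKeeper watch on A's ephemeral node (the consumer ids live under a path like /consumers/&lt;group&gt;/ids); here the watch delivery is stubbed out so only the failover state machine is shown, and all names are illustrative:

```python
# Sketch of the standby's failover logic. ZooKeeper fires a "NodeDeleted"
# event when master A's session expires and its ephemeral node is removed;
# the standby reacts by connecting as a consumer. The ZooKeeper client
# itself is stubbed out here -- the event is passed in directly.
class StandbyConsumer:
    def __init__(self, start_consuming):
        self.active = False
        # Callback that actually connects to Kafka as a consumer.
        self._start_consuming = start_consuming

    def handle_event(self, event_type):
        if event_type == "NodeDeleted" and not self.active:
            self.active = True
            self._start_consuming()
```

The `active` flag guards against reacting twice if ZooKeeper re-delivers events after a reconnect.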

The high level consumer does seem to be the way to go as long as your 
application can handle duplicate processing. 
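One common way to make duplicate delivery harmless is to track the highest offset already processed per (topic, partition) and skip anything at or below it. A minimal sketch, assuming you persist the offset store alongside your processing results (the dict here is just a stand-in):

```python
# Sketch of offset-based deduplication for at-least-once delivery. `seen`
# maps (topic, partition) -> last offset processed; a replayed message with
# an offset we have already handled is skipped. In production the store
# would be persisted atomically with the processing side effects.
def process_once(seen, topic, partition, offset, handler):
    key = (topic, partition)
    if offset <= seen.get(key, -1):
        return False  # duplicate from a replayed or rebalanced consumer
    handler(topic, partition, offset)
    seen[key] = offset
    return True
```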

Daniel.

 On 2/08/2014, at 1:38 pm, Weide Zhang weo...@gmail.com wrote:
 
 Hi Guozhang,
 
 If I use the high level consumer, how do I ensure all data goes to the master
 even if the slave is up and running? Is it just by forcing the master to have
 enough consumer threads to cover the maximum number of partitions of a topic,
 since the high level consumer doesn't distinguish between consumers that are
 masters and consumers that are slaves?
 
 For example, master A initiates enough threads to cover all the partitions.
 Slave B is on standby with the same consumer group and the same number of
 threads, but since master A has enough threads to cover all the partitions,
 slave B won't get any data.
 
 Suddenly master A goes down, slave B becomes the new master, and it starts to
 get data based on the high level consumer rebalance design.
 
 After that, old master A comes up and becomes the slave. Will A get data? Or
 will A not get data, because B has enough threads to cover all partitions in
 the rebalancing logic?
 
 Thanks,
 
 Weide
 
 
 On Fri, Aug 1, 2014 at 4:45 PM, Guozhang Wang wangg...@gmail.com wrote:
 
 Hello Weide,
 
 That should be doable via the high-level consumer; you can take a look at this
 page:
 
 https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
 
 Guozhang
 
 
 On Fri, Aug 1, 2014 at 3:20 PM, Weide Zhang weo...@gmail.com wrote:
 
 Hi,
 
 I have a use case for a master/slave cluster where the logic inside the
 master needs to consume data from Kafka and publish some aggregated data back
 to Kafka. When the master dies, the slave needs to take the latest committed
 offset from the master and continue consuming the data from Kafka and doing
 the push.
 
 My question is: what is the easiest Kafka consumer design for this scenario?
 I was thinking about using SimpleConsumer and doing manual consumer offset
 syncing between master and slave. That seems to solve the problem, but I was
 wondering if it can be achieved using the high level consumer client?
 
 Thanks,
 
 Weide
 
 
 
 --
 -- Guozhang
 

