Hi Weide
The consumer rebalancing algorithm is deterministic. In your failure scenario,
when A comes back up again, the consumer threads will rebalance. This will give
you back the consumer configuration you had at the start of the test.
I'm unsure whether the partitions are balanced round robin, or whether they will
all go to A with the overflow going to B.
If all of the messages need to be processed by a single machine, an alternative
architecture would be to have a standby server that waits until master A fails
and then connects as a consumer. This could be accomplished by watching
Zookeeper and getting a notification when A's ephemeral node is removed.
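As a rough sketch only (the ZooKeeper address and the node path /myapp/master-a
are placeholders for whatever path your master registers under, not something
Kafka creates for you), the standby could do something like:

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class StandbyWatcher {
        public static void main(String[] args) throws Exception {
            final CountDownLatch masterGone = new CountDownLatch(1);

            // Connect to the same ZooKeeper ensemble the master registers in.
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, event -> { });

            // Set a watch on the master's ephemeral node; the watch fires
            // when the node is deleted, i.e. when A's session goes away.
            zk.exists("/myapp/master-a", event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    masterGone.countDown();
                }
            });

            masterGone.await();

            // At this point the standby starts its consumer threads and takes over.
            System.out.println("Master A is gone - connecting as consumer");
            zk.close();
        }
    }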
The high level consumer does seem to be the way to go as long as your
application can handle duplicate processing.
Daniel.
On 2/08/2014, at 1:38 pm, Weide Zhang weo...@gmail.com wrote:
Hi Guozhang,
If I use the high level consumer, how do I ensure all data goes to the master
even if the slave is up and running? Is it just by forcing the master to have
enough consumer threads to cover the maximum number of partitions of a topic,
since the high level consumer has no notion of which consumers are masters and
which are slaves?
For example, master A starts enough threads to cover all the partitions. Slave
B is on standby with the same consumer group and the same number of threads,
but since master A's threads already cover all the partitions, slave B won't
get any data.
Suddenly master A goes down, slave B becomes the new master, and it starts to
get data based on the high level consumer rebalancing design.
After that, old master A comes up and becomes the slave. Will A get data, or
will A not get data because B has enough threads to cover all the partitions in
the rebalancing logic?
Thanks,
Weide
On Fri, Aug 1, 2014 at 4:45 PM, Guozhang Wang wangg...@gmail.com wrote:
Hello Weide,
That should be doable via high-level consumer, you can take a look at this
page:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
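A stripped-down version of that example (the topic name "events", the group id,
the thread count of 4 and the ZooKeeper address are all placeholders you would
replace with your own) looks roughly like this; both master and slave would run
the same code with the same group.id:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class AggregatingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zkhost:2181");
            props.put("group.id", "aggregator");      // master and slave share this group
            props.put("auto.commit.enable", "true");  // offsets are committed for the group

            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Request as many streams as the topic has partitions, so one
            // process can own every partition after a rebalance.
            Map<String, Integer> topicCountMap = new HashMap<>();
            topicCountMap.put("events", 4);

            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                    connector.createMessageStreams(topicCountMap);

            for (final KafkaStream<byte[], byte[]> stream : streams.get("events")) {
                new Thread(() -> {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        byte[] message = it.next().message();
                        // aggregate here and publish the result back to Kafka
                    }
                }).start();
            }
        }
    }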
Guozhang
On Fri, Aug 1, 2014 at 3:20 PM, Weide Zhang weo...@gmail.com wrote:
Hi,
I have a use case for a master-slave cluster where the logic inside the master
needs to consume data from Kafka and publish some aggregated data back to
Kafka. When the master dies, the slave needs to take the latest committed
offset from the master and continue consuming the data from Kafka and doing the
push.
My question is: what is the easiest Kafka consumer design for this scenario? I
was thinking about using the SimpleConsumer and doing manual consumer offset
syncing between master and slave. That seems to solve the problem, but I was
wondering if it can be achieved by using the high level consumer client?
Thanks,
Weide
--
-- Guozhang