Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-29 Thread Jun Rao
What's the output of the ConsumerOffsetChecker tool?

Thanks,

Jun

On Tue, Oct 28, 2014 at 7:31 AM, Natarajan, Murugavel 
murugavel.natara...@softwareag.com wrote:

 Hi,

 I have the following Kafka Setup
 Number of producer : 1
 Number of topics : 1
 Number of partitions : 2
 Number of consumers : 3 (with same group id)
 Number of Kafka cluster : none(single Kafka server)
 Zookeeper.session.timeout : 1000

 The producer publishes messages using the default partitioning logic.
 Consumer 1 consumes messages continuously. I abruptly kill consumer 1 and
 expect consumer 2 or consumer 3 to take over consuming the messages after
 consumer 1 fails.
 In some cases a rebalance occurs and consumer 2 starts consuming messages,
 which is perfectly fine. But in other cases neither consumer 2 nor
 consumer 3 consumes anything. I then have to manually kill all the
 consumers and start all three again. Only after this restart does
 consumer 1 start consuming again.
 In short, the rebalance succeeds in some cases and fails in others. Is
 there any configuration that I am missing?

 Cheers and Regards,
 Murugavel .




Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-28 Thread Natarajan, Murugavel
Hi,

I have the following Kafka Setup
Number of producer : 1
Number of topics : 1
Number of partitions : 2
Number of consumers : 3 (with same group id)
Number of Kafka cluster : none(single Kafka server)
Zookeeper.session.timeout : 1000

The producer publishes messages using the default partitioning logic.
Consumer 1 consumes messages continuously. I abruptly kill consumer 1 and
expect consumer 2 or consumer 3 to take over consuming the messages after
consumer 1 fails.
In some cases a rebalance occurs and consumer 2 starts consuming messages,
which is perfectly fine. But in other cases neither consumer 2 nor
consumer 3 consumes anything. I then have to manually kill all the
consumers and start all three again. Only after this restart does
consumer 1 start consuming again.
In short, the rebalance succeeds in some cases and fails in others. Is
there any configuration that I am missing?
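One check sometimes suggested for hard-killed consumers (an assumption here, not something confirmed in this thread) is that the surviving consumers keep retrying the rebalance for longer than the ZooKeeper session timeout, so the dead consumer's node has expired before they give up. The sketch below only does that arithmetic. The helper names are hypothetical, the property names in the comments are the 0.8 high-level consumer settings as best recalled, and the retry and backoff values are illustrative rather than taken from the setup above.

```python
def rebalance_window_ms(max_retries, backoff_ms):
    """Total time a consumer keeps retrying a rebalance before giving up."""
    return max_retries * backoff_ms


def retries_outlast_session(session_timeout_ms, max_retries, backoff_ms):
    """True if the retry window is longer than the ZooKeeper session timeout.

    Hypothetical helper. The arguments stand for zookeeper.session.timeout.ms,
    rebalance.max.retries and rebalance.backoff.ms respectively (assumed names).
    """
    return rebalance_window_ms(max_retries, backoff_ms) > session_timeout_ms


# Illustrative values only; 1000 ms is the session timeout quoted above.
ok = retries_outlast_session(session_timeout_ms=1000, max_retries=4, backoff_ms=2000)

# A retry window shorter than the session timeout would fail the check.
too_short = retries_outlast_session(session_timeout_ms=6000, max_retries=2, backoff_ms=1000)
```

If the check passes and rebalances still fail intermittently, the ownership listing from ConsumerOffsetChecker is probably the more useful thing to look at.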

Cheers and Regards,
Murugavel .



Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-09 Thread Neha Narkhede
With SimpleConsumer, you will have to handle leader discovery as well as
zookeeper based rebalancing. You can see an example here -
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
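As a rough illustration of the leader-discovery part, the sketch below walks a list of seed brokers until one of them reports a leader for the partition. It is not Kafka client code: `find_leader`, `fetch_metadata`, and the broker addresses are hypothetical stand-ins, and a real implementation would send a metadata request over the network and also handle leader changes.

```python
def find_leader(seed_brokers, topic, partition, fetch_metadata):
    """Return the leader broker for (topic, partition), or None if unknown.

    `fetch_metadata(broker, topic)` is a hypothetical callable standing in
    for a real metadata request. It returns {partition: leader} or raises
    OSError when the broker cannot be reached.
    """
    for broker in seed_brokers:
        try:
            metadata = fetch_metadata(broker, topic)
        except OSError:
            continue  # this seed broker is unreachable, try the next one
        leader = metadata.get(partition)
        if leader is not None:
            return leader
    return None


def fake_fetch_metadata(broker, topic):
    """Fake cluster state, used only to exercise the sketch."""
    if broker == "broker-a:9092":
        raise OSError("connection refused")
    return {0: "broker-b:9092", 1: "broker-c:9092"}


leader = find_leader(["broker-a:9092", "broker-b:9092"], "t1", 1, fake_fetch_metadata)
missing = find_leader(["broker-b:9092"], "t1", 7, fake_fetch_metadata)
```

Deciding which process reads which partition, and taking over when a process dies, is left entirely to the application in this model.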

On Wed, Oct 8, 2014 at 11:45 AM, Sharninder sharnin...@gmail.com wrote:

 Thanks Gwen. This really helped.

 Yes, Kafka is the best thing ever :)

 Now how would this be done with the Simple consumer? I'm guessing I'll have
 to maintain my own state in Zookeeper or something of that sort?





Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Sharninder
Thanks Gwen.

When you're saying that I can add consumers to the same group, does that
also hold true if those consumers are running on different machines? Or in
different JVMs?

--
Sharninder


On Wed, Oct 8, 2014 at 11:35 PM, Gwen Shapira gshap...@cloudera.com wrote:

 If you use the high level consumer implementation, and register all
 consumers as part of the same group - they will load-balance
 automatically.

 When you add a consumer to the group, if there are enough partitions
 in the topic, some of the partitions will be assigned to the new
 consumer.
 When a consumer crashes, once its node in ZK times out, other
 consumers will get its partitions.

 Gwen




Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Gwen Shapira
If you use the high level consumer implementation, and register all
consumers as part of the same group - they will load-balance
automatically.

When you add a consumer to the group, if there are enough partitions
in the topic, some of the partitions will be assigned to the new
consumer.
When a consumer crashes, once its node in ZK times out, other
consumers will get its partitions.
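To make this concrete, here is a minimal sketch of a range-style assignment, where sorted partitions are split into contiguous ranges across sorted consumers. It is an illustration under assumptions, not Kafka's actual code: `assign_range` and the consumer ids are made up, and the real rebalance is coordinated through ZooKeeper rather than computed in one place.

```python
def assign_range(partitions, consumers):
    """Split sorted partitions into contiguous ranges, one range per consumer.

    Hypothetical helper for illustration; not part of any Kafka client.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    if not consumers:
        return {}
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for index, consumer in enumerate(consumers):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if index < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# Three partitions, three consumers: one partition each.
before = assign_range([0, 1, 2], ["c1", "c2", "c3"])

# c1 crashes and its ZooKeeper node expires: the survivors split its work.
after_crash = assign_range([0, 1, 2], ["c2", "c3"])

# More consumers than partitions: the surplus consumer sits idle.
surplus = assign_range([0, 1], ["c1", "c2", "c3"])
```

In the `surplus` case c3 ends up with an empty list, which is why a new consumer only receives work "if there are enough partitions in the topic".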

Gwen

On Wed, Oct 8, 2014 at 10:39 AM, Sharninder sharnin...@gmail.com wrote:
 Hi,

 I'm not even sure if this is a valid use-case, but I really wanted to run
 it by you guys. How do I load balance my consumers? For example, if my
 consumer machine is under load, I'd like to spin up another VM with another
 consumer process to keep reading messages off any topic. On similar lines,
 how do you guys handle consumer failures? Suppose one consumer process gets
 an exception and crashes, is it possible for me to somehow make sure that
 there is another process that is still reading the queue for me?

 --
 Sharninder


Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Gwen Shapira
yep. exactly.

On Wed, Oct 8, 2014 at 11:07 AM, Sharninder sharnin...@gmail.com wrote:
 Thanks Gwen.

 When you're saying that I can add consumers to the same group, does that
 also hold true if those consumers are running on different machines? Or in
 different JVMs?

 --
 Sharninder





Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Gwen Shapira
Here's an example (from the ConsumerOffsetChecker tool) with 1 topic (t1)
and 1 consumer group (flume). Each of the 3 topic partitions is being
read by a different machine running the flume consumer:
Group  Topic  Pid  Offset    logSize    Lag       Owner
flume  t1     0    50172068  100210042  50037974  flume_kafkacdh-1.ent.cloudera.com-1412722833783-3d6d80db-0
flume  t1     1    49914701  49914701   0         flume_kafkacdh-2.ent.cloudera.com-1412722838536-a6a4915d-0
flume  t1     2    54218841  82733380   28514539  flume_kafkacdh-3.ent.cloudera.com-1412722832793-b23eaa63-0

If flume_kafkacdh-1 crashes, another consumer in the group picks up its partition:
Group  Topic  Pid  Offset    logSize    Lag       Owner
flume  t1     0    59669715  100210042  40540327  flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
flume  t1     1    49914701  49914701   0         flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
flume  t1     2    65796205  82733380   16937175  flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0

Then I can start flume_kafkacdh-4 and see things rebalance again:
Group  Topic  Pid  Offset    logSize    Lag       Owner
flume  t1     0    60669715  100210042  39540327  flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
flume  t1     1    49914701  49914701   0         flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0
flume  t1     2    66829740  82733380   15903640  flume_kafkacdh-4.ent.cloudera.com-1412793053882-9bfddff9-0
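For reading these listings: the Lag column is simply logSize minus Offset for each partition. The sketch below redoes that arithmetic for the first listing and sums lag per owner. The helper names are hypothetical and the owner ids are shortened to their host prefix for readability.

```python
from collections import defaultdict

# Rows copied from the first listing: (partition id, offset, log size, owner).
# Owner ids are abbreviated to the host prefix for this illustration.
ROWS = [
    (0, 50172068, 100210042, "flume_kafkacdh-1"),
    (1, 49914701, 49914701, "flume_kafkacdh-2"),
    (2, 54218841, 82733380, "flume_kafkacdh-3"),
]


def lag(offset, log_size):
    """Messages written to the partition but not yet consumed by the group."""
    return log_size - offset


def lag_by_owner(rows):
    """Sum the lag of every partition owned by each consumer."""
    totals = defaultdict(int)
    for _pid, offset, log_size, owner in rows:
        totals[owner] += lag(offset, log_size)
    return dict(totals)


lags = [lag(offset, log_size) for _pid, offset, log_size, _owner in ROWS]
per_owner = lag_by_owner(ROWS)
```

A partition whose lag stays at 0, like partition 1 here, has a consumer that is fully caught up; a partition with no owner at all would point to a failed rebalance.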

Isn't Kafka the best thing ever? :)

Gwen

On Wed, Oct 8, 2014 at 11:23 AM, Gwen Shapira gshap...@cloudera.com wrote:
 yep. exactly.




Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Sharninder
Thanks Gwen. This really helped.

Yes, Kafka is the best thing ever :)

Now how would this be done with the Simple consumer? I'm guessing I'll have
to maintain my own state in Zookeeper or something of that sort?


On Thu, Oct 9, 2014 at 12:01 AM, Gwen Shapira gshap...@cloudera.com wrote:

 Here's an example (from the ConsumerOffsetChecker tool) with 1 topic (t1)
 and 1 consumer group (flume). Each of the 3 topic partitions is being
 read by a different machine running the flume consumer:
 Group  Topic  Pid  Offset    logSize    Lag       Owner
 flume  t1     0    50172068  100210042  50037974  flume_kafkacdh-1.ent.cloudera.com-1412722833783-3d6d80db-0
 flume  t1     1    49914701  49914701   0         flume_kafkacdh-2.ent.cloudera.com-1412722838536-a6a4915d-0
 flume  t1     2    54218841  82733380   28514539  flume_kafkacdh-3.ent.cloudera.com-1412722832793-b23eaa63-0

 If flume_kafkacdh-1 crashes, another consumer in the group picks up its partition:
 Group  Topic  Pid  Offset    logSize    Lag       Owner
 flume  t1     0    59669715  100210042  40540327  flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
 flume  t1     1    49914701  49914701   0         flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
 flume  t1     2    65796205  82733380   16937175  flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0

 Then I can start flume_kafkacdh-4 and see things rebalance again:
 Group  Topic  Pid  Offset    logSize    Lag       Owner
 flume  t1     0    60669715  100210042  39540327  flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
 flume  t1     1    49914701  49914701   0         flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0
 flume  t1     2    66829740  82733380   15903640  flume_kafkacdh-4.ent.cloudera.com-1412793053882-9bfddff9-0

 Isn't Kafka the best thing ever? :)

 Gwen
