Re: Scaling up kafka consumers

2017-02-24 Thread Gerrit Jansen van Vuuren
The kafka-fast connector handles this differently from the standard Kafka
client (which allows at most one consumer per partition within a consumer
group). It breaks offsets into consumable ranges, which allows one partition
to be read by multiple consumers, where each consumer receives a different
offset range.

See:
https://github.com/gerritjvv/kafka-fast
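
To illustrate the offset-range idea, here is a minimal sketch in Python. It is
not kafka-fast's actual implementation or API; the function names
(split_offsets, assign_round_robin), the chunk size, and the consumer names
are all hypothetical. It only shows how one partition's offsets could be cut
into non-overlapping ranges so that each range goes to exactly one consumer.

```python
from typing import Dict, List, Tuple

OffsetRange = Tuple[int, int]  # half-open interval: [start, end)


def split_offsets(start: int, end: int, chunk: int) -> List[OffsetRange]:
    """Split the offsets [start, end) into ranges of at most `chunk` offsets."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    return [(lo, min(lo + chunk, end)) for lo in range(start, end, chunk)]


def assign_round_robin(
    ranges: List[OffsetRange], workers: List[str]
) -> Dict[str, List[OffsetRange]]:
    """Hand each range to exactly one worker, cycling through the workers."""
    plan: Dict[str, List[OffsetRange]] = {w: [] for w in workers}
    for i, offset_range in enumerate(ranges):
        plan[workers[i % len(workers)]].append(offset_range)
    return plan


# Hypothetical example: one partition holding offsets 0..999, two consumers.
ranges = split_offsets(0, 1000, 250)
plan = assign_round_robin(ranges, ["consumer-a", "consumer-b"])
```

Because the ranges never overlap, no offset is handed to two consumers, which
is the property the paragraph above describes.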

That said, there are two usage scenarios: lots of topics (10 or more), where
you'd want 1, 2 or 4 partitions per topic, or fewer topics, where you want
more partitions per topic. If yours is the latter, you'd want more than a
single partition; 4-6 are better numbers (these numbers are from my
experience with 5 nodes).



On 24 Feb 2017 17:30, "Jakub Stransky"  wrote:

> Hello everyone,
>
> I was reading the Kafka documentation regarding the point-to-point and
> publish-subscribe communication patterns in Kafka, and I am wondering how
> to scale up the consumer side in a point-to-point scenario when consuming
> from a single Kafka topic.
>
> Let's say I have a single topic with a single partition and one node
> where the Kafka consumer is running. If I want to scale up my service, I add
> another node, which has the same configuration as the first one (topic,
> partition and consumer group id). Those two nodes start competing for
> messages from the Kafka topic.
>
> What I am not sure about in this scenario, and what is actually the subject
> of my question, is "*whether each node gets unique messages, or whether
> there is still a possibility that some messages will be consumed by both
> nodes*". I can see scenarios where both nodes are started at the same time,
> get the same topic offset from ZooKeeper, and start consuming
> messages from that offset. Or am I thinking in the wrong direction?
>
> Thanks
> Jakub
>


Re: Scaling up kafka consumers

2017-02-24 Thread Ian Wrigley
Hi

If you have two consumers in your consumer group, but only one partition in the
topic, then only one consumer will do any work. It’s not the case that “those
two nodes start competing for messages”: one node will read from the
partition, and the other will have nothing to do. So to scale up by adding more
consumers in your consumer group, you’ll need more partitions in your topic.
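
A small simulation may make this concrete. This is a simplified round-robin
sketch, not Kafka's actual assignor code; the function name assign_partitions
and the node names are hypothetical. It only models the rule that each
partition is owned by exactly one consumer in the group.

```python
from typing import Dict, Iterable, List


def assign_partitions(
    partitions: Iterable[int], consumers: Iterable[str]
) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer in the group."""
    members = sorted(consumers)
    if not members:
        raise ValueError("at least one consumer is required")
    plan: Dict[str, List[int]] = {c: [] for c in members}
    for i, partition in enumerate(sorted(partitions)):
        plan[members[i % len(members)]].append(partition)
    return plan


group = ["node-1", "node-2"]

# One partition, two consumers: the second consumer is assigned nothing.
single = assign_partitions([0], group)

# Four partitions, two consumers: both consumers get work.
several = assign_partitions(range(4), group)
```

In the single-partition case, node-2 ends up with an empty list, so it sits
idle. With four partitions, each node owns two of them.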

Ian.




Re: Scaling up kafka consumers

2017-02-24 Thread Pradeep Gollakota
Within a consumer group, a single partition can be consumed by at most one
consumer. Consumers in the group compete to take ownership of a partition. So,
to gain parallelism, you need to add more partitions.

There is a library that allows multiple consumers to consume from a single
partition: https://github.com/gerritjvv/kafka-fast. But I've never used it.
