Stefan Miklosovic created KAFKA-2331:
----------------------------------------

             Summary: Kafka does not spread partitions in a topic among all 
consumers evenly
                 Key: KAFKA-2331
                 URL: https://issues.apache.org/jira/browse/KAFKA-2331
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.8.1.1
            Reporter: Stefan Miklosovic


I want to have 1 topic with 10 partitions. I am using default configuration of 
Kafka. I create 1 topic with 10 partitions by that helper script and now I am 
about to produce messages to it.

The thing is that even all partitions are indeed consumed, some consumers have 
more then 1 partition assigned even I have number of consumer threads equal to 
partitions in a topic hence some threads are idle.

Let's describe it in more detail.

I know that common stuff that you need one consumer thread per partition. I 
want to be able to commit offsets per partition and this is possible only when 
I have 1 thread per consumer connector per partition (I am using high level 
consumer).

So I create 10 threads, in each thread I am calling 
Consumer.createJavaConsumerConnector() where I am doing this

topicCountMap.put("mytopic", 1);
and in the end I have 1 iterator which consumes messages from 1 partition.

When I do this 10 times, I have 10 consumers, consumer per thread per partition 
where I can commit offsets independently per partition because if I put 
different number from 1 in topic map, I would end up with more then 1 consumer 
thread for that topic for given consumer instance so if I am about to commit 
offsets with created consumer instance, it would commit them for all threads 
which is not desired.

But the thing is that when I use consumers, only 7 consumers are involved and 
it seems that other consumer threads are idle but I do not know why.

The thing is that I am creating these consumer threads in a loop. So I start 
first thread (submit to executor service), then another, then another and so on.

So the scenario is that first consumer gets all 10 partitions, then 2nd 
connects so it is splits between these two to 5 and 5 (or something similar), 
then other threads are connecting.

I understand this as a partition rebalancing among all consumers so it behaves 
well in such sense that if more consumers are being created, partition 
rebalancing occurs between these consumers so every consumer should have some 
partitions to operate upon.

But from the results I see that there is only 7 consumers and according to 
consumed messages it seems they are split like 3,2,1,1,1,1,1 partition-wise. 
Yes, these 7 consumers covered all 10 partitions, but why consumers with more 
then 1 partition do no split and give partitions to remaining 3 consumers?

I am pretty much wondering what is happening with remaining 3 threads and why 
they do not "grab" partitions from consumers which have more then 1 partition 
assigned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to