Joseph Aliase created KAFKA-3828:
------------------------------------

             Summary: Consumer thread stalls after consumer re balance for some 
partition 
                 Key: KAFKA-3828
                 URL: https://issues.apache.org/jira/browse/KAFKA-3828
             Project: Kafka
          Issue Type: Bug
          Components: consumer
         Environment: Operating System : CentOS release 6.4
Kafka Cluster: Stand alone cluster with one broker and one zookeeper.
            Reporter: Joseph Aliase
            Assignee: Neha Narkhede


While testing the new Kafka Consumer API we came across this issue. We 
started a single-broker Kafka cluster with the broker listening on port 9092 
and ZooKeeper on 2181.

We created a topic named test with 6 partitions. We started a consumer with 
the configuration below:

bootstrap.servers=host-name:9092
group.id=consumer-group
key.deserializer=StringDeserializer.class.getName()
value.deserializer=StringDeserializer.class.getName()
session.timeout.ms=30000
heartbeat.interval.ms=10000
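
For reference, the settings above could be assembled in Java roughly as 
follows. This is only a sketch: the class and method names 
(ConsumerConfigSketch, consumerProps) are made up for illustration, and the 
deserializers are given by their fully qualified class name as a string, 
which is assumed to be equivalent to the StringDeserializer.class.getName() 
form used in the report. The resulting Properties object is what would be 
passed to the KafkaConsumer constructor.

```java
import java.util.Properties;

// Hypothetical helper; not taken from the reporter's code.
final class ConsumerConfigSketch {

    // Builds the consumer settings listed in the report.
    static Properties consumerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", "consumer-group");
        // Assumption: class names given as strings instead of
        // StringDeserializer.class.getName().
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("session.timeout.ms", "30000");
        props.setProperty("heartbeat.interval.ms", "10000");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings; "host-name:9092" is the placeholder from the report.
        consumerProps("host-name:9092")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Note that the heartbeat interval (10000 ms) is one third of the session 
timeout (30000 ms) in this configuration.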

We started producing data into topic test:
sh kafka-producer-perf-test.sh --topic test --num-records 1000000 --record-size 
10 --throughput 500 --producer-props bootstrap.servers=localhost:9092

A consumer instance is started with 6 threads to consume data from the 6 
partitions. 

We then start another consumer instance with 6 threads. A consumer re-balance 
occurs and the 6 partitions are divided equally between the two instances.

Then we start a third consumer instance with 6 threads. Again we see a 
re-balance, with the partitions divided among the three consumer 
instances. Everything works well.
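
The arithmetic behind these steps can be sketched as below. This is a 
simplified illustration and not Kafka's assignor code: it assumes each 
consumer thread is one group member, spreads the partitions as evenly as 
possible over the members in index order, and uses made-up names 
(AssignmentSketch, assign, idleMembers). It does not model which instance a 
member belongs to, so it says nothing about how the partitions split between 
instances. It only shows that with 6 partitions, any members beyond the 
first 6 end up without a partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; member ordering in a real group depends on member IDs.
final class AssignmentSketch {

    // Returns, for each member index, the partition numbers it would own.
    // numMembers must be greater than zero.
    static List<List<Integer>> assign(int numPartitions, int numMembers) {
        int perMember = numPartitions / numMembers;
        int extra = numPartitions % numMembers;
        List<List<Integer>> result = new ArrayList<>();
        int next = 0;
        for (int m = 0; m < numMembers; m++) {
            // The first `extra` members receive one additional partition.
            int count = perMember + (m < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    // Counts members that were given no partition at all.
    static long idleMembers(List<List<Integer>> assignment) {
        return assignment.stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        int partitions = 6;
        // 1, 2 and 3 instances with 6 threads each, as in the steps above.
        for (int members : new int[] {6, 12, 18}) {
            List<List<Integer>> a = assign(partitions, members);
            System.out.println(members + " members: " + idleMembers(a) + " idle");
        }
    }
}
```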

Then we stop one consumer instance, and the partitions get re-balanced 
between the remaining two instances. 

If we stop and restart the other running instances and repeat these steps a 
few times, the issue occurs: a consumer holds a partition but does not 
consume any data from it. The data in that partition remains unconsumed 
until we stop the consumer instance that is holding the partition. 

We were not able to reproduce this issue when publishing data to the topic 
at a very low rate; however, the issue is easily reproduced when data is 
published at a high rate.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
