Joseph Aliase created KAFKA-3828:
------------------------------------

             Summary: Consumer thread stalls after consumer re balance for some partition
                 Key: KAFKA-3828
                 URL: https://issues.apache.org/jira/browse/KAFKA-3828
             Project: Kafka
          Issue Type: Bug
          Components: consumer
         Environment: Operating System : CentOS release 6.4
                      Kafka Cluster: Stand alone cluster with one broker and one zookeeper.
            Reporter: Joseph Aliase
            Assignee: Neha Narkhede

While testing the new Kafka Consumer API we came across this issue.

We started a single-broker Kafka cluster with the broker listening on port 9092 and ZooKeeper on 2181. We created a topic named test with 6 partitions.

We started a consumer with the configuration below:

    bootstrap.servers=host-name:9092
    group.id=consumer-group
    key.deserializer=StringDeserializer.class.getName()
    value.deserializer=StringDeserializer.class.getName()
    session.timeout.ms=30000
    heartbeat.interval.ms=10000

We started producing data into topic test:

    sh kafka-producer-perf-test.sh --topic test --num-records 1000000 --record-size 10 --throughput 500 --producer-props bootstrap.servers=localhost:9092

The consumer instance is started with 6 threads to consume data from the 6 partitions.

We then started another consumer instance with 6 threads. A consumer rebalance occurs and the 6 partitions are divided equally between the two instances.

Then we started a third consumer instance with 6 threads. Again a rebalance occurs, with the partitions divided among the three consumer instances. Everything works well.

Then we stop one consumer instance and the partitions get rebalanced between the two remaining instances.

If we then stop and restart the other running instances and repeat these steps a few times, the issue occurs: a consumer holds partitions but does not consume any data from them. The data in those partitions remains unconsumed until we stop the consumer instance that is holding them.

We were not able to reproduce this issue when publishing data to the topic at a very low rate; however, it is easily reproduced when data is published at a high rate.
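
For anyone trying to reproduce this, the consumer settings listed in the description could be assembled roughly as follows. This is only a sketch: the class name ConsumerConfigSketch and the build helper are hypothetical and not taken from the reporter's code, and the deserializer is written as the fully qualified class name that StringDeserializer.class.getName() would return. It uses only java.util.Properties, so it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

// Hypothetical helper; the class and method names are assumptions for illustration.
class ConsumerConfigSketch {

    // Fully qualified name of Kafka's StringDeserializer, i.e. the value that
    // StringDeserializer.class.getName() evaluates to.
    static final String STRING_DESERIALIZER =
            "org.apache.kafka.common.serialization.StringDeserializer";

    // Builds the consumer settings quoted in the report.
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer", STRING_DESERIALIZER);
        props.setProperty("value.deserializer", STRING_DESERIALIZER);
        props.setProperty("session.timeout.ms", "30000");
        props.setProperty("heartbeat.interval.ms", "10000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("host-name:9092", "consumer-group");
        // Print each setting as key=value.
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

With kafka-clients available, the resulting Properties could presumably be passed to new KafkaConsumer<String, String>(props). Since KafkaConsumer is not safe for concurrent use, each of the 6 threads would need its own consumer instance.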