Jiangjie Qin created KAFKA-2980:
-----------------------------------

             Summary: ZookeeperConsumerConnector may enter deadlock if a 
rebalance occurs during a stream creation.
                 Key: KAFKA-2980
                 URL: https://issues.apache.org/jira/browse/KAFKA-2980
             Project: Kafka
          Issue Type: Bug
            Reporter: Jiangjie Qin
            Assignee: Jiangjie Qin


The following sequence caused problems:

1. Multiple ZookeeperConsumerConnector instances in the same group start at the 
same time.
2. The user consumer thread calls createMessageStreamsByFilter().
3. Right before the user consumer thread enters syncedRebalance(), a rebalance 
is triggered by another consumer joining the group.
4. Because the watcher executor is already up and running at this point, it 
starts to rebalance. Now both the user consumer thread and the watcher 
executor are trying to rebalance.
5. The watcher executor wins this time. It finishes the rebalance, so the 
fetchers start to run.
6. After that, the user consumer thread tries to rebalance again, but it 
blocks when trying to stop the fetchers, because the fetcher threads are 
blocked on putting data chunks into the data chunk queue.
7. Because no thread is taking messages out of the data chunk queue, the 
fetcher threads cannot make progress, and neither can the user consumer 
thread. So we have a deadlock.
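
Steps 6 and 7 can be modelled with a small standalone sketch. This is not 
Kafka code: BlockedFetcherSketch, chunkQueue and the capacity of 1 are 
hypothetical stand-ins, and the join timeout exists only so the demo 
terminates instead of hanging the way the real threads would.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BlockedFetcherSketch {

    // Returns true if the fetcher is still blocked when the stop attempt times out.
    public static boolean fetcherStaysBlocked() {
        // Bounded queue standing in for the data chunk queue (capacity 1 is arbitrary).
        BlockingQueue<String> chunkQueue = new ArrayBlockingQueue<>(1);
        chunkQueue.add("chunk-0"); // the queue is now full and nobody consumes from it

        Thread fetcher = new Thread(() -> {
            try {
                chunkQueue.put("chunk-1"); // blocks until space becomes available
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // only reached during cleanup below
            }
        }, "fetcher");
        fetcher.start();

        try {
            // The "user consumer thread" waits for the fetcher to finish, as a
            // stop request would. Without the timeout this wait would never return.
            fetcher.join(500);
            boolean stillBlocked = fetcher.isAlive();

            // Cleanup so the JVM can exit; this is demo-only and not part of the model.
            fetcher.interrupt();
            fetcher.join();
            return stillBlocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("sketch interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fetcher still blocked: " + fetcherStaysBlocked());
    }
}
```

The fetcher can only finish once something drains the queue, and the only 
thread that would drain it is the one waiting for the fetcher to stop.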

The current code works if there is no fetcher thread running when 
createMessageStreams/createMessageStreamsByFilter is called. The simple fix is 
to let those two methods acquire the rebalance lock.
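
A minimal sketch of the proposed locking, assuming the rebalance lock is a 
plain monitor object shared by stream creation and the watcher. 
RebalanceLockSketch, createStreams, rebalance and the event names are 
hypothetical and only illustrate the ordering; they are not the actual 
ZookeeperConsumerConnector code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class RebalanceLockSketch {
    private final Object rebalanceLock = new Object();
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Stand-in for createMessageStreams/createMessageStreamsByFilter: the whole
    // setup, including the first rebalance, runs while holding the rebalance lock.
    void createStreams(CountDownLatch lockHeld) {
        synchronized (rebalanceLock) {
            events.add("create:start");
            lockHeld.countDown(); // the watcher may now try to rebalance
            pause(100);           // simulated setup work while the watcher contends
            rebalance("user");    // monitors are reentrant, so this does not self-block
            events.add("create:end");
        }
    }

    // Stand-in for syncedRebalance(): serialized on the same lock.
    void rebalance(String caller) {
        synchronized (rebalanceLock) {
            events.add("rebalance:" + caller);
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs one stream creation racing against one watcher-triggered rebalance.
    public static List<String> run() {
        RebalanceLockSketch connector = new RebalanceLockSketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread watcher = new Thread(() -> {
            try {
                lockHeld.await();               // rebalance triggered mid-creation
                connector.rebalance("watcher"); // must wait for createStreams to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "watcher-executor");
        watcher.start();
        connector.createStreams(lockHeld);
        try {
            watcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("sketch interrupted", e);
        }
        return new ArrayList<>(connector.events);
    }

    public static void main(String[] args) {
        // prints [create:start, rebalance:user, create:end, rebalance:watcher]
        System.out.println(run());
    }
}
```

With the lock held for the whole of stream creation, the watcher's rebalance 
cannot start fetchers between the start of stream creation and the user 
thread's own rebalance.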

Although this is a fix to the old consumer, it is quite small and important 
for people who are still using the old consumer, so I think it is still worth 
doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
