Jiangjie Qin created KAFKA-2980:
-----------------------------------
Summary: ZookeeperConsumerConnector may deadlock if a
rebalance occurs during stream creation.
Key: KAFKA-2980
URL: https://issues.apache.org/jira/browse/KAFKA-2980
Project: Kafka
Issue Type: Bug
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
The following sequence of events causes the problem:
1. Multiple ZookeeperConsumerConnector instances in the same group start at the same time.
2. The user consumer thread calls createMessageStreamsByFilter().
3. Right before the user consumer thread enters syncedRebalance(), another
consumer joins the group and triggers a rebalance.
4. Because the watcher executor is already running at this point, it starts a
rebalance. Now both the user consumer thread and the watcher executor are
trying to rebalance.
5. The watcher executor wins the race. It completes the rebalance, and the
fetcher threads start running.
6. The user consumer thread then tries to perform its own rebalance, but it
blocks while trying to stop the fetchers, because the fetcher threads are
themselves blocked putting data chunks into the data chunk queues.
7. Since the user consumer thread is still stuck inside
createMessageStreamsByFilter(), no thread is taking messages out of the data
chunk queues. The fetcher threads therefore cannot make progress, and neither
can the user consumer thread. The result is a deadlock.
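The core of steps 6 and 7 (a producer blocked on a full bounded queue that nobody drains, while another thread waits for it to stop) can be reproduced in isolation. The following is a simplified Java model, not Kafka code: the bounded queue stands in for a data chunk queue, the "fetcher" thread is a hypothetical stand-in for a fetcher thread, and stopping is modeled as clearing a flag and calling join() with a timeout so the demo terminates instead of hanging forever.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of the hang described above; not Kafka code.
final class RebalanceHangDemo {

    /**
     * Returns true if the "fetcher" is still alive timeoutMs after being asked
     * to stop, i.e. the stop-and-wait step of the rebalance would hang.
     */
    static boolean fetcherHangs(long timeoutMs) {
        // Hypothetical stand-in for a data chunk queue (capacity is arbitrary).
        BlockingQueue<String> chunkQueue = new ArrayBlockingQueue<>(2);
        AtomicBoolean running = new AtomicBoolean(true);

        Thread fetcher = new Thread(() -> {
            int i = 0;
            try {
                while (running.get()) {
                    // Blocks once the queue is full, because nobody calls take().
                    chunkQueue.put("chunk-" + i++);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "fetcher");
        fetcher.setDaemon(true);
        fetcher.start();

        try {
            // Wait until the queue is full and the fetcher is parked inside put().
            while (chunkQueue.remainingCapacity() > 0
                    || fetcher.getState() != Thread.State.WAITING) {
                Thread.sleep(10);
            }
            // "Rebalance": ask the fetcher to stop (no interrupt) and wait for it.
            running.set(false);
            fetcher.join(timeoutMs);
            boolean hung = fetcher.isAlive();

            // Demo cleanup: draining the queue lets put() return, the fetcher
            // then sees running == false and exits.
            chunkQueue.clear();
            fetcher.join();
            return hung;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fetcher hung: " + fetcherHangs(500)); // prints "fetcher hung: true"
    }
}
```

The cleanup step also illustrates why a consumer of the queue would break the cycle: as soon as something removes chunks, the blocked put() returns and the fetcher can observe the stop request.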
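The current code works as long as no fetcher thread is running when
createMessageStreams() or createMessageStreamsByFilter() is called. A simple fix
is to have both methods acquire the rebalance lock, so that a rebalance
triggered by the watcher executor cannot run while stream creation is in
progress.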
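The effect of the proposed fix can be sketched with a small Java model. This is a hypothetical illustration, not the actual patch: the class, the event log, and the "register-topics" step are invented for the example, and rebalanceLock is simply a shared monitor. Because createMessageStreams() holds the lock for its whole body, a watcher-triggered rebalance can only run entirely before or entirely after stream creation, which rules out the interleaving in steps 3 to 5 above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed fix; names are illustrative, not Kafka's.
final class LockedStreamCreation {
    // Stand-in for the connector's rebalance lock.
    private final Object rebalanceLock = new Object();
    // Event log, only touched while holding rebalanceLock.
    private final List<String> events = new ArrayList<>();

    // User thread: the whole stream setup, including its rebalance, holds the lock.
    void createMessageStreams() {
        synchronized (rebalanceLock) {
            events.add("register-topics");
            syncedRebalance("user");
            events.add("streams-ready");
        }
    }

    // Watcher executor: a rebalance triggered by a group membership change.
    void watcherRebalance() {
        syncedRebalance("watcher");
    }

    private void syncedRebalance(String source) {
        // Java monitors are reentrant, so the user thread can re-enter here.
        synchronized (rebalanceLock) {
            events.add(source + "-rebalance");
        }
    }

    // Races the user thread against the watcher and returns the event order.
    static List<String> runRace() {
        LockedStreamCreation connector = new LockedStreamCreation();
        Thread user = new Thread(connector::createMessageStreams, "user");
        Thread watcher = new Thread(connector::watcherRebalance, "watcher");
        user.start();
        watcher.start();
        try {
            user.join();
            watcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (connector.rebalanceLock) {
            return new ArrayList<>(connector.events);
        }
    }

    public static void main(String[] args) {
        // Either [watcher-rebalance, register-topics, user-rebalance, streams-ready]
        // or [register-topics, user-rebalance, streams-ready, watcher-rebalance].
        System.out.println(runRace());
    }
}
```

Whichever thread wins, "register-topics", "user-rebalance" and "streams-ready" always appear back to back; the watcher's rebalance never lands in between.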
Although this is a fix to the old consumer, it is small and important for
people who are still using the old consumer, so I think it is still worth
doing.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)