[ https://issues.apache.org/jira/browse/KAFKA-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jiangjie Qin updated KAFKA-2980:
--------------------------------
Resolution: Cannot Reproduce
Status: Resolved (was: Patch Available)
> ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a
> stream creation.
> ---------------------------------------------------------------------------------------------
>
> Key: KAFKA-2980
> URL: https://issues.apache.org/jira/browse/KAFKA-2980
> Project: Kafka
> Issue Type: Bug
> Reporter: Jiangjie Qin
> Assignee: Jiangjie Qin
>
> The following sequence caused problems:
> 1. Multiple ZookeeperConsumerConnector instances in the same group start at
> the same time.
> 2. The user consumer thread calls createMessageStreamsByFilter().
> 3. Right before the user consumer thread enters syncedRebalance(), another
> consumer joins the group and triggers a rebalance.
> 4. Because the watcher executor thread is already running at this point, it
> starts to rebalance. Now both the user consumer thread and the watcher
> executor thread are trying to rebalance.
> 5. The watcher executor thread wins the race. It finishes the rebalance and
> starts the fetchers.
> 6. The user consumer thread then tries to rebalance again, but it blocks while
> stopping the fetchers, because the fetcher threads are blocked putting data
> chunks into the data chunk queue.
> 7. No thread is taking messages out of the data chunk queue (the user consumer
> thread, which would normally iterate the streams, is still inside
> createMessageStreamsByFilter()), so the fetcher threads cannot make progress,
> and neither can the user consumer thread. The result is a deadlock.
> The current code works as long as no fetcher thread is running when
> createMessageStreams/createMessageStreamsByFilter is called. A simple fix is
> to make those two methods acquire the rebalance lock.
> Although this is a fix to the old consumer, it is small and important for
> people who are still using the old consumer, so I think it is still worth
> doing.
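A minimal Python sketch of the proposed fix, assuming the rebalance lock is reentrant (as a JVM monitor is) and that the watcher executor's rebalance takes the same lock. ConnectorModel, synced_rebalance, and create_message_streams are hypothetical stand-ins for the Scala methods, not the submitted patch. Holding the lock across the whole stream-creation call means a rebalance triggered mid-call waits until the user thread's own rebalance has finished, instead of starting fetchers underneath it.

```python
import threading


class ConnectorModel:
    """Hypothetical model of the proposed fix; names are illustrative only."""

    def __init__(self):
        self.rebalance_lock = threading.RLock()  # reentrant, like a JVM monitor
        self.log = []

    def synced_rebalance(self, who):
        with self.rebalance_lock:
            # In the real connector this would stop fetchers, reassign
            # partitions, and restart fetchers; here we only record the call.
            self.log.append(f"rebalance by {who}")

    def create_message_streams(self, on_registered):
        # Proposed fix: hold the rebalance lock for the whole call, so a
        # watcher-triggered rebalance cannot run in between.
        with self.rebalance_lock:
            self.log.append("register consumer")
            on_registered()  # e.g. another consumer joins the group right now
            self.synced_rebalance("user thread")  # re-acquires the RLock


connector = ConnectorModel()
watcher = threading.Thread(target=connector.synced_rebalance, args=("watcher",))
connector.create_message_streams(on_registered=watcher.start)
watcher.join()
print(connector.log)
# ['register consumer', 'rebalance by user thread', 'rebalance by watcher']
```

The watcher's rebalance still happens, but only after stream creation releases the lock; in the real connector that is the point at which the streams have been returned to the user and can be consumed.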
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)