[
https://issues.apache.org/jira/browse/KAFKA-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051512#comment-15051512
]
ASF GitHub Bot commented on KAFKA-2980:
---------------------------------------
GitHub user becketqin reopened a pull request:
https://github.com/apache/kafka/pull/660
KAFKA-2980 Fix deadlock when ZookeeperConsumerConnector create messag…
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/becketqin/kafka KAFKA-2980
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/660.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #660
----
commit 6ad40206f354512b1f2db1e3784754ea29415ce7
Author: Jiangjie Qin <[email protected]>
Date: 2015-12-10T19:08:15Z
KAKFA-2980 Fix deadlock when ZookeeperConsumerConnector create message
streams.
----
> ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a
> stream creation.
> ---------------------------------------------------------------------------------------------
>
> Key: KAFKA-2980
> URL: https://issues.apache.org/jira/browse/KAFKA-2980
> Project: Kafka
> Issue Type: Bug
> Reporter: Jiangjie Qin
> Assignee: Jiangjie Qin
>
> The following sequence caused the problem:
> 1. Multiple ZookeeperConsumerConnector instances in the same group start at
> the same time.
> 2. The user consumer thread calls createMessageStreamsByFilter().
> 3. Right before the user consumer thread enters syncedRebalance(), a
> rebalance is triggered by another consumer joining the group.
> 4. Because the watcher executor is already up and running at this point, it
> starts to rebalance. Now both the user consumer thread and the watcher
> executor are trying to rebalance.
> 5. The watcher executor wins this time. It finishes the rebalance, so the
> fetchers start to run.
> 6. After that, the user consumer thread tries to rebalance again, but it
> blocks when trying to stop the fetchers, because the fetcher threads are
> blocked on putting data chunks into the data chunk queue.
> 7. Because no thread is taking messages out of the data chunk queue, the
> fetcher threads cannot make progress, and neither can the user consumer
> thread. So we have a deadlock.
> The current code works if there is no fetcher thread running when
> createMessageStreams/createMessageStreamsByFilter is called. The simple fix
> is to let those two methods acquire the rebalance lock.
> Although this is a fix to the old consumer, it is quite small and important
> for people who are still using the old consumer, so I think it is still
> worth doing.
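
The proposed fix can be illustrated with a minimal Java sketch. It is not the
actual patch: the class RebalanceLockSketch, the events list, and the latch
are hypothetical, and the rebalance lock is modelled as a ReentrantLock purely
for illustration. The sketch only shows the locking idea: if stream creation
holds the rebalance lock for its whole duration, a rebalance triggered from
the watcher executor cannot run in the middle of it and has to wait until
stream creation is done.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class RebalanceLockSketch {
    // Hypothetical stand-in for the connector's rebalance lock.
    private final ReentrantLock rebalanceLock = new ReentrantLock();
    // Records the order in which the two threads do their work.
    private final List<String> events =
            Collections.synchronizedList(new ArrayList<>());
    // Lets the watcher thread start only after stream creation has begun.
    private final CountDownLatch streamCreationStarted = new CountDownLatch(1);

    // Stand-in for syncedRebalance(): takes the rebalance lock, then "rebalances".
    private void syncedRebalance(String caller) {
        rebalanceLock.lock();
        try {
            events.add("rebalance by " + caller);
        } finally {
            rebalanceLock.unlock();
        }
    }

    // Stand-in for createMessageStreams()/createMessageStreamsByFilter():
    // the whole method runs under the rebalance lock.
    void createMessageStreams() {
        rebalanceLock.lock();
        try {
            events.add("stream creation started");
            streamCreationStarted.countDown();
            // The lock is reentrant, so the nested acquisition succeeds.
            syncedRebalance("user thread");
            events.add("stream creation finished");
        } finally {
            rebalanceLock.unlock();
        }
    }

    // Runs the race between the user thread and a simulated watcher executor.
    static List<String> run() {
        RebalanceLockSketch connector = new RebalanceLockSketch();
        Thread watcher = new Thread(() -> {
            try {
                // Simulate a rebalance triggered while stream creation is in flight.
                connector.streamCreationStarted.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            connector.syncedRebalance("watcher executor");
        });
        watcher.start();
        connector.createMessageStreams();
        try {
            watcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(connector.events);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch the watcher's rebalance is always recorded last, because the
watcher cannot take the lock until createMessageStreams() releases it. Without
the outer lock in createMessageStreams(), the watcher's rebalance could
complete before the user thread's rebalance, which is the interleaving
described in steps 3 to 5 of the quoted description.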
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)