[
https://issues.apache.org/jira/browse/KAFKA-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guozhang Wang resolved KAFKA-4360.
----------------------------------
Resolution: Fixed
Fix Version/s: 0.10.2.0
Issue resolved by pull request 2094
[https://github.com/apache/kafka/pull/2094]
> Controller may deadlock when autoLeaderRebalance encounters ZK session expiry
> -----------------------------------------------------------------------------
>
> Key: KAFKA-4360
> URL: https://issues.apache.org/jira/browse/KAFKA-4360
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
> Reporter: Json Tu
> Labels: bugfix
> Fix For: 0.10.2.0
>
> Attachments: deadlock_patch, yf-mafka2-common02_jstack.txt
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> When the controller has a checkAndTriggerPartitionRebalance task pending in
> autoRebalanceScheduler and the ZK session expires at that moment, the
> controller runs into a deadlock.
> The sequence of events is as follows. When the ZK session expires, the ZK
> thread calls handleNewSession, which is defined in SessionExpirationListener.
> That callback acquires controllerContext.controllerLock and then calls
> autoRebalanceScheduler.shutdown(), which must wait for all tasks in
> autoRebalanceScheduler to complete. The thread pool's task also needs
> controllerContext.controllerLock, but the lock is already held by the ZK
> callback thread, so neither thread can make progress and the controller
> deadlocks.
> This causes at least two problems. First, the broker's id cannot be
> registered in ZooKeeper, so the new controller considers the broker dead.
> Second, the process cannot be stopped by kafka-server-stop.sh, because the
> shutdown function cannot acquire controllerContext.controllerLock either;
> the only way to shut down Kafka is kill -9.
> I have attached a jstack file that was captured when my Kafka process could
> not be shut down by kafka-server-stop.sh.
> I have hit this scenario several times; I think it may be a bug that has not
> yet been fixed in Kafka.
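
The lock ordering described in the report can be illustrated with a minimal, self-contained Java sketch. This is not Kafka's controller code: the class name ControllerDeadlockSketch, the method shutdownScheduler, and the 500 ms timeout are hypothetical and exist only so that the example terminates instead of hanging. One thread stands in for the ZK callback thread, and a single-threaded executor stands in for autoRebalanceScheduler. When the caller keeps the lock while waiting for the scheduler to drain, the queued task can never finish; when it releases the lock first, shutdown completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class ControllerDeadlockSketch {

    // Returns true if the scheduler drained within the timeout.
    // holdLockDuringShutdown = true mimics the reported behaviour: the caller
    // waits for the scheduler while still owning the lock the task needs.
    static boolean shutdownScheduler(boolean holdLockDuringShutdown) throws InterruptedException {
        ReentrantLock controllerLock = new ReentrantLock();   // stand-in for controllerContext.controllerLock
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(); // stand-in for autoRebalanceScheduler
        CountDownLatch taskStarted = new CountDownLatch(1);

        controllerLock.lock(); // the "session expired" callback takes the lock
        boolean terminated;
        try {
            // Stand-in for the rebalance task, which needs the same lock.
            scheduler.submit(() -> {
                taskStarted.countDown();
                controllerLock.lock();
                try {
                    // rebalance work would happen here
                } finally {
                    controllerLock.unlock();
                }
            });
            taskStarted.await(); // make sure the task is already running

            if (!holdLockDuringShutdown) {
                controllerLock.unlock(); // let the task finish before waiting on it
            }
            scheduler.shutdown();
            // Assumption: a short timeout replaces an effectively unbounded wait,
            // so the sketch reports the stall instead of hanging forever.
            terminated = scheduler.awaitTermination(500, TimeUnit.MILLISECONDS);
        } finally {
            if (controllerLock.isHeldByCurrentThread()) {
                controllerLock.unlock();
            }
        }
        // Cleanup: once the lock is free the task can complete in either case.
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
        return terminated;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock held during shutdown, drained in time: " + shutdownScheduler(true));
        System.out.println("lock released before shutdown, drained in time: " + shutdownScheduler(false));
    }
}
```

The sketch only shows why waiting on the scheduler while holding the lock stalls; how pull request 2094 actually resolves the issue is not described in this message.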