[ https://issues.apache.org/jira/browse/KAFKA-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624461#comment-15624461 ]
Json Tu edited comment on KAFKA-4360 at 11/1/16 5:49 AM:
---------------------------------------------------------

That is wonderful. I searched for onControllerResignation() in the Kafka code and, just as you said, there are two other invocations in ZookeeperLeaderElector. Could you assign this task to me? I would be very pleased to submit a pull request for it. Thank you.

> Controller may deadLock when autoLeaderRebalance encounter zk expired
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-4360
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4360
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Json Tu
>              Labels: bugfix
>         Attachments: deadlock_patch, yf-mafka2-common02_jstack.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When the controller has a checkAndTriggerPartitionRebalance task in
> autoRebalanceScheduler and the ZooKeeper session expires at that moment, the
> controller runs into a deadlock.
> We can reproduce the scenario as follows. When the ZK session expires, the ZK
> thread calls handleNewSession, which is defined in SessionExpirationListener,
> and acquires controllerContext.controllerLock. It then calls
> autoRebalanceScheduler.shutdown(), which has to wait for all tasks in
> autoRebalanceScheduler to complete. The thread pool's task also needs
> controllerContext.controllerLock, but that lock is already held by the ZK
> callback thread, so the two threads deadlock.
> This causes at least two problems. First, the broker’s id cannot be
> registered in ZooKeeper, so the new controller considers the broker dead.
> Second, the process cannot be stopped by kafka-server-stop.sh, because the
> shutdown function cannot acquire controllerContext.controllerLock either; we
> cannot shut down Kafka except with kill -9.
> In my attachment, I uploaded a jstack file, which was created when my Kafka
> process could not be shut down by kafka-server-stop.sh.
> I have hit this scenario several times. I think this may be an unresolved
> bug in Kafka.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
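
The lock ordering described in the report can be illustrated with a small standalone Java sketch. It is not the Kafka controller code (which is Scala and is not reproduced here). The class name `ControllerDeadlockSketch` and the method `shutdownScheduler` are hypothetical, and the variable names `controllerLock` and `autoRebalanceScheduler` are borrowed from the report only as labels. The sketch assumes, as the report states, that shutting down the scheduler waits for in-flight tasks. It uses a bounded `awaitTermination` timeout in place of an unbounded wait so that the example terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the reported lock ordering; not Kafka code.
public class ControllerDeadlockSketch {

    // Returns true if the scheduler terminated within timeoutMs.
    static boolean shutdownScheduler(boolean holdLockDuringShutdown, long timeoutMs)
            throws InterruptedException {
        ReentrantLock controllerLock = new ReentrantLock();
        ScheduledExecutorService autoRebalanceScheduler =
                Executors.newSingleThreadScheduledExecutor();
        CountDownLatch taskStarted = new CountDownLatch(1);

        // The calling thread plays the ZK callback thread: it takes the lock first.
        controllerLock.lock();
        boolean locked = true;
        try {
            // Stand-in for the rebalance task: it needs the same lock.
            autoRebalanceScheduler.execute(() -> {
                taskStarted.countDown();
                controllerLock.lock();
                try {
                    // rebalance work would happen here
                } finally {
                    controllerLock.unlock();
                }
            });
            taskStarted.await(); // the task is running and will block on the lock

            if (!holdLockDuringShutdown) {
                // Alternative ordering: release the lock before waiting on the pool.
                controllerLock.unlock();
                locked = false;
            }

            autoRebalanceScheduler.shutdown();
            // With the lock still held, the task can never finish, so this wait
            // can only end by timing out (or never, if it were unbounded).
            return autoRebalanceScheduler.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            if (locked) {
                controllerLock.unlock(); // lets the blocked task finish so the JVM can exit
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("terminated while holding lock: " + shutdownScheduler(true, 500));
        System.out.println("terminated after releasing lock: " + shutdownScheduler(false, 500));
    }
}
```

In this sketch the first call is expected to time out, because the waiting thread owns the lock the task needs, while the second call completes once the lock is released before the wait. Whether releasing the lock before shutting down the scheduler is the right fix for the controller itself is a separate question that the sketch does not settle.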