[ https://issues.apache.org/jira/browse/KAFKA-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Json Tu updated KAFKA-4360:
---------------------------
    Description: 
When the controller has a checkAndTriggerPartitionRebalance task in
autoRebalanceScheduler and the ZooKeeper session expires at that moment, the
controller can deadlock.

The sequence of events is as follows. When the ZooKeeper session expires, the
ZooKeeper callback thread calls handleNewSession, defined in
SessionExpirationListener. handleNewSession acquires
controllerContext.controllerLock and then calls
autoRebalanceScheduler.shutdown(), which waits for all tasks in
autoRebalanceScheduler to complete. However, the rebalance task running in that
thread pool also needs controllerContext.controllerLock, which is already held
by the ZooKeeper callback thread, so the two threads deadlock.
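
The interleaving above can be modelled in plain Java. This is a hypothetical
sketch, not Kafka's controller code: a ReentrantLock stands in for
controllerContext.controllerLock, a single-thread ScheduledExecutorService
stands in for autoRebalanceScheduler, and a bounded awaitTermination replaces
whatever wait the real scheduler performs so that the demo terminates. The
releaseLockBeforeWaiting flag shows one possible way to break the cycle (not
waiting for the scheduler while holding the lock); whether that is safe inside
the real controller is not established here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the reported interleaving; names mirror the report only. */
public class ControllerDeadlockDemo {
    // Stand-in for controllerContext.controllerLock (assumption: modelled as a ReentrantLock).
    private static final ReentrantLock controllerLock = new ReentrantLock();

    /**
     * Simulates handleNewSession shutting down autoRebalanceScheduler while a
     * rebalance task is running. Returns true if the scheduler terminated within
     * timeoutMs, false if the shutdown was stuck.
     */
    static boolean simulateSessionExpiry(long timeoutMs, boolean releaseLockBeforeWaiting) {
        ScheduledExecutorService autoRebalanceScheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch taskStarted = new CountDownLatch(1);
        boolean terminated = false;

        controllerLock.lock(); // the ZooKeeper callback thread enters handleNewSession
        try {
            // A task that, like checkAndTriggerPartitionRebalance, needs the controller lock.
            autoRebalanceScheduler.schedule(() -> {
                taskStarted.countDown();
                controllerLock.lock(); // blocks while the callback thread holds the lock
                try {
                    // rebalance work would happen here
                } finally {
                    controllerLock.unlock();
                }
            }, 0, TimeUnit.MILLISECONDS);
            taskStarted.await();

            autoRebalanceScheduler.shutdown(); // rejects new tasks; does not interrupt the running one
            if (releaseLockBeforeWaiting) {
                controllerLock.unlock(); // let the running task take the lock and finish
                try {
                    terminated = autoRebalanceScheduler.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
                } finally {
                    controllerLock.lock(); // re-acquire so the outer finally stays balanced
                }
            } else {
                // Waiting while still holding the lock: the task can never finish.
                terminated = autoRebalanceScheduler.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            controllerLock.unlock(); // in the stuck case, only the demo's timeout gets us here
        }
        // Cleanup so the JVM can exit: with the lock free, the task completes.
        autoRebalanceScheduler.shutdown(); // idempotent; also covers the interrupted path
        try {
            autoRebalanceScheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return terminated;
    }

    public static void main(String[] args) {
        System.out.println("holding lock while waiting: terminated=" + simulateSessionExpiry(500, false));
        System.out.println("releasing lock first:       terminated=" + simulateSessionExpiry(500, true));
    }
}
```

On a typical JVM the first call reports terminated=false (shutdown is stuck
until the demo's timeout, where the real controller would wait indefinitely)
and the second reports terminated=true.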

This causes at least two problems. First, the broker’s id can no longer be
registered in ZooKeeper, so the new controller considers the broker dead.
Second, the process cannot be stopped with kafka-server-stop.sh, because the
shutdown function also cannot acquire controllerContext.controllerLock; the
only way to stop Kafka is kill -9.

I ran jstack on the Kafka process after kafka-server-stop.sh failed to stop it;
the output is attached.
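
A thread dump like the attached one shows which thread owns a lock and which
threads are parked waiting for it. The same information is available
in-process through java.lang.management; the hypothetical sketch below prints
every thread waiting on a lock owned by another thread, then recreates the
"callback thread holds the lock, rebalance task waits" situation with made-up
thread names. Because the callback thread waits for the executor to terminate
rather than for a lock owned by the task, the JVM's cycle-based deadlock
detection may not flag this case; what the dump does show is the rebalance
thread waiting on a lock owned by the callback thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockWaitReport {
    /** One line per live thread that is waiting for a lock currently owned by another thread. */
    static List<String> blockedOnOwnedLocks() {
        List<String> report = new ArrayList<>();
        // (false, false): we only need what each thread is blocked on, not what it owns.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info.getLockOwnerName() != null) {
                report.add("\"" + info.getThreadName() + "\" " + info.getThreadState()
                        + " on " + info.getLockName()
                        + " owned by \"" + info.getLockOwnerName() + "\"");
            }
        }
        return report;
    }

    /** Recreates the owner/waiter pair from the report (hypothetical names) and returns the report. */
    static List<String> demo() {
        ReentrantLock controllerLock = new ReentrantLock(); // hypothetical stand-in
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread callback = new Thread(() -> {
            controllerLock.lock();
            try {
                lockHeld.countDown();
                release.await(); // keep holding the lock, like the stuck handleNewSession
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                controllerLock.unlock();
            }
        }, "zk-callback");
        Thread task = new Thread(() -> {
            controllerLock.lock(); // parks until zk-callback releases the lock
            controllerLock.unlock();
        }, "rebalance-task");
        try {
            callback.start();
            lockHeld.await();
            task.start();
            // Wait until the task is actually parked on the lock before taking the dump.
            while (task.getState() != Thread.State.WAITING) {
                Thread.sleep(10);
            }
            return blockedOnOwnedLocks();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new ArrayList<>();
        } finally {
            release.countDown(); // let both threads finish so the JVM can exit
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The line for "rebalance-task" should name a ReentrantLock sync object owned by
"zk-callback", mirroring the "parking to wait for" entries in a jstack dump.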

I have run into this scenario several times. I think this may be an unsolved
bug in Kafka. May I submit a pull request?

  was:
When the controller has a checkAndTriggerPartitionRebalance task in
autoRebalanceScheduler and the ZooKeeper session expires at that moment, the
controller can deadlock.

The sequence of events is as follows. When the ZooKeeper session expires, the
ZooKeeper callback thread calls handleNewSession, defined in
SessionExpirationListener. handleNewSession acquires
controllerContext.controllerLock and then calls
autoRebalanceScheduler.shutdown(), which waits for all tasks in
autoRebalanceScheduler to complete. However, the rebalance task running in that
thread pool also needs controllerContext.controllerLock, which is already held
by the ZooKeeper callback thread, so the two threads deadlock.

This causes at least two problems. First, the broker’s id can no longer be
registered in ZooKeeper, so the new controller considers the broker dead.
Second, the process cannot be stopped with kafka-server-stop.sh, because the
shutdown function also cannot acquire controllerContext.controllerLock; the
only way to stop Kafka is kill -9.

I ran jstack on the Kafka process after kafka-server-stop.sh failed to stop it;
the output is attached.

I have run into this scenario several times. I think this may be a bug in
Kafka. May I submit a pull request?


> Controller may deadlock when autoLeaderRebalance encounters ZK session expiry
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-4360
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4360
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Json Tu
>              Labels: bugfix
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
