[ https://issues.apache.org/jira/browse/KAFKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696340#comment-15696340 ]

Json Tu commented on KAFKA-4447:
--------------------------------

After checking the responses on the dev mailing list, I reviewed the Kafka
code again. I think the cause may be the following:
1. As [~guozhang] said, "unsubscribeChildChanges" on ZkClient and the
listener-firing procedure are executed on different threads.
2. ZkClient's event thread, which processes callbacks from the ZooKeeper
server, is a single thread, and many callbacks may be queued behind the
controller's SessionExpirationListener callback, such as
ReassignedPartitionsIsrChangeListener, IsrChangeNotificationListener and so on.
3. So even though the SessionExpirationListener callback deregisters all
listeners at the end, the other callbacks that were already queued still run
after this controller resigns.
4. That is why the controller log in the attachment shows the broker still
acting as the controller, and this continued for about 3 minutes.
5. I think it lasted so long because my Kafka cluster's environment is not
very stable: some brokers' sessions expired in ZooKeeper, which triggered more
of the callbacks that the controller listens to.
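The sequence in points 1 to 3 can be illustrated with a small simulation. This is a hypothetical model, not ZkClient's or Kafka's actual code: `SingleThreadEventBus`, `Controller`, `simulate` and the event names are invented for illustration. It assumes, as described above, that each callback is bound to its event when the event is queued, so deregistering listeners does not cancel events already waiting on the single event thread. The `guarded` flag shows one possible mitigation (each callback checks whether this broker is still the controller before acting), offered here as an assumption rather than as the fix Kafka adopted.

```python
import queue
import threading


class SingleThreadEventBus:
    """Hypothetical stand-in for ZkClient's single event thread (not its real API).

    A callback is bound to its event when the event is queued, so removing
    listeners later does not cancel events that are already waiting.
    """

    def __init__(self):
        self._events = queue.Queue()
        self._listeners = {}  # event name -> list of callbacks
        self._lock = threading.Lock()

    def subscribe(self, event, callback):
        with self._lock:
            self._listeners.setdefault(event, []).append(callback)

    def unsubscribe_all(self):
        with self._lock:
            self._listeners.clear()

    def fire(self, event):
        # Snapshot the listeners registered right now and queue one entry each.
        with self._lock:
            callbacks = list(self._listeners.get(event, []))
        for callback in callbacks:
            self._events.put((callback, event))

    def start(self):
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The single event thread: callbacks run one at a time, in FIFO order.
        while True:
            callback, event = self._events.get()
            try:
                callback(event)
            finally:
                self._events.task_done()

    def drain(self):
        self._events.join()  # wait until every queued callback has run


class Controller:
    """Hypothetical controller; `guarded` toggles the is-still-controller check."""

    def __init__(self, bus, guarded):
        self.bus = bus
        self.guarded = guarded
        self.is_active = True
        self.stale_actions = []  # work done after resigning

    def on_session_expired(self, event):
        # Analogous to onControllerResignation: resign, then deregister listeners.
        self.is_active = False
        self.bus.unsubscribe_all()

    def on_isr_change(self, event):
        if self.guarded and not self.is_active:
            return  # no longer the controller: drop the stale callback
        if not self.is_active:
            self.stale_actions.append(event)


def simulate(guarded):
    bus = SingleThreadEventBus()
    ctl = Controller(bus, guarded)
    bus.subscribe("session-expired", ctl.on_session_expired)
    bus.subscribe("isr-change-notification", ctl.on_isr_change)
    bus.subscribe("reassigned-partition-isr-change", ctl.on_isr_change)
    # All three events are queued before the event thread processes any of
    # them, e.g. while it was still busy with a long partition reassignment.
    bus.fire("session-expired")
    bus.fire("isr-change-notification")
    bus.fire("reassigned-partition-isr-change")
    bus.start()
    bus.drain()
    return ctl.stale_actions
```

With `simulate(guarded=False)` both ISR callbacks still do controller work after the resignation, which mirrors what the attached log shows; with `simulate(guarded=True)` they are skipped.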



> Controller resigned but it also acts as a controller for a long time 
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-4447
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4447
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>         Environment: Linux Os
>            Reporter: Json Tu
>         Attachments: log.tar.gz
>
>
> We have a cluster with 10 nodes, and we performed the following operations:
> 1. We executed some topic partition reassignments from one node to the other
> 9 nodes in the cluster, which triggered the controller.
> 2. The controller invoked PartitionsReassignedListener's handleDataChange,
> read all partition reassignment rules from the ZooKeeper path, and executed
> onPartitionReassignment for all partitions that matched the conditions.
> 3. But the controller's ZooKeeper session expired, after which the sessions
> of some of the 9 nodes also expired.
> 5. Then the controller invoked onControllerResignation to resign as the
> controller.
> We found that after the controller resigned, it still acted as the controller
> for about 3 minutes, as can be seen in my attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
