[ https://issues.apache.org/jira/browse/KAFKA-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726975#comment-15726975 ]
ASF GitHub Bot commented on KAFKA-4442:
---------------------------------------

Github user lindong28 closed the pull request at:

    https://github.com/apache/kafka/pull/2167

> Controller should grab lock when it is being initialized to avoid race condition
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-4442
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4442
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>
> Currently the controller registers the broker change listener before sending
> LeaderAndIsrRequests to live replicas. The call path looks like this:
> - onControllerFailover()
> - partitionStateMachine.startup()
> - triggerOnlinePartitionStateChange()
> - handleStateChange(partition, OnlinePartition)
> - electLeaderForPartition(partition)
> - determines live replicas for this partition (step a)
> - adds partition to controllerContext.partitionLeadershipInfo (step b)
> - sends LeaderAndIsrRequest to those live replicas for this partition
> However, if a broker registers itself in ZooKeeper between step (a) and
> step (b), onBrokerStartup() will not send a LeaderAndIsrRequest to this
> broker for this partition, because the partition is not yet found in
> controllerContext.partitionLeadershipInfo. onControllerFailover() will
> not send a LeaderAndIsrRequest to this broker for this partition either,
> because the broker was not considered live in step (a).
> The root cause is that onBrokerStartup() should only be executed after the
> controller has finished onControllerFailover() and initialized its state.
> Therefore the controller should grab the lock controllerContext.controllerLock
> during onControllerFailover().

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
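
The race in the issue description, and the effect of holding the lock during failover, can be illustrated with a small model. This is a minimal Java sketch, not Kafka's controller code (which is written in Scala): the class `ControllerSketch`, the `useLock` flag, the `betweenSteps` hook, and the 500 ms wait are all hypothetical and exist only to make the window between step (a) and step (b) reproducible. It assumes, as the description implies, that the broker-startup callback already takes `controllerLock`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model; names mirror the issue text, not Kafka's real classes.
class ControllerSketch {
    private final ReentrantLock controllerLock = new ReentrantLock();
    private final boolean useLock; // true = failover holds controllerLock (the proposed fix)
    private final Set<Integer> liveBrokers = new HashSet<>();
    private final Map<String, Integer> partitionLeadershipInfo = new HashMap<>();
    // brokerId -> partitions for which a LeaderAndIsrRequest was "sent"
    private final Map<Integer, Set<String>> sentRequests = new HashMap<>();

    ControllerSketch(boolean useLock, int... initialBrokers) {
        this.useLock = useLock;
        for (int b : initialBrokers) liveBrokers.add(b);
    }

    private void send(int brokerId, String partition) {
        sentRequests.computeIfAbsent(brokerId, k -> new HashSet<>()).add(partition);
    }

    // betweenSteps stands in for anything that happens between step (a) and step (b).
    void onControllerFailover(String partition, Runnable betweenSteps) {
        if (useLock) controllerLock.lock();
        try {
            Set<Integer> live = new HashSet<>(liveBrokers);                 // step (a)
            betweenSteps.run();
            partitionLeadershipInfo.put(partition, live.iterator().next()); // step (b)
            for (int b : live) send(b, partition);
        } finally {
            if (useLock) controllerLock.unlock();
        }
    }

    // Broker-change callback: only knows about partitions already recorded in step (b).
    void onBrokerStartup(int brokerId) {
        controllerLock.lock();
        try {
            liveBrokers.add(brokerId);
            for (String p : partitionLeadershipInfo.keySet()) send(brokerId, p);
        } finally {
            controllerLock.unlock();
        }
    }

    private static void join(Thread t, long millis) {
        try {
            t.join(millis); // 0 means wait until the thread terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Broker 2 registers inside the (a)-(b) window; did it receive the request?
    static boolean lateBrokerGotRequest(boolean useLock) {
        ControllerSketch c = new ControllerSketch(useLock, 1);
        Thread startup = new Thread(() -> c.onBrokerStartup(2));
        c.onControllerFailover("topic-0", () -> {
            startup.start();
            join(startup, 500); // let the callback run now if nothing blocks it
        });
        join(startup, 0);
        return c.sentRequests.getOrDefault(2, new HashSet<>()).contains("topic-0");
    }

    public static void main(String[] args) {
        System.out.println("without lock: " + lateBrokerGotRequest(false));
        System.out.println("with lock:    " + lateBrokerGotRequest(true));
    }
}
```

Without the lock, the callback for broker 2 runs inside the window, finds nothing in `partitionLeadershipInfo`, and the failover path then sends only to the brokers it saw in step (a), so broker 2 is skipped. With the lock held, the callback is blocked until failover completes and then sends the request for the recorded partition.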