Jun Rao created KAFKA-3215:
------------------------------

             Summary: controller may not be started when there are multiple ZK session expirations
                 Key: KAFKA-3215
                 URL: https://issues.apache.org/jira/browse/KAFKA-3215
             Project: Kafka
          Issue Type: Bug
          Components: core
            Reporter: Jun Rao


Suppose that broker 1 is the controller and it experiences two consecutive ZK
session expirations. In this case, two ZK session expiration events will be
fired.

1. When handling the first ZK session expiration event, 
SessionExpirationListener.handleNewSession() can elect broker 1 itself as the 
new controller and initialize the states properly.

2. When handling the second ZK session expiration event,
SessionExpirationListener.handleNewSession() first calls
onControllerResignation(), which sets ReplicaStateMachine.hasStarted to false.
It then runs controller election in ZookeeperLeaderElector.elect() and tries to
create the controller node in ZK. This fails because broker 1 has already
registered itself in the controller node. We simply ignore the failure to
create the controller node, since we assume the controller must be another
broker. However, in this case the controller is broker 1 itself, and
ReplicaStateMachine.hasStarted is still false.

3. Now, if a new broker event is fired, BrokerChangeListener.handleChildChange
ignores it because ReplicaStateMachine.hasStarted is false. We end up with a
controller that is alive but does not react to any broker change event.
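The three steps above can be replayed with a small single-threaded model. This is a sketch under stated assumptions, not Kafka code: FakeZk, Broker, the check_self flag and all method names are hypothetical stand-ins that only mirror the roles of ZookeeperLeaderElector.elect(), onControllerResignation(), ReplicaStateMachine.hasStarted and BrokerChangeListener.handleChildChange. The model does not delete the controller node between the two events; it simply assumes, as step 2 does, that broker 1's registration is still present. The check_self branch shows one possible mitigation (treat an existing controller node that holds our own broker id as a won election); it is not necessarily the fix adopted in Kafka.

```python
# Hypothetical toy model of the KAFKA-3215 sequence; names are illustrative only.


class FakeZk:
    """Stand-in for ZooKeeper holding only the controller node."""

    def __init__(self):
        self.controller = None  # broker id stored in the controller node, or None

    def create_controller_node(self, broker_id):
        # Like a ZK create: fails if the node already exists.
        if self.controller is not None:
            return False
        self.controller = broker_id
        return True


class Broker:
    def __init__(self, broker_id, zk, check_self=False):
        self.broker_id = broker_id
        self.zk = zk
        self.check_self = check_self  # hypothetical mitigation switch
        self.has_started = False      # plays the role of ReplicaStateMachine.hasStarted
        self.handled_broker_changes = 0

    def on_controller_failover(self):
        self.has_started = True

    def on_controller_resignation(self):
        self.has_started = False

    def elect(self):
        if self.zk.create_controller_node(self.broker_id):
            self.on_controller_failover()
        elif self.check_self and self.zk.controller == self.broker_id:
            # Hypothetical fix: the existing node is our own registration,
            # so resume controller duties instead of ignoring it.
            self.on_controller_failover()
        # Otherwise: assume another broker is the controller (behavior in the issue).

    def handle_new_session(self):
        # Mirrors SessionExpirationListener.handleNewSession(): resign, then elect.
        self.on_controller_resignation()
        self.elect()

    def handle_broker_change(self):
        if not self.has_started:
            return  # event silently ignored, as in step 3
        self.handled_broker_changes += 1


def run(check_self):
    zk = FakeZk()
    broker1 = Broker(1, zk, check_self)
    broker1.handle_new_session()  # first expiration: broker 1 becomes controller
    broker1.handle_new_session()  # second expiration: controller node already exists
    broker1.handle_broker_change()
    return zk.controller, broker1.has_started, broker1.handled_broker_changes


print(run(check_self=False))  # (1, False, 0): controller alive but unresponsive
print(run(check_self=True))   # (1, True, 1): broker change handled
```

With check_self=False the model ends with broker 1 recorded as controller while has_started is false, so the broker change event is dropped, matching step 3.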



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
