Jun Rao created KAFKA-3215:
------------------------------
Summary: controller may not be started when there are multiple ZK
session expirations
Key: KAFKA-3215
URL: https://issues.apache.org/jira/browse/KAFKA-3215
Project: Kafka
Issue Type: Bug
Components: core
Reporter: Jun Rao
Suppose that broker 1 is the controller and its ZK session expires twice in a
row. In this case, two ZK session expiration events will be fired.
1. When handling the first ZK session expiration event,
SessionExpirationListener.handleNewSession() can elect broker 1 itself as the
new controller and initialize the states properly.
2. When handling the second ZK session expiration event,
SessionExpirationListener.handleNewSession() first calls
onControllerResignation(), which sets ReplicaStateMachine.hasStarted to
false. It then continues with controller election in
ZookeeperLeaderElector.elect() and tries to create the controller node in ZK.
This fails since broker 1 has already registered itself in the controller
node in ZK. We simply ignore the failure to create the controller node, since
we assume that the controller must be on another broker. However, in this
case, the controller is broker 1 itself, and ReplicaStateMachine.hasStarted
is still false.
3. Now, if a new broker event is fired, it will be ignored in
BrokerChangeListener.handleChildChange since ReplicaStateMachine.hasStarted is
false. We are then in a situation where a controller is alive, but won't react
to any broker change event.
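
The three steps above can be sketched as a small, self-contained Java model.
This is not the Kafka code: the class ControllerExpirationSketch, the static
field controllerNode (standing in for the controller node in ZK), and the
simplified Broker methods are all hypothetical names used for illustration,
and the model assumes the old ephemeral node is already gone by the time the
first expiration event is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the scenario; none of this is actual Kafka code.
class ControllerExpirationSketch {
    // Stand-in for the controller node in ZK: id of the registered controller, or -1 if absent.
    static int controllerNode = -1;

    static class Broker {
        final int id;
        boolean hasStarted = false; // models ReplicaStateMachine.hasStarted
        final List<String> handledEvents = new ArrayList<>();

        Broker(int id) { this.id = id; }

        // Models onControllerResignation(): the state machine is stopped.
        void onControllerResignation() { hasStarted = false; }

        // Models ZookeeperLeaderElector.elect(): try to create the controller node.
        void elect() {
            if (controllerNode == -1) {
                controllerNode = id;
                hasStarted = true; // controller state is initialized only on success
            }
            // Otherwise the node already exists and the failure is ignored,
            // on the assumption that some other broker is the controller.
        }

        // Models SessionExpirationListener.handleNewSession().
        void handleNewSession() {
            onControllerResignation();
            elect();
        }

        // Models BrokerChangeListener.handleChildChange: events are dropped
        // while the state machine is not started.
        void handleChildChange(String event) {
            if (!hasStarted) return;
            handledEvents.add(event);
        }
    }

    static Broker simulate() {
        controllerNode = -1;
        Broker broker1 = new Broker(1);
        broker1.elect();              // broker 1 starts out as the controller
        controllerNode = -1;          // assumption: the old ephemeral node vanished with the session
        broker1.handleNewSession();   // step 1: broker 1 re-elects itself, hasStarted = true
        broker1.handleNewSession();   // step 2: resigns, then node creation fails; hasStarted stays false
        broker1.handleChildChange("broker 2 joined"); // step 3: event is silently ignored
        return broker1;
    }

    public static void main(String[] args) {
        Broker b = simulate();
        System.out.println("controller node owner: " + controllerNode);
        System.out.println("hasStarted: " + b.hasStarted);
        System.out.println("handled events: " + b.handledEvents);
    }
}
```

In this model the controller node still names broker 1 after the second
event, yet hasStarted is false and the broker change event is never recorded,
which matches the stuck state described in step 3.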