[ 
https://issues.apache.org/jira/browse/KAFKA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467184#comment-13467184
 ] 

Jun Rao commented on KAFKA-532:
-------------------------------

Thanks for patch v1. Some comments:
1. PartitionStateMachine.electLeaderForPartition(): We throw an exception if 
controllerEpoch != controller.epoch. It seems that we should only throw an 
exception if controllerEpoch > controller.epoch. The controllerEpoch in the 
leaderAndIsr path in ZK indicates the epoch of the controller when leader was 
changed last time. It's ok for a leader not to be changed during multiple 
generations of the controller. What we need to prevent is that if a newer 
generation of a controller has already changed the leader, the older generation 
of the controller can't touch the leaderAndIsr path any more. Also, it's not 
clear to me if we need the var retry. Ditto in 
ReplicaStateMachine.handleStateChange().

2. ZkUtils.getLeaderAndIsrForPartition(): Instead of returning a tuple, could 
we return a LeaderAndIsrInfo object?

3. ZookeeperLeaderElector: We keep the controller epoch in the value of the 
controller path. The problem is that the controller path is ephemeral. During a 
controller failover, the controller path will be gone and we won't be able to 
obtain the last controller epoch. We will need to store the controller epoch in 
a separate persistence path. That will probably complicate things a bit more.

4. LeaderAndIsRRequest,StopReplicaRequest sizeInBytes(): When computing size, 
could we add a comment for each number which field is it for?

5. Partition.updateISR(): If we want to keep the semantics that the 
controllerEpoch in the leaderIsrPath is the epoch of the controller when the 
leader is changed, we need to use the controller epoch in the current 
leaderIsrPath when updating ISR, not the latest controller epoch this broker 
has seen.

6. ControllerChannelManager.sendRequestToBrokers: For similar reasons as the 
above, the controller epoch in the LeaderAndIsrRequest may not always be the 
epoch of the current controller (e.g., when resending the leadership info 
stored in ZK during controller failover).

7. ReplicaManager: We will nee to maintain a controller epoc per partition. Do 
we still need to maintain a global controller epoch?

                
> Multiple controllers can co-exist during soft failures
> ------------------------------------------------------
>
>                 Key: KAFKA-532
>                 URL: https://issues.apache.org/jira/browse/KAFKA-532
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>              Labels: bugs
>         Attachments: kafka-532-v1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> If the current controller experiences an intermittent soft failure (GC pause) 
> in the middle of leader election or partition reassignment, a new controller 
> might get elected and start communicating new state change decisions to the 
> brokers. After recovering from the soft failure, the old controller might 
> continue sending some stale state change decisions to the brokers, resulting 
> in unexpected failures. We need to introduce a controller generation id that 
> increments with controller election. The brokers should reject any state 
> change requests by a controller with an older generation id.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to