[ https://issues.apache.org/jira/browse/KAFKA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467184#comment-13467184 ]
Jun Rao commented on KAFKA-532: ------------------------------- Thanks for patch v1. Some comments: 1. PartitionStateMachine.electLeaderForPartition(): We throw an exception if controllerEpoch != controller.epoch. It seems that we should only throw an exception if controllerEpoch > controller.epoch. The controllerEpoch in the leaderAndIsr path in ZK indicates the epoch of the controller when leader was changed last time. It's ok for a leader not to be changed during multiple generations of the controller. What we need to prevent is that if a newer generation of a controller has already changed the leader, the older generation of the controller can't touch the leaderAndIsr path any more. Also, it's not clear to me if we need the var retry. Ditto in ReplicaStateMachine.handleStateChange(). 2. ZkUtils.getLeaderAndIsrForPartition(): Instead of returning a tuple, could we return a LeaderAndIsrInfo object? 3. ZookeeperLeaderElector: We keep the controller epoch in the value of the controller path. The problem is that the controller path is ephemeral. During a controller failover, the controller path will be gone and we won't be able to obtain the last controller epoch. We will need to store the controller epoch in a separate persistence path. That will probably complicate things a bit more. 4. LeaderAndIsRRequest,StopReplicaRequest sizeInBytes(): When computing size, could we add a comment for each number which field is it for? 5. Partition.updateISR(): If we want to keep the semantics that the controllerEpoch in the leaderIsrPath is the epoch of the controller when the leader is changed, we need to use the controller epoch in the current leaderIsrPath when updating ISR, not the latest controller epoch this broker has seen. 6. ControllerChannelManager.sendRequestToBrokers: For similar reasons as the above, the controller epoch in the LeaderAndIsrRequest may not always be the epoch of the current controller (e.g., when resending the leadership info stored in ZK during controller failover). 7. ReplicaManager: We will nee to maintain a controller epoc per partition. Do we still need to maintain a global controller epoch? > Multiple controllers can co-exist during soft failures > ------------------------------------------------------ > > Key: KAFKA-532 > URL: https://issues.apache.org/jira/browse/KAFKA-532 > Project: Kafka > Issue Type: Bug > Affects Versions: 0.8 > Reporter: Neha Narkhede > Assignee: Neha Narkhede > Priority: Blocker > Labels: bugs > Attachments: kafka-532-v1.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > If the current controller experiences an intermittent soft failure (GC pause) > in the middle of leader election or partition reassignment, a new controller > might get elected and start communicating new state change decisions to the > brokers. After recovering from the soft failure, the old controller might > continue sending some stale state change decisions to the brokers, resulting > in unexpected failures. We need to introduce a controller generation id that > increments with controller election. The brokers should reject any state > change requests by a controller with an older generation id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira