[ https://issues.apache.org/jira/browse/KAFKA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491566#comment-13491566 ]
Jun Rao commented on KAFKA-532: ------------------------------- Thanks for patch v4. A few more comments: 40. PartitionStateInfo: It seems that we need to send the controllerEpoc associated with this partition. Note that this epoc is different from the controllerEpoc in LeaderAndIsrRequest. The former is the epoc of the controller that last changed the leader or isr and will be used when broker updates the isr. The latter is the epoc of the controller that sends the request and will be used in ReplicaManager to decide which controller's decision to follow. We will need to change the controllerEpoc passed to makeLeader and makeFollower in ReplicaManager accordingly. 41. ReplicaManager: In stopReplicas() and becomeLeaderOrFollower(), it would be better to only update controllerEpoch when it's truly necessary, i.e., the new controllerEpoch is larger than the cached one (not equal). This is because updating a volatile variable is a bit expensive than updating a local variable since the update has to be exposed to other threads. 42. KafkaController: The approach in the new patch works. There are a few corner cases that we need to cover. 42.1. incrementControllerEpoch(): If the controllerEpoc path doesn't exist, we create the path using the initial epoc version without using conditional update. It is possible for 2 controllers to execute this logic simultaneously and both get the initial epoc version. One solution is to make sure the controller epoc path exists during context initialization. Then we can always use conditional update here. 42.2. ControllerContext: We need to initialize controllerEpoc by reading from ZK. We also need to make sure that we subscribe to the controllerEpoc path first and then read its value from ZK for initialization. 42.3. ControllerEpochListener: It's safer to set both the epoc and the ZK version using the value from ZkUtils.readData. 43. ControllerMovedException is missing in the patch > Multiple controllers can co-exist during soft failures > ------------------------------------------------------ > > Key: KAFKA-532 > URL: https://issues.apache.org/jira/browse/KAFKA-532 > Project: Kafka > Issue Type: Bug > Affects Versions: 0.8 > Reporter: Neha Narkhede > Assignee: Neha Narkhede > Priority: Blocker > Labels: bugs > Attachments: kafka-532-v1.patch, kafka-532-v2.patch, > kafka-532-v3.patch, kafka-532-v4.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > If the current controller experiences an intermittent soft failure (GC pause) > in the middle of leader election or partition reassignment, a new controller > might get elected and start communicating new state change decisions to the > brokers. After recovering from the soft failure, the old controller might > continue sending some stale state change decisions to the brokers, resulting > in unexpected failures. We need to introduce a controller generation id that > increments with controller election. The brokers should reject any state > change requests by a controller with an older generation id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira