[ https://issues.apache.org/jira/browse/KAFKA-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412975#comment-13412975 ]
Jun Rao commented on KAFKA-343: ------------------------------- Thanks for patch v2. It's in a much better shape now. Some new comments: 20. KafkaController: 20.1 tryToBecomeController(): It seems that we don't need to first call ZkUtils.readDataMaybeNull and check if the controller exists or not. Instead, we can just call ZkUtils.createEphemeralPathExpectConflict directly. 20.2 allTopicPartitionAssignment and allPartitionReplicaAssignment are representing the same data in a slightly different form. Can we just keep one of them? 20.3 leaderAndISRRecovery(): We need to turn on the Init flag for LeaderAndISRRequest since this method is the first request sent during controller failover. Also, instead of including all partitions in the request, we should only include partitions assigned to each broker. 20.4 initLeaders(): Instead of sending the same leaderAndISRRequest to each broker, we should only send to a broker partitions assigned to it. 20.5 onBrokerChanges: 20.5.1 It seems that the logic for handling new brokers in the same as leaderAndISRRecovery() and we can just reuse the logic. 20.5.2 liveBrokerIds is allBrokerIds. 20.5.3 Instead of trying to elect the leader of all partitions from ZK, the controller should cache the current leader of each partition and only try to elect the leader for partitions whose current leader is no long alive. This will save the # of ZK reads during broker failure. 20.5.4 similar to initLeaders(), instead of sending the same leaderAndISRRequest to each broker, we should only send to a broker partitions assigned to it. 21. BrokerChangeListener: 21.1 handleChildChange(): should we remove the TODO comment? 21.2 handleDeletedTopics(): similar to initLeaders(), instead of sending the same StopReplicaRequest to each broker, we should only send to a broker partitions assigned to it. 22. KafkaApis.handleLeaderAndISRRequest(): If the IsInit flag is on, we should just call stopReplicaCbk to remove partitions that are to be deleted, instead of rewriting the logic already in stopRelicaCbk. 23. KafkaZookeeper: remove unused imports 24. LeaderAndISR: Just to be consistent with ZK versioning, should initialLeaderEpoc start from 0? 25. Log.deleteWholeLog() is not needed since there is already LogManager.deleteLog 26. Replica: isLeader() is not used. 27. ReplicaManager.maybeShrinkISR(): fix the indentation of the closing bracket 28. StopReplicaRequest: DefaultAckTimeout: 1000 ms seems too long. How about 100ms? 29. FetcherTest: tearDown(): there is no need to call fetcher.stopAllConnections() since fetcher.shutdown() does that already. 30. LeaderElectionTest.testLeaderElectionAndEpoch(): instead of adding sleep, could we change waitUntilLeaderIsElected so that it waits until the leader is a live broker. If you feel this is better handled in a separate jira, that fine too. Just create a new jira and provide enough details there. > revisit the become leader and become follower state change operations using > V3 design > ------------------------------------------------------------------------------------- > > Key: KAFKA-343 > URL: https://issues.apache.org/jira/browse/KAFKA-343 > Project: Kafka > Issue Type: Sub-task > Components: core > Affects Versions: 0.8 > Reporter: Jun Rao > Assignee: Yang Ye > Fix For: 0.8 > > Attachments: kafka_343.diff.2, kafka_343.patch > > > We need to reimplement become leader/follower using the controller model > described in > https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira