[ https://issues.apache.org/jira/browse/KAFKA-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464224#comment-13464224 ]
Jun Rao commented on KAFKA-42: ------------------------------ Thanks for patch v1. Looks good overall. Some comments: 1. ReassignPartitionsCommand: 1.1 If a partition doesn't exist, we should fail the operation immediately without updating ReassignPartitionsPath. 1.2 I think it would be useful to support migrating multiple topics and partitions. We can just take a JSON file that describes the new replicas as input. 1.3 If ReassignPartitionsPath already exists, we should quit immediately and not overwrite the path. This means that we will only allow 1 outstanding cluster rebalance at a given point of time, which is ok as long as the admin command allows multiple topic/partition being specified. 2. Currently, we fail the partition reassignment operation if any broker in RAR is down, during initialization. However, brokers in RAR can go down after initialization. So, it would be good if we can handle RAR failures. Probably the only change needed is that when a broker is online, we need to start those replicas in RAR too. 3. The logic is now get more complicated with the reassginment logic. Could we describe how it works in a comment? 4. PartitionLeaderSelector.selectLeader(): describe what the return value is in the comment. 5. ReassignedPartitionsIsrChangeListener.handleDataChange(): The following statement is weird. Only controller can change leaders and the controller always updates the leader cache every time a leader is changed. So, there shouldn't be a need for updating the leader cache on ZK listeners. controllerContext.allLeaders.put((topic, partition), leaderAndIsr.leader) 6. KafkaController.onPartitionReassignment(): Could we put the logic that makes sure all replicas RAR are is ISR in onPartitionReassignment()? Currently, that logic is duplicated in 2-3 places and that logic is always followed by a call to onPartitionReassignment(). If we do this, do we still need ReassignedPartitionsContext.areNewReplicasInIsr? 7. ReplicaStateChangeMachine: 7.1 NonExistentReplica: The controller holds on to all replicas in this state. Is this necessary? Can we just remove them from the replicaState map. 7.2 In the following code, we don't really need to read from ZK and can use the cached data. case _ => // check if the leader for this partition is alive or even exists // NOTE: technically, we could get the leader from the allLeaders cache, but we need to read zookeeper // for the ISR anyways val leaderAndIsrOpt = ZkUtils.getLeaderAndIsrForPartition(zkClient, topic, partition) 8. AdminTest: 8.1 testPartitionReassignmentWithLeaderInNewReplicas: How do we make sure that replica 0 is always the leader? 8.2 testResumePartitionReassignmentThatWasCompleted: Towards the end, the comment says leader should be 2, but there is no broker 2 in the test. 9. ControllerBrokerRequestBatch: Should we rename the two addRequestForBrokers to addLeaderAndIsrRequestForBrokers and addStopReplicaRequestForBrokers respectively? 10. PartitionOfflineException,StateChangeFailedException: We can probably change the implementation to use RuntimeException(message, throwable) directly. 11. LeaderElectionTest.testLeaderElectionAndEpoch(): Not sure if the change is correct. If there is no leadership change, leader epoch shouldn't change, right? > Support rebalancing the partitions with replication > --------------------------------------------------- > > Key: KAFKA-42 > URL: https://issues.apache.org/jira/browse/KAFKA-42 > Project: Kafka > Issue Type: Bug > Components: core > Reporter: Jun Rao > Assignee: Neha Narkhede > Priority: Blocker > Labels: features > Fix For: 0.8 > > Attachments: kafka-42-v1.patch > > Original Estimate: 240h > Remaining Estimate: 240h > > As new brokers are added, we need to support moving partition replicas from > one set of brokers to another, online. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira