[ 
https://issues.apache.org/jira/browse/KAFKA-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-42:
-------------------------------

    Attachment: kafka-42-v2.patch

1. ReassignPartitionsCommand:
1.1  Makes sense, changed that.

1.2 I think that makes sense. Thinking about this more, I guess it is not such 
a good idea to block the admin command until all the partitions are 
successfully reassigned. I changed the reassign partitions admin command to 
issue the partition reassignment request only if the zookeeper path for the 
request doesn't already exist. This protects against accidentally overwriting 
the zookeeper path. I also added a check reassignment status admin command 
that reports whether the reassignment of a partition is completed, failed, or 
in progress. Another thing to be careful about with a batch reassignment API 
is to avoid piling up important state change requests on the controller while 
it reassigns multiple partitions. Since reassignment of partitions is not an 
urgent state change, we should give up the controller lock after each 
partition is reassigned. That will ensure that other state changes can sneak 
in, if necessary.
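
A minimal sketch of the behavior described in 1.2, not the code in the patch. 
It assumes an in-memory dict in place of zookeeper; the names FakeZk, 
REASSIGN_PATH, request_reassignment, reassignment_status and controller_drain 
are hypothetical, and the rule used to tell "completed" from "failed" is an 
assumption made for illustration.

```python
import threading


class FakeZk:
    """Hypothetical stand-in for zookeeper: a dict mapping paths to data."""

    def __init__(self):
        self.nodes = {}

    def create_if_absent(self, path, data):
        # Create the node only if it does not exist; report whether we did.
        if path in self.nodes:
            return False
        self.nodes[path] = data
        return True


REASSIGN_PATH = "/reassign"  # illustrative path name, not taken from the patch


def request_reassignment(zk, assignment):
    """Issue the request without blocking; never overwrite a pending one."""
    return zk.create_if_absent(REASSIGN_PATH, dict(assignment))


def reassignment_status(zk, partition, target_replicas, current_replicas):
    """Report the status of one partition (assumed rule, see lead-in)."""
    pending = zk.nodes.get(REASSIGN_PATH, {})
    if partition in pending:
        return "in progress"
    if current_replicas == target_replicas:
        return "completed"
    return "failed"


def controller_drain(zk, controller_lock, apply_one):
    """Reassign one partition per lock acquisition.

    Releasing the lock between partitions lets other state changes
    acquire it in between.
    """
    while True:
        with controller_lock:
            pending = zk.nodes.get(REASSIGN_PATH)
            if not pending:
                # Nothing left: remove the request path and stop.
                zk.nodes.pop(REASSIGN_PATH, None)
                return
            partition, replicas = next(iter(pending.items()))
            apply_one(partition, replicas)
            del pending[partition]
```

The admin command returns as soon as the request is recorded, and a second 
request is rejected while the first one is still pending.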

1.3 Yes, forgot to include that in v1 patch.

2. Initially, I thought the admin could just re-run the partition reassignment 
command, but I realize that it involves one manual step.

3, 4. Sure.

5. Good point, removed it.

6. This check is not done on every single invocation of 
onPartitionReassignment; it is done on controller failover and in the isr 
change listener. It is not required when the partition reassigned callback 
triggers. But I think it is a good idea to move it to the callback, just in 
case we have not covered all the scenarios in which the check should be done.

7.1  While changing the state of a replica to NewReplica, we need to ensure 
that it was in the NonExistentReplica state. We can remove the replica from the 
replicaState map after it moves to the NonExistentReplica state explicitly, but 
there is a chance it will be added back to the map again. This can happen if we 
re-start the replica after stopping it. But, since this is infrequent, I made 
this change.
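
A minimal sketch of the transition rule in 7.1, under the assumption that a 
replica missing from the map is treated as NonExistentReplica. The class and 
method names are hypothetical and are not the ones used in the patch.

```python
NON_EXISTENT = "NonExistentReplica"
NEW = "NewReplica"


class ReplicaStates:
    """Hypothetical replica state map keyed by (topic, partition, broker)."""

    def __init__(self):
        self.replica_state = {}

    def current(self, replica):
        # Assumption: absence from the map means NonExistentReplica.
        return self.replica_state.get(replica, NON_EXISTENT)

    def to_new_replica(self, replica):
        # Only a replica that does not exist yet may become a NewReplica.
        if self.current(replica) != NON_EXISTENT:
            raise ValueError("replica %r is in state %s, expected %s"
                             % (replica, self.current(replica), NON_EXISTENT))
        self.replica_state[replica] = NEW

    def to_non_existent(self, replica):
        # Drop the entry so the map does not keep stopped replicas around.
        self.replica_state.pop(replica, None)
```

If a stopped replica is started again, to_new_replica simply adds it back to 
the map, which is the infrequent case mentioned above.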

7.2 We do not cache the isr, which is required for the controller to be able 
to send a leader and isr request to the broker. Besides, this operation is 
only invoked when a new broker is started or the controller fails over. Both 
of these operations are rare enough that we don't need to worry about 
optimizing this.


8.1 There is a very good chance that it will be. This is because we always 
pick the first alive assigned replica as the leader. Since replica 0 is the 
first assigned replica and is never shut down during the test, it will be the 
leader. Even if, due to some rare zookeeper session expiration issue, it is not 
the leader, the test will not fail.
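
The selection rule in 8.1 can be sketched as follows. pick_leader is a 
hypothetical helper written for illustration, not the function in the patch.

```python
def pick_leader(assigned_replicas, alive_brokers):
    """Return the first assigned replica that is alive, or None if none is."""
    for replica in assigned_replicas:
        if replica in alive_brokers:
            return replica
    return None
```

With assigned replicas [0, 1, 2] and replica 0 never shut down, replica 0 is 
chosen regardless of which other brokers are alive.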

8.2 The comment is redundant there, so I removed it.

9, 10. Good point, fixed it.

11. It is correct since the controller increments the epoch for isr changes 
made by itself.

                
> Support rebalancing the partitions with replication
> ---------------------------------------------------
>
>                 Key: KAFKA-42
>                 URL: https://issues.apache.org/jira/browse/KAFKA-42
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Neha Narkhede
>            Priority: Blocker
>              Labels: features
>             Fix For: 0.8
>
>         Attachments: kafka-42-v1.patch, kafka-42-v2.patch
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> As new brokers are added, we need to support moving partition replicas from 
> one set of brokers to another, online.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
