[
https://issues.apache.org/jira/browse/KAFKA-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neha Narkhede updated KAFKA-42:
-------------------------------
Attachment: kafka-42-v2.patch
1. ReassignPartitionsCommand:
1.1 Makes sense, changed that.
1.2 I think that makes sense. Thinking about this more, I guess it is not such
a good idea to block the admin command until all the partitions are
successfully reassigned. I changed the reassign partitions admin command to
issue the partition reassignment request only if that path doesn't already
exist. This protects against accidentally overwriting the zookeeper path. I
also added a check reassignment status admin command that reports whether the
reassignment of a partition is completed, failed, or in progress. Another thing
to be careful about with a batch reassignment API is piling up important state
change requests on the controller while it reassigns multiple partitions. Since
reassignment of partitions is not an urgent state change, we should give up the
controller lock after each partition is reassigned. That will ensure that other
state changes can sneak in, if necessary.
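To make the behaviour in 1.2 concrete, here is a rough sketch in Python rather than the patch's own code. It assumes an in-memory dict standing in for zookeeper, and every name in it (FakeZk, request_reassignment, check_status, controller_reassign_all, the example path) is hypothetical. It shows the three ideas: create the request only if the path is absent, report a per-partition status, and release the controller lock between partitions.

```python
import threading


class FakeZk:
    """In-memory stand-in for zookeeper (assumption: not a real client)."""

    def __init__(self):
        self.nodes = {}

    def create_if_absent(self, path, data):
        # Refuse to overwrite an existing path.
        if path in self.nodes:
            return False
        self.nodes[path] = data
        return True


def request_reassignment(zk, path, assignment):
    """Issue the request only when no reassignment is already pending."""
    return zk.create_if_absent(path, dict(assignment))


def check_status(zk, path, partition, wanted, current_replicas):
    """Report 'in progress', 'completed' or 'failed' for one partition."""
    pending = zk.nodes.get(path, {})
    if partition in pending:
        return "in progress"
    if current_replicas.get(partition) == wanted:
        return "completed"
    return "failed"


def controller_reassign_all(zk, path, current_replicas, lock):
    """Reassign one partition per lock acquisition."""
    while True:
        with lock:
            pending = zk.nodes.get(path, {})
            if not pending:
                # Nothing left: drop the request path and stop.
                zk.nodes.pop(path, None)
                return
            partition, replicas = next(iter(pending.items()))
            current_replicas[partition] = replicas
            del pending[partition]
        # The lock is released here, so other state changes can run
        # before the next partition is picked up.
```

In this sketch a second request_reassignment call against the same path returns False instead of clobbering the pending request, which is the protection described above.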
1.3 Yes, forgot to include that in the v1 patch.
2. Initially, I thought the admin could just re-run the partition reassignment
command, but I realize that it involves one manual step.
3, 4. Sure.
5. Good point, removed it.
6. This check is not done on every single invocation of
onPartitionReassignment; it is done on controller failover and in the isr
change listener. It is not required when the partition reassigned callback
triggers. But I think it is a good idea to move it to the callback, just in
case there are scenarios we have not covered where the check should be done.
7.1 While changing the state of a replica to NewReplica, we need to ensure
that it was in the NonExistentReplica state. We can remove the replica from the
replicaState map explicitly after it moves to the NonExistentReplica state, but
there is a chance it will be added back to the map again. This can happen if we
re-start the replica after stopping it. But since this is infrequent, I made
this change.
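A minimal sketch of the transition rule in 7.1, in Python rather than the patch's own code. The class and method names are hypothetical, and it assumes that a replica missing from the map is treated as being in the NonExistentReplica state; the patch may handle that lookup differently.

```python
NON_EXISTENT = "NonExistentReplica"
NEW = "NewReplica"


class ReplicaStateSketch:
    """Hypothetical stand-in for the replicaState map discussed above."""

    def __init__(self):
        self.replica_state = {}

    def current_state(self, replica):
        # Assumption: an absent entry means NonExistentReplica.
        return self.replica_state.get(replica, NON_EXISTENT)

    def to_new_replica(self, replica):
        # Only a replica in NonExistentReplica may become NewReplica.
        previous = self.current_state(replica)
        if previous != NON_EXISTENT:
            raise ValueError(
                "cannot move %r from %s to %s" % (replica, previous, NEW)
            )
        self.replica_state[replica] = NEW

    def to_non_existent(self, replica):
        # Remove the entry explicitly instead of storing the state.
        self.replica_state.pop(replica, None)
```

If the replica is stopped and later re-started, to_new_replica simply adds the entry back, which is the infrequent re-insertion mentioned above.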
7.2 We do not cache the isr, which the controller needs in order to send a
leader and isr request to the broker.
Besides, this operation is only invoked when a new broker is started or the
controller fails over. Both of these operations are rare enough that we don't
need to worry about optimizing this.
8.1 There is a very good chance that it will be. This is because we always
pick the first alive assigned replica as the leader. Since replica 0 is the
first assigned replica and is never shut down during the test, it will be the
leader. Even if, due to some rare zookeeper session expiration issue, it is not
the leader, the test will not fail.
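The selection rule in 8.1 can be sketched as follows, in Python rather than the patch's own code. The function name pick_leader and its arguments are hypothetical; it only illustrates "first alive assigned replica".

```python
def pick_leader(assigned_replicas, live_brokers):
    """Return the first assigned replica whose broker is alive, else None."""
    for replica in assigned_replicas:
        if replica in live_brokers:
            return replica
    return None
```

With an assignment of [0, 1, 2] and broker 0 never shut down, this returns 0 no matter which of the other brokers are up.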
8.2 The comment is redundant there, so I removed it.
9, 10. Good point, fixed it.
11. It is correct, since the controller increments the epoch for isr changes
made by itself.
> Support rebalancing the partitions with replication
> ---------------------------------------------------
>
> Key: KAFKA-42
> URL: https://issues.apache.org/jira/browse/KAFKA-42
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Jun Rao
> Assignee: Neha Narkhede
> Priority: Blocker
> Labels: features
> Fix For: 0.8
>
> Attachments: kafka-42-v1.patch, kafka-42-v2.patch
>
> Original Estimate: 240h
> Remaining Estimate: 240h
>
> As new brokers are added, we need to support moving partition replicas from
> one set of brokers to another, online.