Hi George,

Yes, KIP-455 allows the reassignment of individual partitions to be cancelled.  
I think it's very important for these operations to be at the partition level.

best,
Colin

On Tue, May 14, 2019, at 16:34, George Li wrote:
>  Hi Colin,
> 
> Thanks for the updated KIP.  It has very good improvements of Kafka 
> reassignment operations. 
> 
> One question, looks like the KIP includes the Cancellation of 
> individual pending reassignments as well when the 
> AlterPartitionReasisgnmentRequest has empty replicas for the 
> topic/partition. Will you also be implementing the the partition 
> cancellation/rollback in the PR ?    If yes,  it will make KIP-236 (it 
> has PR already) trivial, since the cancel all pending reassignments, 
> one just needs to do a ListPartitionRessignmentRequest, then submit 
> empty replicas for all those topic/partitions in 
> one AlterPartitionReasisgnmentRequest. 
> 
> 
> Thanks,
> George
> 
>     On Friday, May 10, 2019, 8:44:31 PM PDT, Colin McCabe 
> <cmcc...@apache.org> wrote:  
>  
>  On Fri, May 10, 2019, at 17:34, Colin McCabe wrote:
> > On Fri, May 10, 2019, at 16:43, Jason Gustafson wrote:
> > > Hi Colin,
> > > 
> > > I think storing reassignment state at the partition level is the right 
> > > move
> > > and I also agree that replicas should understand that there is a
> > > reassignment in progress. This makes KIP-352 a trivial follow-up for
> > > example. The only doubt I have is whether the leader and isr znode is the
> > > right place to store the target reassignment. It is a bit odd to keep the
> > > target assignment in a separate place from the current assignment, right? 
> > > I
> > > assume the thinking is probably that although the current assignment 
> > > should
> > > probably be in the leader and isr znode as well, it is hard to move the
> > > state in a compatible way. Is that right? But if we have no plan to remove
> > > the assignment znode, do you see a downside to storing the target
> > > assignment there as well?
> > >
> > 
> > Hi Jason,
> > 
> > That's a good point -- it's probably better to keep the target 
> > assignment in the same znode as the current assignment, for 
> > consistency.  I'll change the KIP.
> 
> Hi Jason,
> 
> Thanks again for the review.
> 
> I took another look at this, and I think we should stick with the 
> initial proposal of putting the reassignment state into 
> /brokers/topics/[topic]/partitions/[partitionId]/state.  The reason is 
> because we'll want to bump the leader epoch for the partition when 
> changing the reassignment state, and the leader epoch resides in that 
> znode anyway.  I agree there is some inconsistency here, but so be it: 
> if we were to greenfield these zookeeper data structures, we might do 
> it differently, but the proposed scheme will work fine and be 
> extensible for the future.
> 
> > 
> > > A few additional questions:
> > > 
> > > 1. Should `alterPartitionReassignments` be `alterPartitionAssignments`?
> > > It's the current assignment we're altering, right?
> > 
> > That's fair.  AlterPartitionAssigments reads a little better, and I'll 
> > change it to that.
> 
> +1.  I've changed the RPC and API name in the wiki.
> 
> > 
> > > 2. Does this change affect the Metadata API? In other words, are clients
> > > aware of reassignments? If so, then we probably need a change to
> > > UpdateMetadata as well. The only alternative I can think of would be to
> > > represent the replica set in the Metadata request as the union of the
> > > current and target replicas, but I can't think of any benefit to hiding
> > > reassignments. Note that if we did this, we probably wouldn't need a
> > > separate API to list reassignments.
> > 
> > I thought about this a bit... and I think on balance, you're right.  We 
> > should keep this information together with the replica nodes, isr 
> > nodes, and offline replicas, and that information is available in the 
> > MetadataResponse. 
> >  However, I do think in order to do this, we'll need a flag in the 
> > MetadataRequest that specifiies "only show me reassigning partitions".  
> > I'll add this.
> 
> I revisited this, and I think we should stick with the original 
> proposal of having a separate ListPartitionReassignments API.  There 
> really is no use case where the Producer or Consumer needs to know 
> about a reassignment.  They should just be notified when the set of 
> partitions changes, which doesn't require changes to 
> MetadataRequest/Response.  The Admin client only cares if someone is 
> managing the reassignment.  So adding this state to the 
> MetadataResponse adds overhead for no real benefit.  In the common case 
> where there is no ongoing reassignment, it would be 4 bytes per 
> partition of extra overhead in the MetadataResponse.
> 
> In general, I think we have a problem of oversharing in the 
> MetadataRequest/Response.  As we 10x or 100x the number of partitions 
> we support, we'll need to get stricter about giving clients only the 
> information they actually need, about the partitions they actually care 
> about.  Reassignment state clearly falls in the category of state that 
> isn't needed by clients (except very specialized rebalancing programs).
> 
> Another important consideration here is that someone managing an 
> ongoing reassignment wants the most up-to-date information, which is to 
> be found on the controller.  Therefore adding this state to listTopics 
> or describeTopics, which could contact any node in the cluster, is 
> sub-optimal.
> 
> Finally, adding this to listTopics or describeTopics feels like a warty 
> API.  It's an extra boolean which interacts with other extra booleans 
> like "show internal", etc. in weird ways.  I think a separate API is 
> cleaner.
> 
> > 
> > > 3. As replicas come into sync, they will join the ISR. Will we await all
> > > target replicas joining the ISR before taking the replica out of the 
> > > target
> > > replicas set? Also, I assume that target replicas can still be elected as
> > > leader?
> > 
> > We'll take a replica out of the target replicas set as soon as that 
> > replica is in the ISR.  Let me clarify this in the KIP.
> > 
> > > 4. Probably useful to mention permissions for the new APIs.
> > 
> > Good point.  I think alterPartitionAssignments should require ALTER on 
> > CLUSTER.  MetadataRequest permissions will be unchanged.
> 
> I added permission information.
> 
> best,
> Colin
> 
> > 
> > best,
> > Colin
> > 
> > > 
> > > Thanks,
> > > Jason
> > > 
> > > On Fri, May 10, 2019 at 9:30 AM Gwen Shapira <g...@confluent.io> wrote:
> > > 
> > > > +1 (binding)
> > > > Looks great, and will be awesome to have this new capability.
> > > >
> > > > On Wed, May 8, 2019 at 10:23 PM Colin McCabe <cmcc...@apache.org> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start the vote for KIP-455: Create an Administrative API 
> > > > > for
> > > > > Replica Reassignment.  I think this KIP is important since it will 
> > > > > unlock
> > > > > many follow-on improvements to Kafka reassignment (see the "Future 
> > > > > work"
> > > > > section, plus a lot of the other discussions we've had recently about
> > > > > reassignment).  It also furthers the important KIP-4 goal of removing
> > > > > direct access to ZK.
> > > > >
> > > > > I made a few changes based on the discussion in the [DISCUSS] thread. 
> > > > >  As
> > > > > Robert suggested, I removed the need to explicitly cancel a 
> > > > > reassignment
> > > > > for a partition before setting up a different reassignment for that
> > > > > specific partition.  I also simplified the API a bit by adding a
> > > > > PartitionReassignment class which is used by both the alter and list
> > > > APIs.
> > > > >
> > > > > I modified the proposal so that we now deprecate the old znode-based 
> > > > > API
> > > > > rather than removing it completely.  That should give external
> > > > rebalancing
> > > > > tools some time to transition to the new API.
> > > > >
> > > > > To clarify a question Viktor asked, I added a note that the
> > > > > kafka-reassign-partitions.sh will now use a --bootstrap-server 
> > > > > argument
> > > > to
> > > > > contact the admin APIs.
> > > > >
> > > > > thanks,
> > > > > Colin
> > > > >
> > > >
> > > >
> > > > --
> > > > *Gwen Shapira*
> > > > Product Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > > > <http://www.confluent.io/blog>
> > > >
> > >
> >
>

Reply via email to