Hi Mickael,

We don't have any official way for brokers to join the cluster other than 
showing up and registering themselves in ZK.  Similarly, we don't have any way 
of removing brokers from the cluster other than shutting them down and 
removing their znodes from ZooKeeper.

If we wanted to change this, it would be a really big step.  We would need 
public, stable APIs for both of these things.  Or at least for removal, which 
is currently automatic and doesn't require any action on the part of the 
administrator.  Administrators would have to be retrained to do this whenever 
shrinking the cluster, and we cannot tell people to modify ZK directly for it.

To be honest, I don't think reworking broker registration is worth it for this 
change.  I think we could pretty easily have placeholder values for the missing 
replicas like -1, -2, -3, etc., and just fill them in whenever a new broker 
comes online.  This may be slightly more complex to implement, but it greatly 
simplifies what users have to do.
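
Just to sketch what I mean (illustrative only; none of these names exist in 
the real controller code):

import java.util.ArrayList;
import java.util.List;

class PlaceholderAssignment {
    // Build an assignment, padding with -1, -2, -3, ... when there are
    // fewer live brokers than the requested replication factor.
    static List<Integer> assign(List<Integer> liveBrokers, int replicationFactor) {
        List<Integer> replicas = new ArrayList<>(
            liveBrokers.subList(0, Math.min(liveBrokers.size(), replicationFactor)));
        for (int placeholder = -1; replicas.size() < replicationFactor; placeholder--) {
            replicas.add(placeholder);
        }
        return replicas;
    }

    // When a new broker registers, swap its id in for the first
    // remaining placeholder slot.
    static void fill(List<Integer> replicas, int newBrokerId) {
        for (int i = 0; i < replicas.size(); i++) {
            if (replicas.get(i) < 0) {
                replicas.set(i, newBrokerId);
                return;
            }
        }
    }
}

So creating a topic with 3 replicas on a 2-broker cluster would give an 
assignment like [1, 2, -1], and the -1 gets replaced when a third broker 
shows up.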

It is true that filling in -1, -2, -3, etc. will not preserve rack placement 
information.  But that is a more general problem that we should probably solve 
separately.  After the initial placement, a lot of placement information 
disappears and is not accessible to reassignment.  Since reassignment is 
becoming more and more important, we should make an effort to preserve this 
information.  Because that would be a big change, though, it's probably best 
done separately.

The "rejected alternatives" section says that adding an option to 
CreateTopicsRequest to allow users to opt-in to the new behavior "felt too 
complex."  But I think this could use a little clarification.  Adding a new 
boolean to the createTopics command is actually fairly simple from the 
perspective of a developer.  But it adds another thing for end-users to think 
about when using the software.  It's also not clear how many users would take 
advantage of this.  I think that's the reason people were not in favor of it, 
not a general feeling of complexity.  Adding more configuration options is 
often simple to implement, and making things "just work" is often a little more 
complex.  But we should prefer the latter, most of the time at least.  I think 
this is what you meant here, but it would be good to clarify.

"Rejected alternatives" also talks about an error code and an error message 
when the replication is not up to full strength.  But this was removed, right?  
We should clarify that no error code is returned in this case, and the 
CreateTopicsResponse returns the true number of replicas that was created, in 
case the client is interested in this information.  Returning an error code 
would certainly cause problems for a lot of users, who use all().get() to 
verify that all the topics have been successfully created.
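
For example, admin client code like the following would keep working 
unchanged, and could optionally inspect the result.  This is just a sketch; 
the replicationFactor() accessor is my assumption about how the response data 
would be surfaced to clients:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.CreateTopicsResult;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            CreateTopicsResult result = admin.createTopics(
                Collections.singleton(new NewTopic("my-topic", 3, (short) 3)));
            // Since no error code is returned, this completes normally
            // even if fewer than 3 replicas could be placed.
            result.all().get();
            // Sketch: clients that care could read the actual replica
            // count back from the response (accessor name assumed).
            int actual = result.replicationFactor("my-topic").get();
            if (actual < 3) {
                System.out.println("my-topic created with " + actual + " replicas");
            }
        }
    }
}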

best,
Colin


On Mon, Oct 21, 2019, at 09:50, Mickael Maison wrote:
> Thanks Stanislav and Colin for the feedback.
> 
> I've updated the KIP to make it simpler.
> It's not updating the CreateTopics/CreatePartitions RPCs anymore. I've
> kept the broker setting so admins can keep the current behaviour but
> simplified it to be either enabled or disabled.
> 
> I've also kept the observed_brokers nodes in Zookeeper. I can't think
> of a better alternative to keep track of the expected brokers. The
> other option would be to perform the extra replica creation
> asynchronously (driven by the controller when a broker joins the
> cluster) but that feels a lot more complicated for this specific use
> case.
> 
> I've also made it explicit that at least "min.insync.replicas" brokers
> have to be online to allow topic/partition creation.
> 
> Thanks
> 
> On Mon, Mar 25, 2019 at 1:17 PM Mickael Maison <mickael.mai...@gmail.com> 
> wrote:
> >
> > Thanks Colin for the feedback.
> >
> > The idea was to allow both users and administrator to decide if they
> > wanted to opt-in and if so under what conditions.
> >
> > Maybe we could do something simpler and just allow the creation if at
> > least min-in-sync replicas are available? That should not require
> > changes to the protocol and while this might not cover all possible
> > use cases, that would still cover the use cases we've listed in the
> > KIP. That would also tie in with existing semantics/guarantees
> > (min-in-sync).
> >
> > Thanks
> >
> > On Tue, Feb 26, 2019 at 5:40 PM Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > Hi Mickael,
> > >
> > > I don't think adding CREATED_UNDER_REPLICATED as an error code makes 
> > > sense.  It is not an error condition, as described here.
> > >
> > > > Updates to the Decommissioning brokers section in the documentation
> > > > will mention that if a broker id is never to be reused then its 
> > > > corresponding node in zookeeper
> > > > /brokers/observed_ids will need to be removed manually
> > >
> > > I don't think it's acceptable to ask admins to manually modify ZooKeeper 
> > > here.  In general the ZK changes seem kind of like a hack -- perhaps we 
> > > should drop it from the proposal for now.
> > >
> > > Perhaps we could even somehow do all of this in a custom 
> > > CreateTopicPolicy?  That would avoid the need for RPC changes, new 
> > > configuration knobs, etc.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Tue, Dec 18, 2018, at 08:43, Mickael Maison wrote:
> > > > Hi,
> > > >
> > > > We have submitted a KIP to handle topics and partitions creation when
> > > > a cluster is not fully available:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-409%3A+Allow+creating+under-replicated+topics+and+partitions
> > > >
> > > > As always, we welcome feedback and suggestions.
> > > >
> > > > Thanks
> > > > Mickael and Edoardo
> > > >
>
