Hi Lucas,

Thanks for tackling this.  Topic IDs are a great idea, and this is a really 
good writeup.

For /brokers/topics/[topic], the schema version should be bumped to version 3, 
rather than 2.  KIP-455 bumped the version of this znode to 2 already :)

Given that we're going to be seeing these things as strings as lot (in logs, in 
ZooKeeper, on the command-line, etc.), does it make sense to use base64 when 
converting them to strings?

Here is an example of the hex representation:
6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8

And here is an example in base64.
b8tRS7h4TJ2Vt43Dp85v2A

The base64 version saves 15 letters (to be fair, 4 of those were dashes that we 
could have elided in the hex representation.)

Another thing to consider is that we should specify that the all-zeroes UUID is 
not a valid topic UUID.   We can't use null for this because we can't pass a 
null UUID over the RPC protocol (there is no special pattern for null, nor do 
we want to waste space reserving such a pattern.)

Maybe I missed it, but did you describe "migration of... existing topic[s] 
without topic IDs" in detail in any section?  It seems like when the new 
controller becomes active, it should just generate random UUIDs for these, and 
write the random UUIDs back to ZooKeeper.  It would be good to spell that out.  
We should make it clear that this happens regardless of the inter-broker 
protocol version (it's a compatible change).

"LeaderAndIsrRequests including an is_every_partition flag" seems a bit wordy.  
Can we just call these "full LeaderAndIsrRequests"?  Then the RPC field could 
be named "full".  Also, it would probably be better for the RPC field to be an 
enum of { UNSPECIFIED, INCREMENTAL, FULL }, so that we can cleanly handle old 
versions (by treating them as UNSPECIFIED)

In the LeaderAndIsrRequest section, you write "A final deletion event will be 
secheduled for X ms after the LeaderAndIsrRequest was first received..."  I 
guess the X was a placeholder that you intended to replace before posting? :)  
In any case, this seems like the kind of thing we'd want a configuration for.  
Let's describe that configuration key somewhere in this KIP, including what its 
default value is.

We should probably also log a bunch of messages at WARN level when something is 
scheduled for deletion, as well.  (Maybe this was assumed, but it would be good 
to mention it).

I feel like there are a few sections that should be moved to "rejected 
alternatives."  For example, in the DeleteTopics section, since we're not going 
to do option 1 or 2, these should be moved into "rejected alternatives,"  
rather than appearing inline.  Another case is the "Should we remove topic name 
from the protocol where possible" section.  This is clearly discussing a design 
alternative that we're not proposing to implement: removing the topic name from 
those protocols.

Is it really necessary to have a new /admin/delete_topics_by_id path in 
ZooKeeper?  It seems like we don't really need this.  Whenever there is a new 
controller, we'll send out full LeaderAndIsrRequests which will trigger the 
stale topics to be cleaned up.   The active controller will also send the full 
LeaderAndIsrRequest to brokers that are just starting up.    So we don't really 
need this kind of two-phase commit (send out StopReplicasRequest, get ACKs from 
all nodes, commit by removing /admin/delete_topics node) any more.

You mention that FetchRequest will now include UUID to avoid issues where 
requests are made to stale partitions.  However, adding a UUID to 
MetadataRequest is listed as future work, out of scope for this KIP.  How will 
the client learn what the topic UUID is, if the metadata response doesn't 
include that information?  It seems like adding the UUID to MetadataResponse 
would be an improvement here that might not be too hard to make.

best,
Colin


On Mon, Sep 9, 2019, at 17:48, Ryanne Dolan wrote:
> Lucas, this would be great. I've run into issues with topics being
> resurrected accidentally, since a client cannot easily distinguish between
> a deleted topic and a new topic with the same name. I'd need the ID
> accessible from the client to solve that issue, but this is a good first
> step.
> 
> Ryanne
> 
> On Wed, Sep 4, 2019 at 1:41 PM Lucas Bradstreet <lu...@confluent.io> wrote:
> 
> > Hi all,
> >
> > I would like to kick off discussion of KIP-516, an implementation of topic
> > IDs for Kafka. Topic IDs aim to solve topic uniqueness problems in Kafka,
> > where referring to a topic by name alone is insufficient. Such cases
> > include when a topic has been deleted and recreated with the same name.
> >
> > Unique identifiers will help simplify and improve Kafka's topic deletion
> > process, as well as prevent cases where brokers may incorrectly interact
> > with stale versions of topics.
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers
> >
> > Looking forward to your thoughts.
> >
> > Lucas
> >
>

Reply via email to