Hello Justine,

Thanks for the KIP!

I happen to have been confronted recently with the need to keep track of a 
large number of topics as compactly as possible. I was going to come up with 
some way to dictionary encode the topic names as integers, but this seems much 
better!

Apologies if this has been raised before, but I’m wondering about the choice of 
UUID vs sequence number for the ids. Typically, I’ve seen UUIDs in two 
situations:
1. When processes need to generate non-colliding identifiers without 
coordination. 
2. When the identifier needs to be “universally unique”; i.e., the identifier 
needs to distinguish the entity from all other entities that could ever exist. 
This is useful in cases where entities from all kinds of systems get mixed 
together, such as when dumping logs from all processes in a company into a 
common system. 

Maybe I’m being short-sighted, but it doesn’t seem like either really applies 
here. It seems like the brokers could and would achieve consensus when creating 
a topic anyway, which is all that’s required to generate non-colliding sequence 
ids. For the second, as you mention, we could assign a UUID to the cluster as a 
whole, which would render any resource scoped to the cluster universally unique 
as well. 
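Here is a minimal sketch of the sequence-id alternative. It assumes that a single active controller is the only component that creates topics, so it can hand out ids from a counter it persists. `TopicIdAllocator` and `GlobalTopicId` are hypothetical names and are not part of Kafka or the KIP.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical allocator run by the active controller. Only the controller
// creates topics, so a monotonically increasing counter cannot collide.
final class TopicIdAllocator {
    private final UUID clusterId;     // assigned once, when the cluster is created
    private final AtomicInteger next; // next sequence id to hand out

    // lastAllocated is the highest id already persisted, so a newly elected
    // controller resumes where the previous one stopped.
    TopicIdAllocator(UUID clusterId, int lastAllocated) {
        this.clusterId = clusterId;
        this.next = new AtomicInteger(lastAllocated + 1);
    }

    // Returns a 4-byte id unique within this cluster. Java ints are signed,
    // and this sketch only hands out non-negative values (about 2.1 billion).
    int allocate() {
        int id = next.getAndIncrement();
        if (id < 0) {
            throw new IllegalStateException("topic id space exhausted");
        }
        return id;
    }

    // Qualifying a local id with the cluster UUID makes it universally unique
    // without repeating the cluster UUID in every per-topic field on the wire.
    GlobalTopicId qualify(int topicId) {
        return new GlobalTopicId(clusterId, topicId);
    }
}

record GlobalTopicId(UUID clusterId, int topicId) { }
```

For example, `new TopicIdAllocator(clusterId, 41).allocate()` returns 42. If the field were treated as unsigned, it would provide the full 2^32 (about 4.3 billion) values mentioned below. The cost is that unsigned handling would be needed everywhere the id is printed or compared.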

The reason I mention this is that, although a UUID is way more compact than 
topic names, it’s still 16 bytes. In contrast, a 4-byte integer sequence id 
would give us 4 billion unique topics per cluster, which seems like enough ;)

Considering the number of different times these topic identifiers are sent over 
the wire or stored in memory, it seems like it might be worth the additional 4x 
space savings. 
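One way to check the arithmetic is the sketch below. It encodes one id of each kind with `java.nio.ByteBuffer` and then multiplies by a topic count. The count of 100,000 topics is an arbitrary, hypothetical figure chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Back-of-the-envelope comparison of the two id encodings.
final class IdSizes {
    // A UUID is two 64-bit longs: 16 bytes.
    static byte[] encodeUuid(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // A sequence id is a single 32-bit int: 4 bytes.
    static byte[] encodeSequenceId(int id) {
        return ByteBuffer.allocate(4).putInt(id).array();
    }

    // Total id bytes when `topics` ids are carried in one message.
    static long idBytes(int topics, int bytesPerId) {
        return (long) topics * bytesPerId;
    }

    public static void main(String[] args) {
        int topics = 100_000; // hypothetical topic count, for illustration only
        long uuidBytes = idBytes(topics, encodeUuid(UUID.randomUUID()).length);
        long seqBytes = idBytes(topics, encodeSequenceId(7).length);
        System.out.println(uuidBytes + " vs " + seqBytes + " bytes");
        System.out.println(1L << 32); // distinct values of a 32-bit id
    }
}
```

Running `main` prints `1600000 vs 400000 bytes` followed by `4294967296`.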

What do you think about this?

Thanks,
John

On Fri, Sep 11, 2020, at 03:20, Tom Bentley wrote:
> Hi Justine,
> 
> This looks like a very welcome improvement. Thanks!
> 
> Maybe I missed it, but the KIP doesn't seem to mention changing
> DeleteTopicsRequest to identify the topic using an id. Maybe that's out of
> scope, but DeleteTopicsRequest is not listed among the Future Work APIs
> either.
> 
> Kind regards,
> 
> Tom
> 
> On Thu, Sep 10, 2020 at 3:59 PM Satish Duggana <satish.dugg...@gmail.com>
> wrote:
> 
> > Thanks Lucas/Justine for the nice KIP.
> >
> > It has several benefits, which also include simplifying the topic
> > deletion process by the controller and log cleanup by brokers in corner
> > cases.
> >
> > Best,
> > Satish.
> >
> > On Wed, Sep 9, 2020 at 10:07 PM Justine Olshan <jols...@confluent.io>
> > wrote:
> > >
> > > Hello all, it's been almost a year! I've made some changes to this KIP
> > > and hope to continue the discussion.
> > >
> > > One of the main changes I've added is that the metadata response will
> > > now include the topic ID (as Colin suggested). Clients can obtain the
> > > topicID of a given topic through a TopicDescription. The topicId will
> > > also be included with the UpdateMetadata request.
> > >
> > > Let me know what you all think.
> > > Thank you,
> > > Justine
> > >
> > > On 2019/09/13 16:38:26, "Colin McCabe" <cmcc...@apache.org> wrote:
> > > > Hi Lucas,
> > > >
> > > > Thanks for tackling this. Topic IDs are a great idea, and this is a
> > > > really good writeup.
> > > >
> > > > For /brokers/topics/[topic], the schema version should be bumped to
> > > > version 3, rather than 2. KIP-455 bumped the version of this znode to
> > > > 2 already :)
> > > >
> > > > Given that we're going to be seeing these things as strings a lot (in
> > > > logs, in ZooKeeper, on the command-line, etc.), does it make sense to
> > > > use base64 when converting them to strings?
> > > >
> > > > Here is an example of the hex representation:
> > > > 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8
> > > >
> > > > And here is an example in base64.
> > > > b8tRS7h4TJ2Vt43Dp85v2A
> > > >
> > > > The base64 version saves 14 characters (to be fair, 4 of those were
> > > > dashes that we could have elided in the hex representation.)
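The sketch below reproduces the two example strings from the 16 raw bytes of the UUID. It uses only `java.util.UUID` and `java.util.Base64`. Choosing the URL-safe alphabet without padding is an assumption made here, not something decided in this thread. With that alphabet, `/` never appears in the string, which matters if an id is ever used in a ZooKeeper path. For this particular UUID, the standard and URL-safe alphabets give the same output.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Compares the hex and base64 string forms of a topic UUID.
final class TopicIdStrings {
    // Standard dashed hex form produced by UUID.toString().
    static String toHex(UUID id) {
        return id.toString();
    }

    // URL-safe base64 without '=' padding: 16 bytes become 22 characters.
    static String toBase64(UUID id) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Inverse of toBase64; the decoder accepts unpadded input.
    static UUID fromBase64(String s) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(s));
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8");
        System.out.println(toHex(id) + " (" + toHex(id).length() + " chars)");
        System.out.println(toBase64(id) + " (" + toBase64(id).length() + " chars)");
    }
}
```

Running `main` prints `6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8 (36 chars)` and `b8tRS7h4TJ2Vt43Dp85v2A (22 chars)`.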
> > > >
> > > > Another thing to consider is that we should specify that the
> > > > all-zeroes UUID is not a valid topic UUID. We can't use null for this
> > > > because we can't pass a null UUID over the RPC protocol (there is no
> > > > special pattern for null, nor do we want to waste space reserving such
> > > > a pattern.)
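The sketch below shows the all-zeroes sentinel using `java.util.UUID`. The helper names (`TopicIds`, `isValidTopicId`, `fromWire`) are hypothetical. It assumes that a UUID field is carried as two longs, and it maps all zeroes to "no id".

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical helpers that treat the all-zeroes UUID as "no topic id",
// because a UUID field on the wire has no separate null encoding.
final class TopicIds {
    static final UUID ZERO = new UUID(0L, 0L);

    // randomUUID() always sets the version (4) and variant bits, so it can
    // never return the all-zeroes value reserved as the sentinel.
    static UUID newTopicId() {
        return UUID.randomUUID();
    }

    static boolean isValidTopicId(UUID id) {
        return id != null && !id.equals(ZERO);
    }

    // Maps the two longs read off the wire to an optional topic id.
    static Optional<UUID> fromWire(long msb, long lsb) {
        return (msb == 0L && lsb == 0L)
                ? Optional.empty()
                : Optional.of(new UUID(msb, lsb));
    }
}
```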
> > > >
> > > > Maybe I missed it, but did you describe "migration of... existing
> > > > topic[s] without topic IDs" in detail in any section? It seems like
> > > > when the new controller becomes active, it should just generate random
> > > > UUIDs for these, and write the random UUIDs back to ZooKeeper. It would
> > > > be good to spell that out. We should make it clear that this happens
> > > > regardless of the inter-broker protocol version (it's a compatible
> > > > change).
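The migration step could look roughly like the sketch below. A `Map` stands in for the per-topic znodes, and a null value means the topic has no id yet. `Persister` is a hypothetical hook for the ZooKeeper write. The code does not consult the inter-broker protocol version at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch: on becoming active, the controller gives every topic without an id
// a random one and writes it back before relying on it.
final class TopicIdMigration {
    // Hypothetical stand-in for writing the id into the topic's znode.
    interface Persister {
        void persist(String topic, UUID id);
    }

    static Map<String, UUID> assignMissingIds(Map<String, UUID> idsByTopic,
                                              Persister persister) {
        Map<String, UUID> result = new HashMap<>(idsByTopic);
        for (Map.Entry<String, UUID> e : idsByTopic.entrySet()) {
            if (e.getValue() == null) {           // topic predates topic ids
                UUID id = UUID.randomUUID();
                persister.persist(e.getKey(), id); // persist first
                result.put(e.getKey(), id);
            }
        }
        return result;
    }
}
```

Topics that already have an id are left untouched. This makes the step safe to repeat on every controller failover.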
> > > >
> > > > "LeaderAndIsrRequests including an is_every_partition flag" seems a
> > > > bit wordy. Can we just call these "full LeaderAndIsrRequests"? Then
> > > > the RPC field could be named "full". Also, it would probably be better
> > > > for the RPC field to be an enum of { UNSPECIFIED, INCREMENTAL, FULL },
> > > > so that we can cleanly handle old versions (by treating them as
> > > > UNSPECIFIED)
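This sketch shows how the enum lets a receiver handle old request versions. `FIRST_VERSION_WITH_TYPE` is an assumed placeholder rather than a real protocol version number, and the byte codes are likewise illustrative.

```java
// Hypothetical encoding of the proposed LeaderAndIsrRequest type field.
// Old versions have no such field and map to UNSPECIFIED, so a receiver
// never mistakes an old request for a full one.
enum LeaderAndIsrType {
    UNSPECIFIED((byte) 0),
    INCREMENTAL((byte) 1),
    FULL((byte) 2);

    // Assumed placeholder: first request version that carries the field.
    static final short FIRST_VERSION_WITH_TYPE = 5;

    final byte code;

    LeaderAndIsrType(byte code) {
        this.code = code;
    }

    static LeaderAndIsrType fromRequest(short version, byte code) {
        if (version < FIRST_VERSION_WITH_TYPE) {
            return UNSPECIFIED; // field absent in older versions
        }
        for (LeaderAndIsrType t : values()) {
            if (t.code == code) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown LeaderAndIsr type " + code);
    }
}
```

With an enum instead of a boolean, a broker can apply full-request-only logic (such as stale log cleanup) only when it sees `FULL`. This avoids guessing from a missing flag.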
> > > >
> > > > In the LeaderAndIsrRequest section, you write "A final deletion event
> > > > will be scheduled for X ms after the LeaderAndIsrRequest was first
> > > > received..." I guess the X was a placeholder that you intended to
> > > > replace before posting? :) In any case, this seems like the kind of
> > > > thing we'd want a configuration for. Let's describe that configuration
> > > > key somewhere in this KIP, including what its default value is.
> > > >
> > > > We should probably also log a bunch of messages at WARN level when
> > > > something is scheduled for deletion. (Maybe this was assumed, but it
> > > > would be good to mention it).
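The sketch below covers the delayed deletion together with the WARN logging. The constructor argument `deleteDelayMs` stands in for the configuration key requested here; its real name and default value are not decided in this thread. The clock is passed in explicitly so that the logic is deterministic. Logging uses `java.util.logging`, where WARNING is the closest level to WARN.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.logging.Logger;

// Sketch of delayed deletion of stale logs, keyed by topic id.
final class StaleLogDeletionScheduler {
    private static final Logger LOG =
            Logger.getLogger(StaleLogDeletionScheduler.class.getName());

    private final long deleteDelayMs; // stand-in for the proposed config
    private final Map<UUID, Long> deadlines = new HashMap<>();

    StaleLogDeletionScheduler(long deleteDelayMs) {
        this.deleteDelayMs = deleteDelayMs;
    }

    // Called when a LeaderAndIsrRequest shows a local log is stale. The
    // deadline is fixed by the first request; later calls keep it.
    long schedule(UUID topicId, long firstReceivedMs) {
        Long existing = deadlines.get(topicId);
        if (existing != null) {
            return existing;
        }
        long deadline = firstReceivedMs + deleteDelayMs;
        deadlines.put(topicId, deadline);
        LOG.warning("Scheduling deletion of stale log for topic id "
                + topicId + " at " + deadline + " ms");
        return deadline;
    }

    // Returns, and forgets, every topic id whose deadline has passed.
    List<UUID> dueForDeletion(long nowMs) {
        List<UUID> due = new ArrayList<>();
        deadlines.entrySet().removeIf(e -> {
            boolean expired = e.getValue() <= nowMs;
            if (expired) {
                due.add(e.getKey());
            }
            return expired;
        });
        return due;
    }
}
```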
> > > >
> > > > I feel like there are a few sections that should be moved to "rejected
> > > > alternatives." For example, in the DeleteTopics section, since we're
> > > > not going to do option 1 or 2, these should be moved into "rejected
> > > > alternatives," rather than appearing inline. Another case is the
> > > > "Should we remove topic name from the protocol where possible"
> > > > section. This is clearly discussing a design alternative that we're
> > > > not proposing to implement: removing the topic name from those
> > > > protocols.
> > > >
> > > > Is it really necessary to have a new /admin/delete_topics_by_id path
> > > > in ZooKeeper? It seems like we don't really need this. Whenever there
> > > > is a new controller, we'll send out full LeaderAndIsrRequests which
> > > > will trigger the stale topics to be cleaned up. The active controller
> > > > will also send the full LeaderAndIsrRequest to brokers that are just
> > > > starting up. So we don't really need this kind of two-phase commit
> > > > (send out StopReplicaRequests, get ACKs from all nodes, commit by
> > > > removing the /admin/delete_topics node) any more.
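The sketch below shows why a full LeaderAndIsrRequest is enough for cleanup. Because the request lists every partition the broker should host, finding stale logs reduces to a set difference on topic ids. Comparing ids rather than names also catches a topic that was deleted and recreated under the same name. `StaleLogDetector` is a hypothetical helper. It would only be valid for full requests, since an incremental request omits topics that are still live.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Sketch: stale-log detection from a full LeaderAndIsrRequest.
final class StaleLogDetector {
    // localLogs: topic name -> id of the log on this broker's disk.
    // fullRequest: topic name -> id the controller says is current.
    static Set<UUID> staleTopicIds(Map<String, UUID> localLogs,
                                   Map<String, UUID> fullRequest) {
        Set<UUID> live = new HashSet<>(fullRequest.values());
        Set<UUID> stale = new HashSet<>();
        for (UUID id : localLogs.values()) {
            if (!live.contains(id)) { // deleted, or an old incarnation
                stale.add(id);
            }
        }
        return stale;
    }
}
```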
> > > >
> > > > You mention that FetchRequest will now include UUID to avoid issues
> > > > where requests are made to stale partitions. However, adding a UUID to
> > > > MetadataRequest is listed as future work, out of scope for this KIP.
> > > > How will the client learn what the topic UUID is, if the metadata
> > > > response doesn't include that information? It seems like adding the
> > > > UUID to MetadataResponse would be an improvement here that might not
> > > > be too hard to make.
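The broker-side check could look roughly like the sketch below. It assumes that a fetch carries both the topic name and the topic id that the client learned from its metadata response. The result names are illustrative placeholders, not the final protocol error codes.

```java
import java.util.Map;
import java.util.UUID;

// Sketch of a broker validating the topic id carried in a fetch.
final class FetchTopicIdCheck {
    enum Result { OK, UNKNOWN_TOPIC, STALE_TOPIC_ID }

    static Result check(Map<String, UUID> currentIds, String topic, UUID requestedId) {
        UUID current = currentIds.get(topic);
        if (current == null) {
            return Result.UNKNOWN_TOPIC;
        }
        // Same name but a different id means the client refers to a deleted
        // incarnation of the topic, so refuse instead of serving the new one.
        return current.equals(requestedId) ? Result.OK : Result.STALE_TOPIC_ID;
    }
}
```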
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Mon, Sep 9, 2019, at 17:48, Ryanne Dolan wrote:
> > > > > Lucas, this would be great. I've run into issues with topics being
> > > > > resurrected accidentally, since a client cannot easily distinguish
> > > > > between a deleted topic and a new topic with the same name. I'd need
> > > > > the ID accessible from the client to solve that issue, but this is a
> > > > > good first step.
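Once the ID is accessible from the client, detecting a resurrected topic only requires remembering the last id seen per topic name. The sketch below assumes that the id arrives in metadata responses. `TopicIncarnationTracker` is a hypothetical helper and not a Kafka client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Client-side sketch: a changed id for the same name means the topic was
// deleted and recreated.
final class TopicIncarnationTracker {
    enum Change { FIRST_SEEN, UNCHANGED, RECREATED }

    private final Map<String, UUID> lastSeen = new HashMap<>();

    Change observe(String topic, UUID idFromMetadata) {
        UUID previous = lastSeen.put(topic, idFromMetadata);
        if (previous == null) {
            return Change.FIRST_SEEN;
        }
        return previous.equals(idFromMetadata) ? Change.UNCHANGED : Change.RECREATED;
    }
}
```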
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Wed, Sep 4, 2019 at 1:41 PM Lucas Bradstreet <lu...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I would like to kick off discussion of KIP-516, an implementation
> > > > > > of topic IDs for Kafka. Topic IDs aim to solve topic uniqueness
> > > > > > problems in Kafka, where referring to a topic by name alone is
> > > > > > insufficient. Such cases include when a topic has been deleted and
> > > > > > recreated with the same name.
> > > > > >
> > > > > > Unique identifiers will help simplify and improve Kafka's topic
> > > > > > deletion process, as well as prevent cases where brokers may
> > > > > > incorrectly interact with stale versions of topics.
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers
> > > > > >
> > > > > > Looking forward to your thoughts.
> > > > > >
> > > > > > Lucas
> > > > > >
> > > > >
> > > >
> >
> >
>
