Thanks, Justine! Your response seems compelling to me.
-John On Fri, 2020-09-11 at 10:17 -0700, Justine Olshan wrote: > Hello all, > Thanks for continuing the discussion! I have a few responses to your points. > > Tom: You are correct in that this KIP has not mentioned the > DeleteTopicsRequest. I think that this would be out of scope for now, but > may be something worth adding in the future. > > John: We did consider sequence ids, but there are a few reasons to favor > UUIDs. There are several cases where topics from different clusters may > interact now and in the future. For example, Mirror Maker 2 may benefit > from being able to detect when a cluster being mirrored is deleted and > recreated and globally unique identifiers would make resolving issues > easier than sequence IDs which may collide between clusters. KIP-405 > (tiered storage) will also benefit from globally unique IDs as shared > buckets may be used between clusters. > > Globally unique IDs would also make functionality like moving topics > between disparate clusters easier in the future, simplify any future > implementations of backups and restores, and more. In general, unique IDs > would ensure that the source cluster topics do not conflict with the > destination cluster topics. > > If we were to use sequence ids, we would need sufficiently large cluster > ids to be stored with the topic identifiers or we run the risk of > collisions. This will give up any advantage in compactness that sequence > numbers may bring. Given these advantages I think it makes sense to use > UUIDs. > > Gokul: This is an interesting idea, but this is a breaking change. Out of > scope for now, but maybe worth discussing in the future. > > Hope this explains some of the decisions, > > Justine > > > > On Fri, Sep 11, 2020 at 8:27 AM Gokul Ramanan Subramanian < > gokul24...@gmail.com> wrote: > > > Hi. > > > > Thanks for the KIP. > > > > Have you thought about whether it makes sense to support authorizing a > > principal for a topic ID rather than a topic name to achieve tighter > > security? > > > > Or is the topic ID fundamentally an internal detail similar to epochs used > > in a bunch of other places in Kafka? > > > > Thanks. > > > > On Fri, Sep 11, 2020 at 4:06 PM John Roesler <vvcep...@apache.org> wrote: > > > > > Hello Justine, > > > > > > Thanks for the KIP! > > > > > > I happen to have been confronted recently with the need to keep track of > > a > > > large number of topics as compactly as possible. I was going to come up > > > with some way to dictionary encode the topic names as integers, but this > > > seems much better! > > > > > > Apologies if this has been raised before, but I’m wondering about the > > > choice of UUID vs sequence number for the ids. Typically, I’ve seen UUIDs > > > in two situations: > > > 1. When processes need to generate non-colliding identifiers without > > > coordination. > > > 2. When the identifier needs to be “universally unique”; I.e., the > > > identifier needs to distinguish the entity from all other entities that > > > could ever exist. This is useful in cases where entities from all kinds > > of > > > systems get mixed together, such as when dumping logs from all processes > > in > > > a company into a common system. > > > > > > Maybe I’m being short-sighted, but it doesn’t seem like either really > > > applies here. It seems like the brokers could and would achieve consensus > > > when creating a topic anyway, which is all that’s required to generate > > > non-colliding sequence ids. For the second, as you mention, we could > > assign > > > a UUID to the cluster as a whole, which would render any resource scoped > > to > > > the broker universally unique as well. > > > > > > The reason I mention this is that, although a UUID is way more compact > > > than topic names, it’s still 16 bytes. In contrast, a 4-byte integer > > > sequence id would give us 4 billion unique topics per cluster, which > > seems > > > like enough ;) > > > > > > Considering the number of different times these topic identifiers are > > sent > > > over the wire or stored in memory, it seems like it might be worth the > > > additional 4x space savings. > > > > > > What do you think about this? > > > > > > Thanks, > > > John > > > > > > On Fri, Sep 11, 2020, at 03:20, Tom Bentley wrote: > > > > Hi Justine, > > > > > > > > This looks like a very welcome improvement. Thanks! > > > > > > > > Maybe I missed it, but the KIP doesn't seem to mention changing > > > > DeleteTopicsRequest to identify the topic using an id. Maybe that's out > > > of > > > > scope, but DeleteTopicsRequest is not listed among the Future Work APIs > > > > either. > > > > > > > > Kind regards, > > > > > > > > Tom > > > > > > > > On Thu, Sep 10, 2020 at 3:59 PM Satish Duggana < > > satish.dugg...@gmail.com > > > > wrote: > > > > > > > > > Thanks Lucas/Justine for the nice KIP. > > > > > > > > > > It has several benefits which also include simplifying the topic > > > > > deletion process by controller and logs cleanup by brokers in corner > > > > > cases. > > > > > > > > > > Best, > > > > > Satish. > > > > > > > > > > On Wed, Sep 9, 2020 at 10:07 PM Justine Olshan <jols...@confluent.io > > > > > wrote: > > > > > > Hello all, it's been almost a year! I've made some changes to this > > > KIP > > > > > and hope to continue the discussion. > > > > > > One of the main changes I've added is now the metadata response > > will > > > > > include the topic ID (as Colin suggested). Clients can obtain the > > > topicID > > > > > of a given topic through a TopicDescription. The topicId will also be > > > > > included with the UpdateMetadata request. > > > > > > Let me know what you all think. > > > > > > Thank you, > > > > > > Justine > > > > > > > > > > > > On 2019/09/13 16:38:26, "Colin McCabe" <cmcc...@apache.org> wrote: > > > > > > > Hi Lucas, > > > > > > > > > > > > > > Thanks for tackling this. Topic IDs are a great idea, and this > > is > > > a > > > > > really good writeup. > > > > > > > For /brokers/topics/[topic], the schema version should be bumped > > to > > > > > version 3, rather than 2. KIP-455 bumped the version of this znode > > to > > > 2 > > > > > already :) > > > > > > > Given that we're going to be seeing these things as strings as > > lot > > > (in > > > > > logs, in ZooKeeper, on the command-line, etc.), does it make sense to > > > use > > > > > base64 when converting them to strings? > > > > > > > Here is an example of the hex representation: > > > > > > > 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8 > > > > > > > > > > > > > > And here is an example in base64. > > > > > > > b8tRS7h4TJ2Vt43Dp85v2A > > > > > > > > > > > > > > The base64 version saves 15 letters (to be fair, 4 of those were > > > > > dashes that we could have elided in the hex representation.) > > > > > > > Another thing to consider is that we should specify that the > > > > > all-zeroes UUID is not a valid topic UUID. We can't use null for > > this > > > > > because we can't pass a null UUID over the RPC protocol (there is no > > > > > special pattern for null, nor do we want to waste space reserving > > such > > > a > > > > > pattern.) > > > > > > > Maybe I missed it, but did you describe "migration of... existing > > > > > topic[s] without topic IDs" in detail in any section? It seems like > > > when > > > > > the new controller becomes active, it should just generate random > > > UUIDs for > > > > > these, and write the random UUIDs back to ZooKeeper. It would be > > good > > > to > > > > > spell that out. We should make it clear that this happens regardless > > > of > > > > > the inter-broker protocol version (it's a compatible change). > > > > > > > "LeaderAndIsrRequests including an is_every_partition flag" > > seems a > > > > > bit wordy. Can we just call these "full LeaderAndIsrRequests"? Then > > > the > > > > > RPC field could be named "full". Also, it would probably be better > > > for the > > > > > RPC field to be an enum of { UNSPECIFIED, INCREMENTAL, FULL }, so > > that > > > we > > > > > can cleanly handle old versions (by treating them as UNSPECIFIED) > > > > > > > In the LeaderAndIsrRequest section, you write "A final deletion > > > event > > > > > will be secheduled for X ms after the LeaderAndIsrRequest was first > > > > > received..." I guess the X was a placeholder that you intended to > > > replace > > > > > before posting? :) In any case, this seems like the kind of thing > > we'd > > > > > want a configuration for. Let's describe that configuration key > > > somewhere > > > > > in this KIP, including what its default value is. > > > > > > > We should probably also log a bunch of messages at WARN level > > when > > > > > something is scheduled for deletion, as well. (Maybe this was > > > assumed, but > > > > > it would be good to mention it). > > > > > > > I feel like there are a few sections that should be moved to > > > "rejected > > > > > alternatives." For example, in the DeleteTopics section, since we're > > > not > > > > > going to do option 1 or 2, these should be moved into "rejected > > > > > alternatives," rather than appearing inline. Another case is the > > > "Should > > > > > we remove topic name from the protocol where possible" section. This > > > is > > > > > clearly discussing a design alternative that we're not proposing to > > > > > implement: removing the topic name from those protocols. > > > > > > > Is it really necessary to have a new /admin/delete_topics_by_id > > > path > > > > > in ZooKeeper? It seems like we don't really need this. Whenever > > > there is > > > > > a new controller, we'll send out full LeaderAndIsrRequests which will > > > > > trigger the stale topics to be cleaned up. The active controller > > will > > > > > also send the full LeaderAndIsrRequest to brokers that are just > > > starting > > > > > up. So we don't really need this kind of two-phase commit (send > > out > > > > > StopReplicasRequest, get ACKs from all nodes, commit by removing > > > > > /admin/delete_topics node) any more. > > > > > > > You mention that FetchRequest will now include UUID to avoid > > issues > > > > > where requests are made to stale partitions. However, adding a UUID > > to > > > > > MetadataRequest is listed as future work, out of scope for this KIP. > > > How > > > > > will the client learn what the topic UUID is, if the metadata > > response > > > > > doesn't include that information? It seems like adding the UUID to > > > > > MetadataResponse would be an improvement here that might not be too > > > hard to > > > > > make. > > > > > > > best, > > > > > > > Colin > > > > > > > > > > > > > > > > > > > > > On Mon, Sep 9, 2019, at 17:48, Ryanne Dolan wrote: > > > > > > > > Lucas, this would be great. I've run into issues with topics > > > being > > > > > > > > resurrected accidentally, since a client cannot easily > > > distinguish > > > > > between > > > > > > > > a deleted topic and a new topic with the same name. I'd need > > the > > > ID > > > > > > > > accessible from the client to solve that issue, but this is a > > > good > > > > > first > > > > > > > > step. > > > > > > > > > > > > > > > > Ryanne > > > > > > > > > > > > > > > > On Wed, Sep 4, 2019 at 1:41 PM Lucas Bradstreet < > > > lu...@confluent.io> > > > > > wrote: > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > I would like to kick off discussion of KIP-516, an > > > implementation > > > > > of topic > > > > > > > > > IDs for Kafka. Topic IDs aim to solve topic uniqueness > > > problems in > > > > > Kafka, > > > > > > > > > where referring to a topic by name alone is insufficient. > > Such > > > > > cases > > > > > > > > > include when a topic has been deleted and recreated with the > > > same > > > > > name. > > > > > > > > > Unique identifiers will help simplify and improve Kafka's > > topic > > > > > deletion > > > > > > > > > process, as well as prevent cases where brokers may > > incorrectly > > > > > interact > > > > > > > > > with stale versions of topics. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers > > > > > > > > > Looking forward to your thoughts. > > > > > > > > > > > > > > > > > > Lucas > > > > > > > > >