Hi Tom,

Thanks for the comment. I think this is a really good idea, and it has been added to the KIP under the newly added tooling section.
Thanks again,
Justine

On Wed, Sep 23, 2020 at 3:17 AM Tom Bentley <tbent...@redhat.com> wrote:

Hi Justine,

I know you started the vote thread, but on re-reading the KIP I noticed that although the topic id is included in the MetadataResponse, it's not surfaced in the output from `kafka-topics.sh --describe`. Maybe that was intentional, because ids are not really something the user should care deeply about, but it would also make life harder for anyone debugging Kafka, and this would likely get worse the more topic ids got rolled out across the protocols, clients, etc. It seems likely that `kafka-topics.sh` will eventually need the ability to show the id of a topic and perhaps find a topic name given an id. Is there any reason not to implement that in this KIP?

Many thanks,

Tom

On Mon, Sep 21, 2020 at 9:54 PM Justine Olshan <jols...@confluent.io> wrote:

Hi all,

After thinking about it, I've decided to remove the topic name from the Fetch Request and Response after all. Since there are so many of these requests per second, it is worth removing the extra information. I've updated the KIP to reflect this change.

Please let me know if there is anything else we should discuss before voting.

Thank you,
Justine

On Fri, Sep 18, 2020 at 9:46 AM Justine Olshan <jols...@confluent.io> wrote:

Hi Jun,

I see what you are saying. For now we can remove the extra information. I'll leave the option to add more fields to the file in the future. The KIP has been updated to reflect this change.

Thanks,
Justine

On Fri, Sep 18, 2020 at 8:46 AM Jun Rao <j...@confluent.io> wrote:

Hi, Justine,

Thanks for the reply.

13. If the log directory is the source of truth, it means that the redundant info in the metadata file will be ignored.
Then the question is: why do we need to put the redundant info in the metadata file now?

Thanks,
Jun

On Thu, Sep 17, 2020 at 5:07 PM Justine Olshan <jols...@confluent.io> wrote:

Hi Jun,
Thanks for the quick response!

12. I've decided to bump up the versions on the requests and updated the KIP. I think it's good we thoroughly discussed the options here, so we know we made a good choice. :)

13. This is an interesting situation. I think if this does occur we should give a warning. I agree that it's hard to know the source of truth for sure, since the directory or the file could be manually modified. I guess the directory could be used as the source of truth. To be honest, I'm not really sure what happens in Kafka when the log directory is renamed manually in such a way. I'm also wondering if the situation is recoverable in this scenario.

Thanks,
Justine

On Thu, Sep 17, 2020 at 4:28 PM Jun Rao <j...@confluent.io> wrote:

Hi, Justine,

Thanks for the reply.

12. I don't have a strong preference either. However, if we need IBP anyway, maybe it's easier to just bump up the version for all inter-broker requests and add the topic id field as a regular field. A regular field is a bit more concise in wire transfer than a flexible field.

13. The confusion that I was referring to is between the topic name and partition number in the log dir and those in the metadata file.
For example, if the log dir is topicA-1 and the metadata file in it has topicB and partition 0 (say due to a bug or manual modification), which one do we use as the source of truth?

Jun

On Thu, Sep 17, 2020 at 3:43 PM Justine Olshan <jols...@confluent.io> wrote:

Hi Jun,
Thanks for the comments.

12. I bumped the LeaderAndIsrRequest because I removed the topic name field in the response. It may be possible to avoid bumping the version without that change, but I may be missing something.
I believe StopReplica is actually on version 3 now, but because version 2 is flexible, I kept that listed as version 2 on the KIP page. However, you may be right that we need to bump the version on StopReplica to deal with deletion differently, as mentioned above. I don't have a big preference about using tagged fields or not.

13. I was thinking that in the case where the file and the request topic ids don't match, it means that the broker's topic (the one in the file) has been deleted. In that case, we would need to delete the old topic and start receiving the new version. If the topic name were to change but the ids still matched, the file would also need to be updated. Am I missing a case where the file would be correct and not the request?
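The mismatch Jun describes in point 13, and the topic ID check in Justine's reply, can be sketched as follows. This is a minimal illustration, not Kafka's implementation: it assumes the log directory is named `<topic>-<partition>` and that the partition metadata file has already been parsed into a dict with hypothetical keys `topic_name`, `partition` and `topic_id`. It warns on a name or partition mismatch and keeps the directory as the source of truth (the option Justine leans toward above), and it flags the local replica as stale when its topic ID differs from the one in the controller's request.

```python
import logging
from dataclasses import dataclass
from typing import Tuple

log = logging.getLogger("partition-metadata-check")


@dataclass(frozen=True)
class Resolution:
    topic: str       # topic name taken from the directory (treated as source of truth)
    partition: int   # partition number taken from the directory
    stale: bool      # True if the local topic ID differs from the request's topic ID


def parse_log_dir(dir_name: str) -> Tuple[str, int]:
    """Split a '<topic>-<partition>' directory name.

    Topic names may themselves contain '-', so only the last '-' is used.
    """
    topic, _, partition = dir_name.rpartition("-")
    if not topic or not partition.isdigit():
        raise ValueError(f"not a partition directory: {dir_name!r}")
    return topic, int(partition)


def reconcile(dir_name: str, file_meta: dict, request_topic_id: str) -> Resolution:
    topic, partition = parse_log_dir(dir_name)
    # Redundant fields in the file are compared against the directory, never trusted.
    in_file = (file_meta.get("topic_name", topic), file_meta.get("partition", partition))
    if in_file != (topic, partition):
        log.warning("metadata file in %s names %s-%s; using the directory name",
                    dir_name, in_file[0], in_file[1])
    # A different topic ID means the local data belongs to a deleted topic,
    # possibly one that has since been recreated under the same name.
    stale = file_meta["topic_id"] != request_topic_id
    return Resolution(topic, partition, stale)


# Jun's example: directory topicA-1, metadata file claiming topicB partition 0.
r = reconcile("topicA-1", {"topic_name": "topicB", "partition": 0, "topic_id": "id-1"}, "id-1")
print(r)  # Resolution(topic='topicA', partition=1, stale=False)
```

Splitting on the last `-` is what makes a name like `my-topic-12` parse as topic `my-topic`, partition 12.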
Thanks,
Justine

On Thu, Sep 17, 2020 at 3:18 PM Jun Rao <j...@confluent.io> wrote:

Hi, Justine,

Thanks for the reply. A couple more comments below.

12. ListOffset and OffsetForLeader currently don't support flexible fields. So, we have to bump up the version number and use IBP at least for these two requests. Note that it seems 2.7.0 will require IBP anyway because of changes in KAFKA-10435. Also, it seems that the versions of LeaderAndIsrRequest and StopReplica are bumped even though we only added a tagged field. But since IBP is needed anyway, we may want to revisit the overall tagged field choice.

13. The only downside is the potential confusion about which one is the source of truth if they don't match. Another option is to include those fields in the metadata file when we actually change the directory structure.

Thanks,
Jun

On Thu, Sep 17, 2020 at 2:01 PM Justine Olshan <jols...@confluent.io> wrote:

Hello all,

I've thought some more about removing the topic name field from some of the requests. On closer inspection of the requests/responses, it seems that the internal changes would be much larger than I expected. Some protocols involve clients, so they would require changes too.
I'm thinking that for now, removing the topic name from these requests and responses is out of scope.

I have decided to keep just the change to LeaderAndIsrResponse that removes the topic name, and have updated the KIP to reflect this change. I have also mentioned the other requests and responses in future work.

I'm hoping to start the voting process soon, so let me know if there is anything else we should discuss.

Thank you,
Justine

On Tue, Sep 15, 2020 at 3:57 PM Justine Olshan <jols...@confluent.io> wrote:

Hello again,
To follow up on some of the other comments:

10/11) We can remove the topic name from these requests/responses, and that means we will just have to make a few internal changes to make partitions accessible by topic id and partition. I can update the KIP to remove them unless anyone thinks they should stay.

12) Addressed in the previous email. I've updated the KIP to include tagged fields for the requests and responses. (More on that below.)

13) I think part of the idea for including this information is to prepare for future changes.
Perhaps the directory structure might change from topicName_partitionNumber to something like topicID_partitionNumber. Then it would be useful to have the topic name in the file, since it would not be in the directory structure. Supporting topic renames might be easier if the other fields are included. Would there be any downsides to including this information?

14) Yes, we would need to copy the partition metadata file in this process. I've updated the KIP to include this.

15) I believe Lucas meant v1 and v2 here. He was referring to how the requests would fall under different IBP and meant that older brokers would have to use the older version of the request and the existing topic deletion process. At first, it seemed like tagged fields would resolve the IBP issue. However, we may need IBP for this request after all, since the controller handles the topic deletion differently depending on the IBP version. In an older version, we can't just send a StopReplica to delete the topic immediately like we'd want to for this KIP.

This makes me wonder if we want tagged fields on all the requests after all. Let me know your thoughts!
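Jun's remark above that a regular field is a bit more concise on the wire than a tagged one can be checked with a rough byte count. The sketch assumes the tagged-field layout described in KIP-482 (each tagged field is written as an unsigned varint tag, an unsigned varint length, then the data) and a 16-byte topic ID. It ignores the per-struct count of tagged fields, which flexible versions write in any case. The function names are illustrative only.

```python
UUID_BYTES = 16  # a topic ID is a 128-bit UUID


def unsigned_varint(value: int) -> bytes:
    """Encode a non-negative int 7 bits at a time, low bits first (unsigned varint)."""
    if value < 0:
        raise ValueError("unsigned varint must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def regular_field_size() -> int:
    # A regular field is written inline with no extra framing.
    return UUID_BYTES


def tagged_field_size(tag: int) -> int:
    # A tagged field is prefixed by its tag and its length, both as varints.
    return len(unsigned_varint(tag)) + len(unsigned_varint(UUID_BYTES)) + UUID_BYTES


print(regular_field_size())  # 16
print(tagged_field_size(0))  # 18
```

Under these assumptions a tagged topic ID costs about two extra bytes per occurrence. That is small, but it is paid on every request that carries the field.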
Justine

On Tue, Sep 15, 2020 at 1:03 PM Justine Olshan <jols...@confluent.io> wrote:

Hi all,
Jun brought up a good point in his last email about tagged fields, and I've updated the KIP to reflect that the changes to requests and responses will be in the form of tagged fields to avoid changing IBP.

Jun: I plan on sending a follow-up email to address some of the other points.

Thanks,
Justine

On Mon, Sep 14, 2020 at 4:25 PM Jun Rao <j...@confluent.io> wrote:

Hi, Justine,

Thanks for the updated KIP. A few comments below.

10. LeaderAndIsr Response: Do we need the topic name?

11. For the changed request/response, other than LeaderAndIsr, UpdateMetadata, Metadata, do we need to include the topic name?

12. It seems that upgrades don't require IBP. Does that mean the new fields in all the request/response are added as tagged fields without bumping up the request version? It would be useful to make that clear.

13. Partition Metadata file: Do we need to include the topic name and the partition id since they are implied in the directory name?
14. In the JBOD mode, we support moving a partition's data from one disk to another. Will the new partition metadata file be copied during that process?

15. The KIP says "Remove deleted topics from replicas by sending StopReplicaRequest V2 for any topics which do not contain a topic ID, and V3 for any topics which do contain a topic ID." However, it seems the updated controller will create all missing topic IDs first before doing other actions. So, is StopReplicaRequest V2 needed?

Jun

On Fri, Sep 11, 2020 at 10:31 AM John Roesler <vvcep...@apache.org> wrote:

Thanks, Justine!

Your response seems compelling to me.

-John

On Fri, 2020-09-11 at 10:17 -0700, Justine Olshan wrote:

Hello all,
Thanks for continuing the discussion! I have a few responses to your points.

Tom: You are correct in that this KIP has not mentioned the DeleteTopicsRequest. I think that this would be out of scope for now, but may be something worth adding in the future.
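Point 15 in Jun's message above can be sketched together with Colin's suggestion further down: the controller generates random UUIDs for topics that lack one, and the all-zeroes UUID is never a valid topic ID. In this sketch the dict stands in for the /brokers/topics/[topic] znodes and the function names are hypothetical; a real controller would also write each new ID back to ZooKeeper. Once the migration has run, the V2 branch of the version rule quoted from the KIP is never taken, which is what Jun's question points at.

```python
import uuid
from typing import Dict

# The all-zeroes UUID is reserved to mean "no topic ID", since the RPC
# protocol has no separate null encoding for a UUID.
ZERO_UUID = uuid.UUID(int=0)


def new_topic_id() -> uuid.UUID:
    topic_id = uuid.uuid4()
    # uuid4() always sets the version and variant bits, so it can never be
    # the zero UUID; the assertion only documents that invariant.
    assert topic_id != ZERO_UUID
    return topic_id


def assign_missing_ids(topic_ids: Dict[str, uuid.UUID]) -> Dict[str, uuid.UUID]:
    """Return a copy of the name -> ID map with every missing (zero) ID filled in."""
    return {name: (new_topic_id() if tid == ZERO_UUID else tid)
            for name, tid in topic_ids.items()}


def stop_replica_version(topic_id: uuid.UUID) -> int:
    # The rule quoted in point 15: V2 for topics without an ID, V3 otherwise.
    return 2 if topic_id == ZERO_UUID else 3


existing = uuid.uuid4()
migrated = assign_missing_ids({"legacy-topic": ZERO_UUID, "new-topic": existing})
print({name: stop_replica_version(tid) for name, tid in migrated.items()})
# {'legacy-topic': 3, 'new-topic': 3}
```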
John: We did consider sequence ids, but there are a few reasons to favor UUIDs. There are several cases where topics from different clusters may interact now and in the future. For example, Mirror Maker 2 may benefit from being able to detect when a cluster being mirrored is deleted and recreated, and globally unique identifiers would make resolving issues easier than sequence IDs, which may collide between clusters. KIP-405 (tiered storage) will also benefit from globally unique IDs, as shared buckets may be used between clusters.

Globally unique IDs would also make functionality like moving topics between disparate clusters easier in the future, simplify any future implementations of backups and restores, and more. In general, unique IDs would ensure that the source cluster topics do not conflict with the destination cluster topics.

If we were to use sequence ids, we would need sufficiently large cluster ids to be stored with the topic identifiers, or we run the risk of collisions.
This will give up any advantage in compactness that sequence numbers may bring. Given these advantages, I think it makes sense to use UUIDs.

Gokul: This is an interesting idea, but this is a breaking change. Out of scope for now, but maybe worth discussing in the future.

Hope this explains some of the decisions,

Justine

On Fri, Sep 11, 2020 at 8:27 AM Gokul Ramanan Subramanian <gokul24...@gmail.com> wrote:

Hi.

Thanks for the KIP.

Have you thought about whether it makes sense to support authorizing a principal for a topic ID rather than a topic name to achieve tighter security?

Or is the topic ID fundamentally an internal detail similar to epochs used in a bunch of other places in Kafka?

Thanks.

On Fri, Sep 11, 2020 at 4:06 PM John Roesler <vvcep...@apache.org> wrote:

Hello Justine,

Thanks for the KIP!
I happen to have been confronted recently with the need to keep track of a large number of topics as compactly as possible. I was going to come up with some way to dictionary encode the topic names as integers, but this seems much better!

Apologies if this has been raised before, but I’m wondering about the choice of UUID vs sequence number for the ids. Typically, I’ve seen UUIDs in two situations:
1. When processes need to generate non-colliding identifiers without coordination.
2. When the identifier needs to be “universally unique”; i.e., the identifier needs to distinguish the entity from all other entities that could ever exist. This is useful in cases where entities from all kinds of systems get mixed together, such as when dumping logs from all processes in a company into a common system.
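The two situations John lists can be contrasted in a few lines. The sketch assumes each cluster numbers its topics with its own counter. That gives IDs that are unique inside one cluster but collide as soon as IDs from two clusters are mixed (the Mirror Maker 2 and tiered-storage cases in Justine's reply above). UUIDs drawn independently, with no coordination, are expected not to collide. Scoping sequence numbers by a per-cluster UUID, as John suggests next, also removes the collision. The helper names are illustrative.

```python
import itertools
import uuid
from typing import List, Set, Tuple


def sequence_ids(n: int) -> List[int]:
    """IDs from a fresh per-cluster counter: unique within one cluster only."""
    counter = itertools.count()
    return [next(counter) for _ in range(n)]


def random_ids(n: int) -> List[uuid.UUID]:
    """Version-4 UUIDs drawn with no coordination between generators."""
    return [uuid.uuid4() for _ in range(n)]


def scoped(cluster_id: uuid.UUID, ids: List[int]) -> Set[Tuple[uuid.UUID, int]]:
    """Pair each sequence number with its cluster's UUID."""
    return {(cluster_id, i) for i in ids}


seq_a, seq_b = sequence_ids(3), sequence_ids(3)
print(set(seq_a) & set(seq_b))  # {0, 1, 2}: every ID collides across clusters

uuids_a, uuids_b = random_ids(3), random_ids(3)
print(set(uuids_a) & set(uuids_b))  # expected to be empty

overlap = scoped(uuid.uuid4(), seq_a) & scoped(uuid.uuid4(), seq_b)
print(overlap)  # expected to be empty, at the cost of a cluster UUID per ID
```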
Maybe I’m being short-sighted, but it doesn’t seem like either really applies here. It seems like the brokers could and would achieve consensus when creating a topic anyway, which is all that’s required to generate non-colliding sequence ids. For the second, as you mention, we could assign a UUID to the cluster as a whole, which would render any resource scoped to the broker universally unique as well.

The reason I mention this is that, although a UUID is way more compact than topic names, it’s still 16 bytes. In contrast, a 4-byte integer sequence id would give us 4 billion unique topics per cluster, which seems like enough ;)

Considering the number of different times these topic identifiers are sent over the wire or stored in memory, it seems like it might be worth the additional 4x space savings.

What do you think about this?
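The space argument can be put in numbers. The sketch counts only identifier bytes for a hypothetical 10,000 topic references and does not model any particular request. It also shows Justine's counterpoint above: if each 4-byte sequence ID must travel with a 16-byte cluster UUID to stay unique across clusters, the combined identifier ends up larger than the UUID alone.

```python
UUID_BYTES = 16
INT32_BYTES = 4

# Distinct values in 4 bytes if every bit pattern is usable (John's "4 billion");
# a signed field restricted to non-negative values would allow 2**31 instead.
print(2 ** (8 * INT32_BYTES))  # 4294967296


def id_bytes(references: int, width: int) -> int:
    """Bytes spent on topic identifiers for `references` identifiers of `width` bytes."""
    return references * width


refs = 10_000  # hypothetical number of topic references sent or cached
print(id_bytes(refs, UUID_BYTES))                # 160000
print(id_bytes(refs, INT32_BYTES))               # 40000
print(UUID_BYTES // INT32_BYTES)                 # 4, the "4x" saving
print(id_bytes(refs, UUID_BYTES + INT32_BYTES))  # 200000 with a cluster UUID attached
```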
Thanks,
John

On Fri, Sep 11, 2020, at 03:20, Tom Bentley wrote:

Hi Justine,

This looks like a very welcome improvement. Thanks!

Maybe I missed it, but the KIP doesn't seem to mention changing DeleteTopicsRequest to identify the topic using an id. Maybe that's out of scope, but DeleteTopicsRequest is not listed among the Future Work APIs either.

Kind regards,

Tom

On Thu, Sep 10, 2020 at 3:59 PM Satish Duggana <satish.dugg...@gmail.com> wrote:

Thanks Lucas/Justine for the nice KIP.

It has several benefits, which also include simplifying the topic deletion process by the controller and log cleanup by brokers in corner cases.

Best,
Satish.
On Wed, Sep 9, 2020 at 10:07 PM Justine Olshan <jols...@confluent.io> wrote:

Hello all, it's been almost a year! I've made some changes to this KIP and hope to continue the discussion.
One of the main changes I've added is that the metadata response will now include the topic ID (as Colin suggested). Clients can obtain the topicID of a given topic through a TopicDescription. The topicId will also be included with the UpdateMetadata request.
Let me know what you all think.
Thank you,
Justine

On 2019/09/13 16:38:26, "Colin McCabe" <cmcc...@apache.org> wrote:

Hi Lucas,

Thanks for tackling this. Topic IDs are a great idea, and this is a really good writeup.
For /brokers/topics/[topic], the schema version should be bumped to version 3, rather than 2. KIP-455 bumped the version of this znode to 2 already :)

Given that we're going to be seeing these things as strings a lot (in logs, in ZooKeeper, on the command-line, etc.), does it make sense to use base64 when converting them to strings?

Here is an example of the hex representation:
6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8

And here is an example in base64:
b8tRS7h4TJ2Vt43Dp85v2A

The base64 version saves 15 letters (to be fair, 4 of those were dashes that we could have elided in the hex representation.)

Another thing to consider is that we should specify that the all-zeroes UUID is not a valid topic UUID.
We can't use null for this because we can't pass a null UUID over the RPC protocol (there is no special pattern for null, nor do we want to waste space reserving such a pattern).

Maybe I missed it, but did you describe "migration of... existing topic[s] without topic IDs" in detail in any section? It seems like when the new controller becomes active, it should just generate random UUIDs for these, and write the random UUIDs back to ZooKeeper. It would be good to spell that out. We should make it clear that this happens regardless of the inter-broker protocol version (it's a compatible change).

"LeaderAndIsrRequests including an is_every_partition flag" seems a bit wordy. Can we just call these "full LeaderAndIsrRequests"? Then the RPC field could be named "full".
Also, it would probably be better for the RPC field to be an enum of { UNSPECIFIED, INCREMENTAL, FULL }, so that we can cleanly handle old versions (by treating them as UNSPECIFIED).

In the LeaderAndIsrRequest section, you write "A final deletion event will be scheduled for X ms after the LeaderAndIsrRequest was first received..." I guess the X was a placeholder that you intended to replace before posting? :) In any case, this seems like the kind of thing we'd want a configuration for. Let's describe that configuration key somewhere in this KIP, including what its default value is.

We should probably also log a bunch of messages at WARN level when something is scheduled for deletion, as well. (Maybe this was assumed, but it would be good to mention it.)
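Colin's hex-versus-base64 comparison can be reproduced with Python's standard `uuid` and `base64` modules. This is a sketch, not the encoding the KIP adopted. It assumes URL-safe base64 (so '/' never appears in a ZooKeeper path or file name) with the trailing `==` padding dropped, and the helper names are hypothetical.

```python
import base64
import uuid


def to_base64(topic_id: uuid.UUID) -> str:
    """URL-safe base64 of the 16 raw bytes, without the trailing '==' padding."""
    return base64.urlsafe_b64encode(topic_id.bytes).rstrip(b"=").decode("ascii")


def from_base64(text: str) -> uuid.UUID:
    # 16 bytes always encode to 22 characters plus 2 padding characters.
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + "=="))


topic_id = uuid.UUID("6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8")
print(str(topic_id), len(str(topic_id)))              # 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8 36
print(to_base64(topic_id), len(to_base64(topic_id)))  # b8tRS7h4TJ2Vt43Dp85v2A 22
print(from_base64("b8tRS7h4TJ2Vt43Dp85v2A") == topic_id)  # True
```

The base64 form is 22 characters, against 36 for the dashed hex form and 32 for hex without dashes.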

I feel like there are a few sections that should be moved to "rejected alternatives." For example, in the DeleteTopics section, since we're not going to do option 1 or 2, these should be moved into "rejected alternatives," rather than appearing inline. Another case is the "Should we remove topic name from the protocol where possible" section. This is clearly discussing a design alternative that we're not proposing to implement: removing the topic name from those protocols.

Is it really necessary to have a new /admin/delete_topics_by_id path in ZooKeeper? It seems like we don't really need this. Whenever there is a new controller, we'll send out full LeaderAndIsrRequests which will trigger the stale topics to be cleaned up.
The active controller will also send the full LeaderAndIsrRequest to brokers that are just starting up. So we don't really need this kind of two-phase commit (send out StopReplicaRequests, get ACKs from all nodes, commit by removing the /admin/delete_topics node) any more.

You mention that FetchRequest will now include the UUID to avoid issues where requests are made to stale partitions. However, adding a UUID to MetadataRequest is listed as future work, out of scope for this KIP. How will the client learn what the topic UUID is, if the metadata response doesn't include that information? It seems like adding the UUID to MetadataResponse would be an improvement here that might not be too hard to make.
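The broker-side behaviour discussed above (a full LeaderAndIsrRequest letting a broker identify stale local topics, deletion delayed by a configurable amount and logged at WARN, and a topic-ID check on fetch) might be sketched roughly as follows. Everything here is hypothetical: `LocalTopicRegistry` and its methods are invented names, the constructor's `deletionDelayMs` stands in for the still-unnamed configuration behind "X ms", `java.util.logging` stands in for the broker's logger, and the fetch is assumed to carry both the topic name and its ID, as in the version of the proposal discussed at this point.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical broker-side bookkeeping of local topics by name and topic ID.
final class LocalTopicRegistry {
    private static final Logger LOG = Logger.getLogger(LocalTopicRegistry.class.getName());

    enum FetchError { NONE, UNKNOWN_TOPIC, TOPIC_ID_MISMATCH }

    private final Map<String, UUID> localTopicIds = new HashMap<>();
    private final ScheduledExecutorService scheduler;
    private final long deletionDelayMs; // placeholder for the unnamed "X ms" setting

    LocalTopicRegistry(ScheduledExecutorService scheduler, long deletionDelayMs) {
        this.scheduler = scheduler;
        this.deletionDelayMs = deletionDelayMs;
    }

    synchronized void addLocal(String topic, UUID id) {
        localTopicIds.put(topic, id);
    }

    synchronized boolean hasLocal(String topic) {
        return localTopicIds.containsKey(topic);
    }

    /**
     * Compares local topics with the complete name -> ID map carried by a full
     * LeaderAndIsrRequest. A local topic is stale if its name is absent from the
     * request or maps to a different ID (it was deleted and recreated).
     */
    synchronized Map<String, UUID> findStale(Map<String, UUID> fullRequestTopics) {
        Map<String, UUID> stale = new HashMap<>();
        localTopicIds.forEach((topic, localId) -> {
            if (!localId.equals(fullRequestTopics.get(topic))) {
                stale.put(topic, localId);
            }
        });
        return stale;
    }

    /** Logs each stale topic at WARN level and schedules its removal after the delay. */
    void scheduleStaleDeletion(Map<String, UUID> fullRequestTopics) {
        findStale(fullRequestTopics).forEach((topic, staleId) -> {
            LOG.warning("Scheduling deletion of stale topic " + topic + " (id " + staleId
                    + ") in " + deletionDelayMs + " ms");
            scheduler.schedule(() -> removeIfUnchanged(topic, staleId),
                    deletionDelayMs, TimeUnit.MILLISECONDS);
        });
    }

    // Removes the entry only if it still holds the stale ID, so a newer topic with the same name survives.
    private synchronized void removeIfUnchanged(String topic, UUID staleId) {
        localTopicIds.remove(topic, staleId);
    }

    /** Rejects a fetch whose topic ID does not match the ID this broker holds for that name. */
    synchronized FetchError validateFetch(String topic, UUID requestedId) {
        UUID localId = localTopicIds.get(topic);
        if (localId == null) {
            return FetchError.UNKNOWN_TOPIC;
        }
        return localId.equals(requestedId) ? FetchError.NONE : FetchError.TOPIC_ID_MISMATCH;
    }
}
```

Using `Map.remove(key, value)` in the delayed task means that if a topic is recreated under the same name before the deletion fires, the new topic's entry is left untouched.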

best,
Colin

On Mon, Sep 9, 2019, at 17:48, Ryanne Dolan wrote:

Lucas, this would be great. I've run into issues with topics being resurrected accidentally, since a client cannot easily distinguish between a deleted topic and a new topic with the same name. I'd need the ID accessible from the client to solve that issue, but this is a good first step.

Ryanne

On Wed, Sep 4, 2019 at 1:41 PM Lucas Bradstreet <lu...@confluent.io> wrote:

Hi all,

I would like to kick off discussion of KIP-516, an implementation of topic IDs for Kafka.
Topic IDs aim to solve topic uniqueness problems in Kafka, where referring to a topic by name alone is insufficient. Such cases include when a topic has been deleted and recreated with the same name. Unique identifiers will help simplify and improve Kafka's topic deletion process, as well as prevent cases where brokers may incorrectly interact with stale versions of topics.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers

Looking forward to your thoughts.

Lucas
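Ryanne's resurrected-topic problem and the delete-and-recreate case from the kickoff message both reduce to noticing that a topic name now maps to a different ID. A minimal client-side sketch, assuming the client can read topic IDs from metadata responses (which, per Colin's review, the KIP still listed as future work at this point in the thread); `TopicIdTracker` and its `Change` values are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical client-side cache; assumes metadata responses carry topic IDs.
final class TopicIdTracker {
    enum Change { NEW_TOPIC, UNCHANGED, RECREATED }

    private final Map<String, UUID> lastSeen = new HashMap<>();

    /** Records the ID reported for a topic name and classifies it against the previous one. */
    Change observe(String topic, UUID reportedId) {
        UUID previous = lastSeen.put(topic, reportedId);
        if (previous == null) {
            return Change.NEW_TOPIC;
        }
        // Same name, different ID: the old topic was deleted and a new one created.
        return previous.equals(reportedId) ? Change.UNCHANGED : Change.RECREATED;
    }
}
```

A client that sees RECREATED for a topic it was using could stop and surface the change, rather than silently treating the new topic as a continuation of the old one.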