Thanks, Justine!

Your response seems compelling to me.

-John

On Fri, 2020-09-11 at 10:17 -0700, Justine Olshan wrote:
> Hello all,
> Thanks for continuing the discussion! I have a few responses to your points.
> 
> Tom: You are correct in that this KIP has not mentioned the
> DeleteTopicsRequest. I think that this would be out of scope for now, but
> may be something worth adding in the future.
> 
> John: We did consider sequence ids, but there are a few reasons to favor
> UUIDs. There are several cases where topics from different clusters may
> interact now and in the future. For example, Mirror Maker 2 may benefit
> from being able to detect when a cluster being mirrored is deleted and
> recreated and globally unique identifiers would make resolving issues
> easier than sequence IDs which may collide between clusters. KIP-405
> (tiered storage) will also benefit from globally unique IDs as shared
> buckets may be used between clusters.
> 
> Globally unique IDs would also make functionality like moving topics
> between disparate clusters easier in the future, simplify any future
> implementations of backups and restores, and more. In general, unique IDs
> would ensure that the source cluster topics do not conflict with the
> destination cluster topics.
> 
> If we were to use sequence ids, we would need sufficiently large cluster
> ids to be stored with the topic identifiers or we run the risk of
> collisions. This will give up any advantage in compactness that sequence
> numbers may bring. Given these advantages I think it makes sense to use
> UUIDs.
> 
> Gokul: This is an interesting idea, but this is a breaking change. Out of
> scope for now, but maybe worth discussing in the future.
> 
> Hope this explains some of the decisions,
> 
> Justine
> 
> 
> 
> On Fri, Sep 11, 2020 at 8:27 AM Gokul Ramanan Subramanian <
> gokul24...@gmail.com> wrote:
> 
> > Hi.
> > 
> > Thanks for the KIP.
> > 
> > Have you thought about whether it makes sense to support authorizing a
> > principal for a topic ID rather than a topic name to achieve tighter
> > security?
> > 
> > Or is the topic ID fundamentally an internal detail similar to epochs used
> > in a bunch of other places in Kafka?
> > 
> > Thanks.
> > 
> > On Fri, Sep 11, 2020 at 4:06 PM John Roesler <vvcep...@apache.org> wrote:
> > 
> > > Hello Justine,
> > > 
> > > Thanks for the KIP!
> > > 
> > > I happen to have been confronted recently with the need to keep track of
> > a
> > > large number of topics as compactly as possible. I was going to come up
> > > with some way to dictionary encode the topic names as integers, but this
> > > seems much better!
> > > 
> > > Apologies if this has been raised before, but I’m wondering about the
> > > choice of UUID vs sequence number for the ids. Typically, I’ve seen UUIDs
> > > in two situations:
> > > 1. When processes need to generate non-colliding identifiers without
> > > coordination.
> > > 2. When the identifier needs to be “universally unique”; I.e., the
> > > identifier needs to distinguish the entity from all other entities that
> > > could ever exist. This is useful in cases where entities from all kinds
> > of
> > > systems get mixed together, such as when dumping logs from all processes
> > in
> > > a company into a common system.
> > > 
> > > Maybe I’m being short-sighted, but it doesn’t seem like either really
> > > applies here. It seems like the brokers could and would achieve consensus
> > > when creating a topic anyway, which is all that’s required to generate
> > > non-colliding sequence ids. For the second, as you mention, we could
> > assign
> > > a UUID to the cluster as a whole, which would render any resource scoped
> > to
> > > the broker universally unique as well.
> > > 
> > > The reason I mention this is that, although a UUID is way more compact
> > > than topic names, it’s still 16 bytes. In contrast, a 4-byte integer
> > > sequence id would give us 4 billion unique topics per cluster, which
> > seems
> > > like enough ;)
> > > 
> > > Considering the number of different times these topic identifiers are
> > sent
> > > over the wire or stored in memory, it seems like it might be worth the
> > > additional 4x space savings.
> > > 
> > > What do you think about this?
> > > 
> > > Thanks,
> > > John
> > > 
> > > On Fri, Sep 11, 2020, at 03:20, Tom Bentley wrote:
> > > > Hi Justine,
> > > > 
> > > > This looks like a very welcome improvement. Thanks!
> > > > 
> > > > Maybe I missed it, but the KIP doesn't seem to mention changing
> > > > DeleteTopicsRequest to identify the topic using an id. Maybe that's out
> > > of
> > > > scope, but DeleteTopicsRequest is not listed among the Future Work APIs
> > > > either.
> > > > 
> > > > Kind regards,
> > > > 
> > > > Tom
> > > > 
> > > > On Thu, Sep 10, 2020 at 3:59 PM Satish Duggana <
> > satish.dugg...@gmail.com
> > > > wrote:
> > > > 
> > > > > Thanks Lucas/Justine for the nice KIP.
> > > > > 
> > > > > It has several benefits which also include simplifying the topic
> > > > > deletion process by controller and logs cleanup by brokers in corner
> > > > > cases.
> > > > > 
> > > > > Best,
> > > > > Satish.
> > > > > 
> > > > > On Wed, Sep 9, 2020 at 10:07 PM Justine Olshan <jols...@confluent.io
> > > > > wrote:
> > > > > > Hello all, it's been almost a year! I've made some changes to this
> > > KIP
> > > > > and hope to continue the discussion.
> > > > > > One of the main changes I've added is now the metadata response
> > will
> > > > > include the topic ID (as Colin suggested). Clients can obtain the
> > > topicID
> > > > > of a given topic through a TopicDescription. The topicId will also be
> > > > > included with the UpdateMetadata request.
> > > > > > Let me know what you all think.
> > > > > > Thank you,
> > > > > > Justine
> > > > > > 
> > > > > > On 2019/09/13 16:38:26, "Colin McCabe" <cmcc...@apache.org> wrote:
> > > > > > > Hi Lucas,
> > > > > > > 
> > > > > > > Thanks for tackling this.  Topic IDs are a great idea, and this
> > is
> > > a
> > > > > really good writeup.
> > > > > > > For /brokers/topics/[topic], the schema version should be bumped
> > to
> > > > > version 3, rather than 2.  KIP-455 bumped the version of this znode
> > to
> > > 2
> > > > > already :)
> > > > > > > Given that we're going to be seeing these things as strings as
> > lot
> > > (in
> > > > > logs, in ZooKeeper, on the command-line, etc.), does it make sense to
> > > use
> > > > > base64 when converting them to strings?
> > > > > > > Here is an example of the hex representation:
> > > > > > > 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8
> > > > > > > 
> > > > > > > And here is an example in base64.
> > > > > > > b8tRS7h4TJ2Vt43Dp85v2A
> > > > > > > 
> > > > > > > The base64 version saves 15 letters (to be fair, 4 of those were
> > > > > dashes that we could have elided in the hex representation.)
> > > > > > > Another thing to consider is that we should specify that the
> > > > > all-zeroes UUID is not a valid topic UUID.   We can't use null for
> > this
> > > > > because we can't pass a null UUID over the RPC protocol (there is no
> > > > > special pattern for null, nor do we want to waste space reserving
> > such
> > > a
> > > > > pattern.)
> > > > > > > Maybe I missed it, but did you describe "migration of... existing
> > > > > topic[s] without topic IDs" in detail in any section?  It seems like
> > > when
> > > > > the new controller becomes active, it should just generate random
> > > UUIDs for
> > > > > these, and write the random UUIDs back to ZooKeeper.  It would be
> > good
> > > to
> > > > > spell that out.  We should make it clear that this happens regardless
> > > of
> > > > > the inter-broker protocol version (it's a compatible change).
> > > > > > > "LeaderAndIsrRequests including an is_every_partition flag"
> > seems a
> > > > > bit wordy.  Can we just call these "full LeaderAndIsrRequests"?  Then
> > > the
> > > > > RPC field could be named "full".  Also, it would probably be better
> > > for the
> > > > > RPC field to be an enum of { UNSPECIFIED, INCREMENTAL, FULL }, so
> > that
> > > we
> > > > > can cleanly handle old versions (by treating them as UNSPECIFIED)
> > > > > > > In the LeaderAndIsrRequest section, you write "A final deletion
> > > event
> > > > > will be secheduled for X ms after the LeaderAndIsrRequest was first
> > > > > received..."  I guess the X was a placeholder that you intended to
> > > replace
> > > > > before posting? :)  In any case, this seems like the kind of thing
> > we'd
> > > > > want a configuration for.  Let's describe that configuration key
> > > somewhere
> > > > > in this KIP, including what its default value is.
> > > > > > > We should probably also log a bunch of messages at WARN level
> > when
> > > > > something is scheduled for deletion, as well.  (Maybe this was
> > > assumed, but
> > > > > it would be good to mention it).
> > > > > > > I feel like there are a few sections that should be moved to
> > > "rejected
> > > > > alternatives."  For example, in the DeleteTopics section, since we're
> > > not
> > > > > going to do option 1 or 2, these should be moved into "rejected
> > > > > alternatives,"  rather than appearing inline.  Another case is the
> > > "Should
> > > > > we remove topic name from the protocol where possible" section.  This
> > > is
> > > > > clearly discussing a design alternative that we're not proposing to
> > > > > implement: removing the topic name from those protocols.
> > > > > > > Is it really necessary to have a new /admin/delete_topics_by_id
> > > path
> > > > > in ZooKeeper?  It seems like we don't really need this.  Whenever
> > > there is
> > > > > a new controller, we'll send out full LeaderAndIsrRequests which will
> > > > > trigger the stale topics to be cleaned up.   The active controller
> > will
> > > > > also send the full LeaderAndIsrRequest to brokers that are just
> > > starting
> > > > > up.    So we don't really need this kind of two-phase commit (send
> > out
> > > > > StopReplicasRequest, get ACKs from all nodes, commit by removing
> > > > > /admin/delete_topics node) any more.
> > > > > > > You mention that FetchRequest will now include UUID to avoid
> > issues
> > > > > where requests are made to stale partitions.  However, adding a UUID
> > to
> > > > > MetadataRequest is listed as future work, out of scope for this KIP.
> > > How
> > > > > will the client learn what the topic UUID is, if the metadata
> > response
> > > > > doesn't include that information?  It seems like adding the UUID to
> > > > > MetadataResponse would be an improvement here that might not be too
> > > hard to
> > > > > make.
> > > > > > > best,
> > > > > > > Colin
> > > > > > > 
> > > > > > > 
> > > > > > > On Mon, Sep 9, 2019, at 17:48, Ryanne Dolan wrote:
> > > > > > > > Lucas, this would be great. I've run into issues with topics
> > > being
> > > > > > > > resurrected accidentally, since a client cannot easily
> > > distinguish
> > > > > between
> > > > > > > > a deleted topic and a new topic with the same name. I'd need
> > the
> > > ID
> > > > > > > > accessible from the client to solve that issue, but this is a
> > > good
> > > > > first
> > > > > > > > step.
> > > > > > > > 
> > > > > > > > Ryanne
> > > > > > > > 
> > > > > > > > On Wed, Sep 4, 2019 at 1:41 PM Lucas Bradstreet <
> > > lu...@confluent.io>
> > > > > wrote:
> > > > > > > > > Hi all,
> > > > > > > > > 
> > > > > > > > > I would like to kick off discussion of KIP-516, an
> > > implementation
> > > > > of topic
> > > > > > > > > IDs for Kafka. Topic IDs aim to solve topic uniqueness
> > > problems in
> > > > > Kafka,
> > > > > > > > > where referring to a topic by name alone is insufficient.
> > Such
> > > > > cases
> > > > > > > > > include when a topic has been deleted and recreated with the
> > > same
> > > > > name.
> > > > > > > > > Unique identifiers will help simplify and improve Kafka's
> > topic
> > > > > deletion
> > > > > > > > > process, as well as prevent cases where brokers may
> > incorrectly
> > > > > interact
> > > > > > > > > with stale versions of topics.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers
> > > > > > > > > Looking forward to your thoughts.
> > > > > > > > > 
> > > > > > > > > Lucas
> > > > > > > > > 

Reply via email to