Hi.

Thanks for the KIP.

Have you thought about whether it makes sense to support authorizing a
principal for a topic ID rather than a topic name to achieve tighter
security?

Or is the topic ID fundamentally an internal detail similar to epochs used
in a bunch of other places in Kafka?

Thanks.

On Fri, Sep 11, 2020 at 4:06 PM John Roesler <vvcep...@apache.org> wrote:

> Hello Justine,
>
> Thanks for the KIP!
>
> I happen to have been confronted recently with the need to keep track of a
> large number of topics as compactly as possible. I was going to come up
> with some way to dictionary encode the topic names as integers, but this
> seems much better!
>
> Apologies if this has been raised before, but I’m wondering about the
> choice of UUID vs sequence number for the ids. Typically, I’ve seen UUIDs
> in two situations:
> 1. When processes need to generate non-colliding identifiers without
> coordination.
> 2. When the identifier needs to be “universally unique”; I.e., the
> identifier needs to distinguish the entity from all other entities that
> could ever exist. This is useful in cases where entities from all kinds of
> systems get mixed together, such as when dumping logs from all processes in
> a company into a common system.
>
> Maybe I’m being short-sighted, but it doesn’t seem like either really
> applies here. It seems like the brokers could and would achieve consensus
> when creating a topic anyway, which is all that’s required to generate
> non-colliding sequence ids. For the second, as you mention, we could assign
> a UUID to the cluster as a whole, which would render any resource scoped to
> the broker universally unique as well.
>
> The reason I mention this is that, although a UUID is way more compact
> than topic names, it’s still 16 bytes. In contrast, a 4-byte integer
> sequence id would give us 4 billion unique topics per cluster, which seems
> like enough ;)
>
> Considering the number of different times these topic identifiers are sent
> over the wire or stored in memory, it seems like it might be worth the
> additional 4x space savings.
>
> What do you think about this?
>
> Thanks,
> John
>
> On Fri, Sep 11, 2020, at 03:20, Tom Bentley wrote:
> > Hi Justine,
> >
> > This looks like a very welcome improvement. Thanks!
> >
> > Maybe I missed it, but the KIP doesn't seem to mention changing
> > DeleteTopicsRequest to identify the topic using an id. Maybe that's out
> of
> > scope, but DeleteTopicsRequest is not listed among the Future Work APIs
> > either.
> >
> > Kind regards,
> >
> > Tom
> >
> > On Thu, Sep 10, 2020 at 3:59 PM Satish Duggana <satish.dugg...@gmail.com
> >
> > wrote:
> >
> > > Thanks Lucas/Justine for the nice KIP.
> > >
> > > It has several benefits which also include simplifying the topic
> > > deletion process by controller and logs cleanup by brokers in corner
> > > cases.
> > >
> > > Best,
> > > Satish.
> > >
> > > On Wed, Sep 9, 2020 at 10:07 PM Justine Olshan <jols...@confluent.io>
> > > wrote:
> > > >
> > > > Hello all, it's been almost a year! I've made some changes to this
> KIP
> > > and hope to continue the discussion.
> > > >
> > > > One of the main changes I've added is now the metadata response will
> > > include the topic ID (as Colin suggested). Clients can obtain the
> topicID
> > > of a given topic through a TopicDescription. The topicId will also be
> > > included with the UpdateMetadata request.
> > > >
> > > > Let me know what you all think.
> > > > Thank you,
> > > > Justine
> > > >
> > > > On 2019/09/13 16:38:26, "Colin McCabe" <cmcc...@apache.org> wrote:
> > > > > Hi Lucas,
> > > > >
> > > > > Thanks for tackling this.  Topic IDs are a great idea, and this is
> a
> > > really good writeup.
> > > > >
> > > > > For /brokers/topics/[topic], the schema version should be bumped to
> > > version 3, rather than 2.  KIP-455 bumped the version of this znode to
> 2
> > > already :)
> > > > >
> > > > > Given that we're going to be seeing these things as strings as lot
> (in
> > > logs, in ZooKeeper, on the command-line, etc.), does it make sense to
> use
> > > base64 when converting them to strings?
> > > > >
> > > > > Here is an example of the hex representation:
> > > > > 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8
> > > > >
> > > > > And here is an example in base64.
> > > > > b8tRS7h4TJ2Vt43Dp85v2A
> > > > >
> > > > > The base64 version saves 15 letters (to be fair, 4 of those were
> > > dashes that we could have elided in the hex representation.)
> > > > >
> > > > > Another thing to consider is that we should specify that the
> > > all-zeroes UUID is not a valid topic UUID.   We can't use null for this
> > > because we can't pass a null UUID over the RPC protocol (there is no
> > > special pattern for null, nor do we want to waste space reserving such
> a
> > > pattern.)
> > > > >
> > > > > Maybe I missed it, but did you describe "migration of... existing
> > > topic[s] without topic IDs" in detail in any section?  It seems like
> when
> > > the new controller becomes active, it should just generate random
> UUIDs for
> > > these, and write the random UUIDs back to ZooKeeper.  It would be good
> to
> > > spell that out.  We should make it clear that this happens regardless
> of
> > > the inter-broker protocol version (it's a compatible change).
> > > > >
> > > > > "LeaderAndIsrRequests including an is_every_partition flag" seems a
> > > bit wordy.  Can we just call these "full LeaderAndIsrRequests"?  Then
> the
> > > RPC field could be named "full".  Also, it would probably be better
> for the
> > > RPC field to be an enum of { UNSPECIFIED, INCREMENTAL, FULL }, so that
> we
> > > can cleanly handle old versions (by treating them as UNSPECIFIED)
> > > > >
> > > > > In the LeaderAndIsrRequest section, you write "A final deletion
> event
> > > will be secheduled for X ms after the LeaderAndIsrRequest was first
> > > received..."  I guess the X was a placeholder that you intended to
> replace
> > > before posting? :)  In any case, this seems like the kind of thing we'd
> > > want a configuration for.  Let's describe that configuration key
> somewhere
> > > in this KIP, including what its default value is.
> > > > >
> > > > > We should probably also log a bunch of messages at WARN level when
> > > something is scheduled for deletion, as well.  (Maybe this was
> assumed, but
> > > it would be good to mention it).
> > > > >
> > > > > I feel like there are a few sections that should be moved to
> "rejected
> > > alternatives."  For example, in the DeleteTopics section, since we're
> not
> > > going to do option 1 or 2, these should be moved into "rejected
> > > alternatives,"  rather than appearing inline.  Another case is the
> "Should
> > > we remove topic name from the protocol where possible" section.  This
> is
> > > clearly discussing a design alternative that we're not proposing to
> > > implement: removing the topic name from those protocols.
> > > > >
> > > > > Is it really necessary to have a new /admin/delete_topics_by_id
> path
> > > in ZooKeeper?  It seems like we don't really need this.  Whenever
> there is
> > > a new controller, we'll send out full LeaderAndIsrRequests which will
> > > trigger the stale topics to be cleaned up.   The active controller will
> > > also send the full LeaderAndIsrRequest to brokers that are just
> starting
> > > up.    So we don't really need this kind of two-phase commit (send out
> > > StopReplicasRequest, get ACKs from all nodes, commit by removing
> > > /admin/delete_topics node) any more.
> > > > >
> > > > > You mention that FetchRequest will now include UUID to avoid issues
> > > where requests are made to stale partitions.  However, adding a UUID to
> > > MetadataRequest is listed as future work, out of scope for this KIP.
> How
> > > will the client learn what the topic UUID is, if the metadata response
> > > doesn't include that information?  It seems like adding the UUID to
> > > MetadataResponse would be an improvement here that might not be too
> hard to
> > > make.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Mon, Sep 9, 2019, at 17:48, Ryanne Dolan wrote:
> > > > > > Lucas, this would be great. I've run into issues with topics
> being
> > > > > > resurrected accidentally, since a client cannot easily
> distinguish
> > > between
> > > > > > a deleted topic and a new topic with the same name. I'd need the
> ID
> > > > > > accessible from the client to solve that issue, but this is a
> good
> > > first
> > > > > > step.
> > > > > >
> > > > > > Ryanne
> > > > > >
> > > > > > On Wed, Sep 4, 2019 at 1:41 PM Lucas Bradstreet <
> lu...@confluent.io>
> > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I would like to kick off discussion of KIP-516, an
> implementation
> > > of topic
> > > > > > > IDs for Kafka. Topic IDs aim to solve topic uniqueness
> problems in
> > > Kafka,
> > > > > > > where referring to a topic by name alone is insufficient. Such
> > > cases
> > > > > > > include when a topic has been deleted and recreated with the
> same
> > > name.
> > > > > > >
> > > > > > > Unique identifiers will help simplify and improve Kafka's topic
> > > deletion
> > > > > > > process, as well as prevent cases where brokers may incorrectly
> > > interact
> > > > > > > with stale versions of topics.
> > > > > > >
> > > > > > >
> > > > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers
> > > > > > >
> > > > > > > Looking forward to your thoughts.
> > > > > > >
> > > > > > > Lucas
> > > > > > >
> > > > > >
> > > > >
> > >
> > >
> >
>

Reply via email to