Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

Jay Kreps Wed, 21 Oct 2015 14:23:28 -0700

Gwen, It's a good question of what the producer semantics are--would we
only allow you to produce to a partition or first level directory or would
we hash over whatever subtree you supply? Actually not sure which makes
more sense...


Ashish, here are some thoughts:
1. I think we can do this online. There is a question of what happens to
readers and writers but presumably it would be the same thing as if that topic
weren't there. There would be no guarantee this would happen atomically across
different brokers or clients, though.
2. ACLs should work like unix perms, right? I think configs would override
hierarchically, so we would have a full set of configs for each partition
(computed by walking up the tree toward the root and taking the first
override). I think this is what you're describing, right?
3. Totally agree no reason to have an arbitrary limit.
4. I actually don't think the physical layout on disk should be at all
connected to the logical directory hierarchy we present. That is, whether
you use RAID or not shouldn't impact the location of a topic in your
directory structure. Not sure if this is what you are saying or not. This
does raise the question of how to do the disk layout. The simplest thing
would be to keep the flat data directories but make the names of the
partitions on disk just be logical inode numbers and then have a separate
mapping of these inodes to logical names stored in ZK with a cache. I think
this would make things like rename fast and atomic. The downside of this is
that the 'ls' command will no longer tell you much about the data on a
broker.
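
To make point 4 a bit more concrete, here is a rough sketch of the inode idea
in Python. Everything in it is an assumption for illustration only: the class
name NamespaceTable, the in-memory dict standing in for the ZK-backed mapping
and its cache, and the on-disk naming are all hypothetical, not an existing
Kafka API. Collisions with an existing target path are not handled.

```python
import itertools


class NamespaceTable:
    """Hypothetical stand-in for the ZK-backed logical-name -> inode mapping."""

    def __init__(self):
        self._next_inode = itertools.count(1)
        self._inode_by_path = {}  # logical partition path -> inode number

    def create(self, path):
        """Register a new partition path and return its inode number."""
        if path in self._inode_by_path:
            raise KeyError(f"{path} already exists")
        inode = next(self._next_inode)
        self._inode_by_path[path] = inode
        return inode

    def disk_dir(self, path):
        """Flat on-disk directory name; it never contains the logical path."""
        return str(self._inode_by_path[path])

    def rename(self, old_prefix, new_prefix):
        """Move a subtree by rewriting mapping keys only; no data is moved."""
        old_prefix = old_prefix.rstrip("/")
        new_prefix = new_prefix.rstrip("/")
        moved = {}
        for path, inode in self._inode_by_path.items():
            if path == old_prefix or path.startswith(old_prefix + "/"):
                moved[new_prefix + path[len(old_prefix):]] = inode
            else:
                moved[path] = inode
        # Swapping the whole dict in one assignment models the atomic update.
        self._inode_by_path = moved


table = NamespaceTable()
table.create("/dc1/user-events/pageviews/0")
table.create("/dc1/user-events/pageviews/1")
before = table.disk_dir("/dc1/user-events/pageviews/0")
table.rename("/dc1/user-events", "/dc1/events")
after = table.disk_dir("/dc1/events/pageviews/0")
print(before, after)  # same directory name before and after the rename
```

The rename only touches the mapping, which is why it could be fast and
atomic, and it also shows the downside: the directory names on disk are bare
numbers, so 'ls' says nothing about which logical path they belong to.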

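For the config overrides in point 2, the resolution rule could look roughly
like the Python sketch below. The function name resolve_config, the shape of
the overrides dict, and the example paths are assumptions made up for this
illustration; retention.ms and cleanup.policy are used only as familiar
topic-level config names.

```python
def resolve_config(path, overrides, defaults):
    """Effective config for `path`: the nearest override wins, then defaults."""
    parts = [p for p in path.split("/") if p]
    effective = dict(defaults)
    # Apply overrides from the root down so that deeper (nearer) ones win.
    # This gives the same result as walking up from the partition and taking
    # the first override found for each key.
    for depth in range(len(parts) + 1):
        node = "/" + "/".join(parts[:depth])
        effective.update(overrides.get(node, {}))
    return effective


defaults = {"retention.ms": 604800000, "cleanup.policy": "delete"}
overrides = {
    "/dc": {"retention.ms": 86400000},
    "/dc/ns/topic": {"cleanup.policy": "compact"},
}
effective = resolve_config("/dc/ns/topic", overrides, defaults)
print(effective)
```

Here /dc/ns/topic picks up retention.ms from /dc and cleanup.policy from its
own override, while a path outside /dc would simply get the defaults.
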
-Jay

On Wed, Oct 21, 2015 at 12:43 PM, Ashish Singh <[email protected]> wrote:

> In the last KIP hangout, the following questions were raised.
>
>    1. *Whether or not to support a move command? If yes, how do we support it?*
>    I think a *move* command will be essential once we start supporting
>    directories. However, the implementation might be a bit convoluted. A few
>    things it will require are the ability to mark a topic unavailable during
>    the move and to update brokers’ metadata caches to reflect the move.
>    2. *How will ACL/config inheritance work?*
>    Say we have /dc/ns/topic.
>    dc has dc_acl and dc_config. Similarly for ns and topic.
>    To perform an action on /dc/ns/topic, the user must have the
>    required perms on dc, ns and topic for that operation. For example, User1
>    will need DESCRIBE permissions on dc, ns and topic to be able to describe
>    /dc/ns/topic.
>    For configs, configs for /dc/ns/topic will be topic_config + ns_config +
>    dc_config, in that order. So, if a config is specified for topic then that
>    will be used, else its parent (ns) will be checked for that config, and
>    this goes on.
>    3. *Will supporting an n-deep hierarchy be a concern?*
>    This can be a performance concern; however, it sounds more like misuse
>    of the functionality or bad organization of topics. We can have a depth
>    limit, but I am not sure if it is required.
>    4. *Will we continue to support multi-directory on disk, as proposed
>    in KAFKA-188?*
>    Yes, we should be able to support that. Namespaces will be created
>    within those directories. The heuristics for choosing the least loaded
>    disk/dir will remain the same.
>    5. *Will it be required to move existing topics from the default
>    directory/namespace to a particular directory/namespace to enable
>    mirror-maker to replicate topics in that directory/namespace?*
>    I do not think it will be required, as one can simply add /*/* to
>    mirror-maker’s blacklist and this will only capture topics that exist in
>    the default namespace. @Joel, does this answer your question?
>
> 
>
> On Fri, Oct 16, 2015 at 6:33 PM, Ashish Singh <[email protected]> wrote:
>
> > On Thu, Oct 15, 2015 at 1:30 PM, Jiangjie Qin <[email protected]
> >
> > wrote:
> >
> >> Hey Jay,
> >>
> >> If we allow consumer to subscribe to /*/my-event, does that mean we
> allow
> >> consumer to consume cross namespaces?
> >
> > That is the idea. If a user has permissions then yes, he should be able
> to
> > consume from as many namespaces as he wants.
> >
> >
> >> In that case it seems not
> >> "hierarchical" but more like a name field filtering. i.e. user can
> choose
> >> to consume from topic where datacenter={x,y},
> >> topic_name={my-topic1,mytopic2}. Am I understanding right?
> >>
> > I think it is still hierarchical, however with possible filtering (as you
> > said).
> >
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps <[email protected]> wrote:
> >>
> >> > Hey Jason,
> >> >
> >> > I actually think this is one of the advantages. The problem we have
> >> today
> >> > is that you can't really do bidirectional replication between clusters
> >> > because it would actually be a feedback loop.
> >> >
> >> > So the intended use would be that you would have a structure where the
> >> > top-level directory was DIFFERENT but the topic names were the same,
> so
> >> if
> >> > you maintain
> >> >   /chicago-datacenter/actual-topics
> >> >   /oregon-datacenter/actual-topics
> >> >   etc.
> >> > Then you replicate
> >> >   /chicago-datacenter/* => /oregon-datacenter
> >> > and
> >> >   /oregon-datacenter/* => /chicago-datacenter
> >> >
> >> > People who want the aggregate feed subscribe to /*/my-event.
> >> >
> >> > The nice thing about this is it gives a unified namespace across all
> >> > locations.
> >> >
> >> > Basically exactly what we do now but you no longer need to add new
> >> clusters
> >> > to get the namespacing.
> >> >
> >> > -Jay
> >> >
> >> >
> >> > On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson <[email protected]
> >
> >> > wrote:
> >> >
> >> > > Hey Ashish, thanks for the write-up. I think having a namespace
> >> > capability
> >> > > is a useful feature for Kafka, in particular with the addition of
> the
> >> > > authorization layer. I probably prefer Jay's hierarchical approach
> if
> >> > we're
> >> > > going to embed the namespace in the topic name since it seems more
> >> > general.
> >> > > That said, one advantage of having a namespace independent of the
> >> topic
> >> > > name is that it simplifies replication between namespaces a bit
> since
> >> you
> >> > > don't have to parse and rewrite topic names. Assuming that
> >> hierarchical
> >> > > topics will happen eventually anyway, I imagine a common pattern
> >> would be
> >> > > to preserve the same directory structure in multiple namespaces, so
> >> > having
> >> > > an easy mechanism for applications to switch between them would be
> >> nice.
> >> > > The namespace is kind of analogous to a chroot in this case. Of
> course
> >> > you
> >> > > can achieve the same thing by having a configurable topic prefix,
> just
> >> > you
> >> > > have to do all the topic rewriting, which I'm guessing will be a
> >> little
> >> > > annoying to implement in all of the clients and tools. However, the
> >> > > tradeoff (as you mention in the KIP) is that all request schemas
> have
> >> to
> >> > be
> >> > > updated, which is also annoying.
> >> > >
> >> > > -Jason
> >> > >
> >> > > On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh <[email protected]
> >
> >> > > wrote:
> >> > >
> >> > > > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira <[email protected]>
> >> > wrote:
> >> > > >
> >> > > > > This works really nicely from the consumer side, but what about
> >> the
> >> > > > > producer? If there are no more topics, do we allow producing to a
> >> > > > directory
> >> > > > > and have the Partitioner hash-partition messages between all
> >> > partitions
> >> > > > in
> >> > > > > the multiple levels in a directory?
> >> > > > >
> >> > > > Good point.
> >> > > >
> >> > > > I am personally in favor of maintaining current behavior for
> >> producer,
> >> > > > i.e., letting users produce only to a topic. This is different
> >> for
> >> > > > consumers, the suggested behavior is in line with current behavior.
> >> One
> >> > > can
> >> > > > use regex subscription to achieve the same even today.
> >> > > >
> >> > > > >
> >> > > > > Also, I think we want to preserve the consumer terminology of
> >> > > "subscribe"
> >> > > > > to topics / directories, but "assign" partitions - since the
> >> consumer
> >> > > > > behavior is different in those cases.
> >> > > > >
> >> > > > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps <[email protected]>
> >> wrote:
> >> > > > >
> >> > > > > > Okay this is similar to what I think we have talked about
> >> before.
> >> > Let
> >> > > > me
> >> > > > > > elaborate on the idea that I think has been floating
> >> around--it's
> >> > > > pretty
> >> > > > > > similar with a few differences.
> >> > > > > >
> >> > > > > > I think what you are calling the "default namespace" is
> >> basically
> >> > > what
> >> > > > I
> >> > > > > > would call the "current working directory" with paths not
> >> beginning
> >> > > > with
> >> > > > > > '/' being interpreted relative to this directory as in the fs.
> >> > > > > >
> >> > > > > > One thing you have to work out is what levels in this
> hierarchy
> >> you
> >> > > can
> >> > > > > > actually subscribe to. I think you are assuming only what we
> >> > > currently
> >> > > > > > consider a "topic", i.e. the first level of directories but
> not
> >> the
> >> > > > > > partitions or parent dirs, would be subscribable. If you think
> >> > about
> >> > > > it,
> >> > > > > > though, that constraint is a bit arbitrary.
> >> > > > > >
> >> > > > > > I'd propose instead the semantics that:
> >> > > > > > - Subscribing to /a/b/c/0 means subscribing to the 0th
> >> partition of
> >> > > > topic
> >> > > > > > "c" in directory /a/b
> >> > > > > > - Subscribing to /a/b/c means subscribing to all partitions in
> >> > > > > > topic/directory "c"
> >> > > > > > - Subscribing to /a/b means subscribing to all partitions in
> all
> >> > > > > > topics/subdirectories under a/b recursively
> >> > > > > >
> >> > > > > > Effectively the concept of topics goes away entirely--you just
> >> have
> >> > > > > > partitions/logs and directories. In this respect rather than
> >> adding
> >> > > new
> >> > > > > > concepts this new feature would actually just generalize what
> >> we
> >> > > have
> >> > > > > > (which I think is a good thing).
> >> > > > > >
> >> > > > > > -Jay
> >> > > > > >
> >> > > > > > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh <
> >> [email protected]
> >> > >
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps <
> [email protected]>
> >> > > wrote:
> >> > > > > > >
> >> > > > > > > > Great. I definitely would strongly favor carrying over
> >> user's
> >> > > > > intuition
> >> > > > > > > > from FS unless we think we need a very different model.
> The
> >> > minor
> >> > > > > > details
> >> > > > > > > > like the separator and namespace term will help with that.
> >> > > > > > > >
> >> > > > > > > > Follow-up question, say I have a layout like
> >> > > > > > > >    /chicago-datacenter/user-events/pageviews
> >> > > > > > > > Can I subscribe to
> >> > > > > > > >    /chicago-datacenter/user-events
> >> > > > > > > >
> >> > > > > > > Yes, however they will need a regex like
> >> > > > > > > /chicago-datacenter/user-events/*
> >> > > > > > >
> >> > > > > > > > to get the full firehose of user events from chicago? Can
> I
> >> > > > subscribe
> >> > > > > > to
> >> > > > > > > >    /*/user-events
> >> > > > > > > > to get user events originating from all datacenters?
> >> > > > > > > >
> >> > > > > > > Yes
> >> > > > > > >
> >> > > > > > > >
> >> > > > > > > > (Assuming, for now, that these are all in the same
> >> cluster...)
> >> > > > > > > >
> >> > > > > > > > Also, just to confirm, it sounds from the proposal like
> >> config
> >> > > > > > overrides
> >> > > > > > > > would become fully hierarchical so you can override config
> >> at
> >> > any
> >> > > > > > > directory
> >> > > > > > > > point. This will add complexity in implementation but I
> >> think
> >> > > will
> >> > > > > > likely
> >> > > > > > > > be much more operator friendly.
> >> > > > > > > >
> >> > > > > > > Yes, that is the idea.
> >> > > > > > >
> >> > > > > > > >
> >> > > > > > > > There are about a thousand details to discuss in terms of
> >> how
> >> > > this
> >> > > > > > would
> >> > > > > > > > impact the metadata request, various zk entries, and
> various
> >> > > other
> >> > > > > > > aspects,
> >> > > > > > > > but probably it makes sense to first agree on how we would
> >> want
> >> > > it
> >> > > > to
> >> > > > > > > work
> >> > > > > > > > and then start to dive into how to implement that.
> >> > > > > > > >
> >> > > > > > > Agreed.
> >> > > > > > >
> >> > > > > > > >
> >> > > > > > > > -Jay
> >> > > > > > > >
> >> > > > > > > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh <
> >> > > [email protected]
> >> > > > >
> >> > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Hey Jay, thanks for reviewing the proposal. Answers
> >> inline.
> >> > > > > > > > >
> >> > > > > > > > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps <
> >> > [email protected]>
> >> > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hey guys,
> >> > > > > > > > > >
> >> > > > > > > > > > I think this is an important feature and one we've
> >> talked
> >> > > about
> >> > > > > > for a
> >> > > > > > > > > > while. I really think trying to invent a new
> >> nomenclature
> >> > is
> >> > > > > going
> >> > > > > > to
> >> > > > > > > > > make
> >> > > > > > > > > > it hard for people to understand, though. As such I
> >> > recommend
> >> > > > we
> >> > > > > > call
> >> > > > > > > > > > namespaces "directories" and denote them with
> '/'--this
> >> > will
> >> > > > make
> >> > > > > > the
> >> > > > > > > > > > feature 1000x more understandable to people.
> >> > > > > > > > >
> >> > > > > > > > > Essentially you are suggesting two things here.
> >> > > > > > > > > 1. Use "Directory" instead of "Namespace" as it is more
> >> > > > intuitive.
> >> > > > > I
> >> > > > > > > > agree.
> >> > > > > > > > > 2. Make '/' as delimiter instead of ':'. Fine with me
> and
> >> I
> >> > > agree
> >> > > > > if
> >> > > > > > we
> >> > > > > > > > > call these directories, '/' is the way to go.
> >> > > > > > > > >
> >> > > > > > > > > I think we should inherit the
> >> > > > > > > > > > semantics of normal unix fs in so far as it makes
> sense.
> >> > > > > > > > > >
> >> > > > > > > > > > In this approach we get rid of topics entirely,
> instead
> >> we
> >> > > > really
> >> > > > > > > just
> >> > > > > > > > > have
> >> > > > > > > > > > partitions which are the equivalent of a file and
> retain
> >> > > their
> >> > > > > > > numeric
> >> > > > > > > > > > names, and the existing topic concept is just the
> first
> >> > > > directory
> >> > > > > > > level
> >> > > > > > > > > but
> >> > > > > > > > > > we generalize to allow arbitrarily many more levels of
> >> > > nesting.
> >> > > > > > This
> >> > > > > > > > > allows
> >> > > > > > > > > > categorization of data, such as
> >> > > > > > /datacenter1/user-events/page-views/3
> >> > > > > > > > and
> >> > > > > > > > > > you can subscribe, apply configs or permissions at any
> >> > level
> >> > > of
> >> > > > > the
> >> > > > > > > > > > hierarchy.
> >> > > > > > > > > >
> >> > > > > > > > > +1. This actually requires just a minor change to
> existing
> >> > > > > proposal,
> >> > > > > > > > i.e.,
> >> > > > > > > > > "some:namespace:topic" becomes "some/namespace/topic".
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > I'm actually not 100% sure what the semantics of
> >> accessing
> >> > > data
> >> > > > > in
> >> > > > > > > > > > differing namespaces is in the current proposal, maybe
> >> you
> >> > > can
> >> > > > > > > clarify
> >> > > > > > > > > > Ashish?
> >> > > > > > > > >
> >> > > > > > > > > I will add more info to KIP on this, however I think a
> >> client
> >> > > > > should
> >> > > > > > be
> >> > > > > > > > > able to access data in any namespace as long as
> following
> >> > > > > conditions
> >> > > > > > > are
> >> > > > > > > > > satisfied.
> >> > > > > > > > >
> >> > > > > > > > > 1. Namespace, the client is trying to access, exists.
> >> > > > > > > > > 2. The client has sufficient permissions on the
> namespace
> >> for
> >> > > > type
> >> > > > > of
> >> > > > > > > > > operation the client is trying to perform on a topic
> >> within
> >> > > that
> >> > > > > > > > namespace.
> >> > > > > > > > > 3. The client has sufficient permissions on the topic
> for
> >> > type
> >> > > of
> >> > > > > > > > operation
> >> > > > > > > > > the client is trying to perform on that topic.
> >> > > > > > > > >
> >> > > > > > > > > If we choose to go with what you suggested earlier that
> >> just
> >> > > have
> >> > > > > > > > hierarchy
> >> > > > > > > > > of directories, then step 3 will actually be covered in
> >> step
> >> > 2.
> >> > > > > > > > >
> >> > > > > > > > > In the current proposal, consumers will subscribe to a
> >> topic
> >> > > in a
> >> > > > > > > > namespace
> >> > > > > > > > > by specifying <namespace>:<topic> as the topic name.
> They
> >> can
> >> > > > > > subscribe
> >> > > > > > > > to
> >> > > > > > > > > topics from multiple namespaces.
> >> > > > > > > > >
> >> > > > > > > > > Let me know if I totally missed your question.
> >> > > > > > > > >
> >> > > > > > > > > Since the point of Kafka is sharing data I think it is
> >> really
> >> > > > > > > > > > important that the grouping be just for
> >> > > > > > > > > convenience/permissions/config/etc
> >> > > > > > > > > > and that it remain possible to access multiple
> >> > > > > > directories/namespaces
> >> > > > > > > > > from
> >> > > > > > > > > > the same client.
> >> > > > > > > > > >
> >> > > > > > > > > Totally agree with you.
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > -Jay
> >> > > > > > > > > >
> >> > > > > > > > > > On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh <
> >> > > > > [email protected]>
> >> > > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hey Guys,
> >> > > > > > > > > > >
> >> > > > > > > > > > > I just created KIP-37 for adding namespaces to
> Kafka.
> >> > > > > > > > > > >
> >> > > > > > > > > > > KIP-37
> >> > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka>
> >> > > > > > > > > > > tracks the proposal.
> >> > > > > > > > > > >
> >> > > > > > > > > > > The idea is to make Kafka support multi-tenancy via
> >> > > > namespaces.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Feedback and comments are welcome.
> >> > > > > > > > > > > 
> >> > > > > > > > > > > --
> >> > > > > > > > > > >
> >> > > > > > > > > > > Regards,
> >> > > > > > > > > > > Ashish
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > --
> >> > > > > > > > >
> >> > > > > > > > > Regards,
> >> > > > > > > > > Ashish
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > --
> >> > > > > > >
> >> > > > > > > Regards,
> >> > > > > > > Ashish
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > >
> >> > > > Regards,
> >> > > > Ashish
> >> > > >
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> >
> > Regards,
> > Ashish
> >
>
>
>
> --
>
> Regards,
> Ashish
>
