Hey Jay,

If we allow a consumer to subscribe to /*/my-event, does that mean we allow
consumers to consume across namespaces? In that case it seems less
"hierarchical" and more like filtering on name fields, i.e., a user can choose
to consume from topics where datacenter={x,y},
topic_name={my-topic1,my-topic2}. Am I understanding this right?
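
To make the question concrete, here is a rough sketch of how I am reading it.
It is purely illustrative: the function names and the two-level
/datacenter/topic layout are my own assumptions, not anything from the KIP or
from Kafka itself.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, path: str) -> bool:
    """Segment-wise match where '*' stands for exactly one path level."""
    p_parts = pattern.strip("/").split("/")
    t_parts = path.strip("/").split("/")
    if len(p_parts) != len(t_parts):
        return False
    return all(fnmatchcase(t, p) for p, t in zip(p_parts, t_parts))


def field_filter(path: str, datacenters: set, topic_names: set) -> bool:
    """Treat a two-level path as two independent fields (hypothetical layout)."""
    datacenter, topic_name = path.strip("/").split("/")
    return datacenter in datacenters and topic_name in topic_names


# Made-up topic paths, following the layout from Jay's example.
topics = [
    "/chicago-datacenter/my-event",
    "/oregon-datacenter/my-event",
    "/oregon-datacenter/other-event",
]

# Selection by wildcard path.
by_pattern = [t for t in topics if matches("/*/my-event", t)]

# Selection by field values: datacenter={chicago,oregon}, topic_name={my-event}.
by_fields = [
    t
    for t in topics
    if field_filter(t, {"chicago-datacenter", "oregon-datacenter"}, {"my-event"})
]
```

For this layout both selections pick the same two topics, which is why the
wildcard looks to me more like filtering on fields than walking a hierarchy.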

Thanks,

Jiangjie (Becket) Qin

On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps <j...@confluent.io> wrote:

> Hey Jason,
>
> I actually think this is one of the advantages. The problem we have today
> is that you can't really do bidirectional replication between clusters
> because it would actually be a feedback loop.
>
> So the intended use would be that you would have a structure where the
> top-level directory was DIFFERENT but the topic names were the same, so if
> you maintain
>   /chicago-datacenter/actual-topics
>   /oregon-datacenter/actual-topics
>   etc.
> Then you replicate
>   /chicago-datacenter/* => /oregon-datacenter
> and
>   /oregon-datacenter/* => /chicago-datacenter
>
> People who want the aggregate feed subscribe to /*/my-event.
>
> The nice thing about this is it gives a unified namespace across all
> locations.
>
> Basically exactly what we do now but you no longer need to add new clusters
> to get the namespacing.
>
> -Jay
>
>
> On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Ashish, thanks for the write-up. I think having a namespace capability
> > is a useful feature for Kafka, in particular with the addition of the
> > authorization layer. I probably prefer Jay's hierarchical approach if we're
> > going to embed the namespace in the topic name since it seems more general.
> > That said, one advantage of having a namespace independent of the topic
> > name is that it simplifies replication between namespaces a bit since you
> > don't have to parse and rewrite topic names. Assuming that hierarchical
> > topics will happen eventually anyway, I imagine a common pattern would be
> > to preserve the same directory structure in multiple namespaces, so having
> > an easy mechanism for applications to switch between them would be nice.
> > The namespace is kind of analogous to a chroot in this case. Of course you
> > can achieve the same thing by having a configurable topic prefix, just you
> > have to do all the topic rewriting, which I'm guessing will be a little
> > annoying to implement in all of the clients and tools. However, the
> > tradeoff (as you mention in the KIP) is that all request schemas have to be
> > updated, which is also annoying.
> >
> > -Jason
> >
> > On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh <asi...@cloudera.com>
> > wrote:
> >
> > > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira <g...@confluent.io> wrote:
> > >
> > > > This works really nicely from the consumer side, but what about the
> > > > producer? If there are no more topics, do we allow producing to a
> > > > directory and have the Partitioner hash-partition messages between all
> > > > partitions in the multiple levels in a directory?
> > > >
> > > Good point.
> > >
> > > I am personally in favor of maintaining the current behavior for
> > > producers, i.e., letting users produce only to a topic. This is different
> > > for consumers, where the suggested behavior is in line with current
> > > behavior. One can use regex subscription to achieve the same even today.
> > >
> > > >
> > > > Also, I think we want to preserve the consumer terminology of "subscribe"
> > > > to topics / directories, but "assign" partitions - since the consumer
> > > > behavior is different in those cases.
> > > >
> > > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps <j...@confluent.io> wrote:
> > > >
> > > > > > Okay this is similar to what I think we have talked about before. Let me
> > > > > > elaborate on the idea that I think has been floating around--it's pretty
> > > > > > similar with a few differences.
> > > > > >
> > > > > > I think what you are calling the "default namespace" is basically what I
> > > > > > would call the "current working directory" with paths not beginning with
> > > > > > '/' being interpreted relative to this directory as in the fs.
> > > > > >
> > > > > > One thing you have to work out is what levels in this hierarchy you can
> > > > > > actually subscribe to. I think you are assuming only what we currently
> > > > > > consider a "topic", i.e. the first level of directories but not the
> > > > > > partitions or parent dirs, would be subscribable. If you think about it,
> > > > > > though, that constraint is a bit arbitrary.
> > > > >
> > > > > I'd propose instead the semantics that:
> > > > > > - Subscribing to /a/b/c/0 means subscribing to the 0th partition of topic
> > > > > >   "c" in directory /a/b
> > > > > > - Subscribing to /a/b/c means subscribing to all partitions in
> > > > > >   topic/directory "c"
> > > > > > - Subscribing to /a/b means subscribing to all partitions in all
> > > > > >   topics/subdirectories under /a/b recursively
> > > > > >
> > > > > > Effectively the concept of topics goes away entirely--you just have
> > > > > > partitions/logs and directories. In this respect rather than adding new
> > > > > > concepts this new feature would actually just generalize what we have
> > > > > > (which I think is a good thing).
> > > > >
> > > > > -Jay
> > > > >
> > > > > > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh <asi...@cloudera.com>
> > > > > > wrote:
> > > > > >
> > > > > > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > >
> > > > > > > > Great. I definitely would strongly favor carrying over users' intuition
> > > > > > > > from FS unless we think we need a very different model. The minor details
> > > > > > > > like the separator and namespace term will help with that.
> > > > > > >
> > > > > > > Follow-up question, say I have a layout like
> > > > > > >    /chicago-datacenter/user-events/pageviews
> > > > > > > Can I subscribe to
> > > > > > >    /chicago-datacenter/user-events
> > > > > > >
> > > > > > > Yes, however they will need a regex like
> > > > > > /chicago-datacenter/user-events/*
> > > > > >
> > > > > > > > to get the full firehose of user events from chicago? Can I subscribe to
> > > > > > > >    /*/user-events
> > > > > > > > to get user events originating from all datacenters?
> > > > > > >
> > > > > > Yes
> > > > > >
> > > > > > >
> > > > > > > (Assuming, for now, that these are all in the same cluster...)
> > > > > > >
> > > > > > > > Also, just to confirm, it sounds from the proposal like config overrides
> > > > > > > > would become fully hierarchical so you can override config at any
> > > > > > > > directory point. This will add complexity in implementation but I think
> > > > > > > > will likely be much more operator friendly.
> > > > > > >
> > > > > > Yes, that is the idea.
> > > > > >
> > > > > > >
> > > > > > > > There are about a thousand details to discuss in terms of how this would
> > > > > > > > impact the metadata request, various zk entries, and various other
> > > > > > > > aspects, but probably it makes sense to first agree on how we would want
> > > > > > > > it to work and then start to dive into how to implement that.
> > > > > > >
> > > > > > Agreed.
> > > > > >
> > > > > > >
> > > > > > > -Jay
> > > > > > >
> > > > > > > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh <asi...@cloudera.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Jay, thanks for reviewing the proposal. Answers inline.
> > > > > > > >
> > > > > > > > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps <j...@confluent.io>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey guys,
> > > > > > > > >
> > > > > > > > > > I think this is an important feature and one we've talked about
> > > > > > > > > > for a while. I really think trying to invent a new nomenclature is
> > > > > > > > > > going to make it hard for people to understand, though. As such I
> > > > > > > > > > recommend we call namespaces "directories" and denote them with
> > > > > > > > > > '/'--this will make the feature 1000x more understandable to
> > > > > > > > > > people.
> > > > > > > >
> > > > > > > > > Essentially you are suggesting two things here.
> > > > > > > > > 1. Use "Directory" instead of "Namespace" as it is more intuitive.
> > > > > > > > > I agree.
> > > > > > > > > 2. Use '/' as the delimiter instead of ':'. Fine with me, and I
> > > > > > > > > agree that if we call these directories, '/' is the way to go.
> > > > > > > >
> > > > > > > > > > I think we should inherit the
> > > > > > > > > > semantics of normal unix fs in so far as it makes sense.
> > > > > > > > > >
> > > > > > > > > > In this approach we get rid of topics entirely, instead we really
> > > > > > > > > > just have partitions which are the equivalent of a file and retain
> > > > > > > > > > their numeric names, and the existing topic concept is just the
> > > > > > > > > > first directory level but we generalize to allow arbitrarily many
> > > > > > > > > > more levels of nesting. This allows categorization of data, such
> > > > > > > > > > as /datacenter1/user-events/page-views/3 and you can subscribe,
> > > > > > > > > > apply configs or permissions at any level of the hierarchy.
> > > > > > > > >
> > > > > > > > > +1. This actually requires just a minor change to the existing
> > > > > > > > > proposal, i.e., "some:namespace:topic" becomes
> > > > > > > > > "some/namespace/topic".
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > I'm actually not 100% sure what the semantics of accessing data
> > > > > > > > > > in differing namespaces is in the current proposal, maybe you can
> > > > > > > > > > clarify Ashish?
> > > > > > > >
> > > > > > > > > I will add more info to KIP on this, however I think a client
> > > > > > > > > should be able to access data in any namespace as long as the
> > > > > > > > > following conditions are satisfied.
> > > > > > > > >
> > > > > > > > > 1. The namespace the client is trying to access exists.
> > > > > > > > > 2. The client has sufficient permissions on the namespace for the
> > > > > > > > > type of operation the client is trying to perform on a topic within
> > > > > > > > > that namespace.
> > > > > > > > > 3. The client has sufficient permissions on the topic for the type
> > > > > > > > > of operation the client is trying to perform on that topic.
> > > > > > > > >
> > > > > > > > > If we choose to go with what you suggested earlier, i.e., just have
> > > > > > > > > a hierarchy of directories, then step 3 will actually be covered by
> > > > > > > > > step 2.
> > > > > > > > >
> > > > > > > > > In the current proposal, consumers will subscribe to a topic in a
> > > > > > > > > namespace by specifying <namespace>:<topic> as the topic name. They
> > > > > > > > > can subscribe to topics from multiple namespaces.
> > > > > > > >
> > > > > > > > Let me know if I totally missed your question.
> > > > > > > >
> > > > > > > > > > Since the point of Kafka is sharing data I think it is really
> > > > > > > > > > important that the grouping be just for
> > > > > > > > > > convenience/permissions/config/etc and that it remain possible to
> > > > > > > > > > access multiple directories/namespaces from the same client.
> > > > > > > > >
> > > > > > > > Totally agree with you.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > -Jay
> > > > > > > > >
> > > > > > > > > > On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh <asi...@cloudera.com>
> > > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Guys,
> > > > > > > > > >
> > > > > > > > > > I just created KIP-37 for adding namespaces to Kafka.
> > > > > > > > > >
> > > > > > > > > > > KIP-37
> > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka>
> > > > > > > > > > > tracks the proposal.
> > > > > > > > > >
> > > > > > > > > > > The idea is to make Kafka support multi-tenancy via namespaces.
> > > > > > > > > >
> > > > > > > > > > Feedback and comments are welcome.
> > > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Ashish
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Ashish
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Regards,
> > > > > > Ashish
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Regards,
> > > Ashish
> > >
> >
>
