Hey Jay,

If we allow a consumer to subscribe to /*/my-event, does that mean we allow the consumer to consume across namespaces? In that case it seems less "hierarchical" and more like filtering on name fields, i.e. the user can choose to consume from topics where datacenter={x,y}, topic_name={my-topic1,my-topic2}. Am I understanding this right?
Thanks,

Jiangjie (Becket) Qin

On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps <j...@confluent.io> wrote:

> Hey Jason,
>
> I actually think this is one of the advantages. The problem we have today is that you can't really do bidirectional replication between clusters because it would actually be a feedback loop.
>
> So the intended use would be that you would have a structure where the top-level directory was DIFFERENT but the topic names were the same, so if you maintain
>   /chicago-datacenter/actual-topics
>   /oregon-datacenter/actual-topics
> etc.
> Then you replicate
>   /chicago-datacenter/* => /oregon-datacenter
> and
>   /oregon-datacenter/* => /chicago-datacenter
>
> People who want the aggregate feed subscribe to /*/my-event.
>
> The nice thing about this is it gives a unified namespace across all locations.
>
> Basically exactly what we do now but you no longer need to add new clusters to get the namespacing.
>
> -Jay
>
> On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson <ja...@confluent.io> wrote:
>
> > Hey Ashish, thanks for the write-up. I think having a namespace capability is a useful feature for Kafka, in particular with the addition of the authorization layer. I probably prefer Jay's hierarchical approach if we're going to embed the namespace in the topic name since it seems more general. That said, one advantage of having a namespace independent of the topic name is that it simplifies replication between namespaces a bit since you don't have to parse and rewrite topic names. Assuming that hierarchical topics will happen eventually anyway, I imagine a common pattern would be to preserve the same directory structure in multiple namespaces, so having an easy mechanism for applications to switch between them would be nice. The namespace is kind of analogous to a chroot in this case.
> > Of course you can achieve the same thing by having a configurable topic prefix, but then you have to do all the topic rewriting, which I'm guessing will be a little annoying to implement in all of the clients and tools. However, the tradeoff (as you mention in the KIP) is that all request schemas have to be updated, which is also annoying.
> >
> > -Jason
> >
> > On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh <asi...@cloudera.com> wrote:
> >
> > > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira <g...@confluent.io> wrote:
> > >
> > > > This works really nicely from the consumer side, but what about the producer? If there are no more topics, do we allow producing to a directory and have the Partitioner hash-partition messages between all partitions in the multiple levels in a directory?
> > > >
> > > Good point.
> > >
> > > I am personally in favor of maintaining current behavior for the producer, i.e., letting users produce only to a topic. This is different for consumers, where the suggested behavior is in line with current behavior. One can use regex subscription to achieve the same even today.
> > >
> > > > Also, I think we want to preserve the consumer terminology of "subscribe" to topics / directories, but "assign" partitions - since the consumer behavior is different in those cases.
> > > >
> > > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps <j...@confluent.io> wrote:
> > > >
> > > > > Okay this is similar to what I think we have talked about before. Let me elaborate on the idea that I think has been floating around--it's pretty similar with a few differences.
> > > > >
> > > > > I think what you are calling the "default namespace" is basically what I would call the "current working directory" with paths not beginning with '/' being interpreted relative to this directory as in the fs.
> > > > >
> > > > > One thing you have to work out is what levels in this hierarchy you can actually subscribe to. I think you are assuming only what we currently consider a "topic", i.e. the first level of directories but not the partitions or parent dirs, would be subscribable. If you think about it, though, that constraint is a bit arbitrary.
> > > > >
> > > > > I'd propose instead the semantics that:
> > > > > - Subscribing to /a/b/c/0 means subscribing to the 0th partition of topic "c" in directory /a/b
> > > > > - Subscribing to /a/b/c means subscribing to all partitions in topic/directory "c"
> > > > > - Subscribing to /a/b means subscribing to all partitions in all topics/subdirectories under /a/b recursively
> > > > >
> > > > > Effectively the concept of topics goes away entirely--you just have partitions/logs and directories. In this respect rather than adding new concepts this new feature would actually just generalize what we have (which I think is a good thing).
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh <asi...@cloudera.com> wrote:
> > > > >
> > > > > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > >
> > > > > > > Great. I definitely would strongly favor carrying over user's intuition from FS unless we think we need a very different model. The minor details like the separator and namespace term will help with that.
> > > > > > >
> > > > > > > Follow-up question, say I have a layout like
> > > > > > >   /chicago-datacenter/user-events/pageviews
> > > > > > > Can I subscribe to
> > > > > > >   /chicago-datacenter/user-events
> > > > > > >
> > > > > > Yes, however they will need a regex like
> > > > > > /chicago-datacenter/user-events/*
> > > > > >
> > > > > > > to get the full firehose of user events from chicago? Can I subscribe to
> > > > > > >   /*/user-events
> > > > > > > to get user events originating from all datacenters?
> > > > > > >
> > > > > > Yes, however they will need a regex like
> > > > > > /chicago-datacenter/user-events/*
> > > > > > Yes
> > > > > >
> > > > > > > (Assuming, for now, that these are all in the same cluster...)
> > > > > > >
> > > > > > > Also, just to confirm, it sounds from the proposal like config overrides would become fully hierarchical so you can override config at any directory point. This will add complexity in implementation but I think will likely be much more operator friendly.
> > > > > > >
> > > > > > Yes, that is the idea.
> > > > > >
> > > > > > > There are about a thousand details to discuss in terms of how this would impact the metadata request, various zk entries, and various other aspects, but probably it makes sense to first agree on how we would want it to work and then start to dive into how to implement that.
> > > > > > >
> > > > > > Agreed.
> > > > > >
> > > > > > > -Jay
> > > > > > >
> > > > > > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh <asi...@cloudera.com> wrote:
> > > > > > >
> > > > > > > > Hey Jay, thanks for reviewing the proposal. Answers inline.
> > > > > > > >
> > > > > > > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > > >
> > > > > > > > > Hey guys,
> > > > > > > > >
> > > > > > > > > I think this is an important feature and one we've talked about for a while. I really think trying to invent a new nomenclature is going to make it hard for people to understand, though. As such I recommend we call namespaces "directories" and denote them with '/'--this will make the feature 1000x more understandable to people.
> > > > > > > >
> > > > > > > > Essentially you are suggesting two things here.
> > > > > > > > 1. Use "Directory" instead of "Namespace" as it is more intuitive. I agree.
> > > > > > > > 2. Make '/' the delimiter instead of ':'. Fine with me and I agree if we call these directories, '/' is the way to go.
> > > > > > > >
> > > > > > > > > I think we should inherit the semantics of normal unix fs in so far as it makes sense.
> > > > > > > > >
> > > > > > > > > In this approach we get rid of topics entirely, instead we really just have partitions which are the equivalent of a file and retain their numeric names, and the existing topic concept is just the first directory level but we generalize to allow arbitrarily many more levels of nesting. This allows categorization of data, such as /datacenter1/user-events/page-views/3 and you can subscribe, apply configs or permissions at any level of the hierarchy.
> > > > > > > > >
> > > > > > > > +1.
> > > > > > > > This actually requires just a minor change to the existing proposal, i.e., "some:namespace:topic" becomes "some/namespace/topic".
> > > > > > > >
> > > > > > > > > I'm actually not 100% sure what the semantics of accessing data in differing namespaces is in the current proposal, maybe you can clarify Ashish?
> > > > > > > >
> > > > > > > > I will add more info to the KIP on this, however I think a client should be able to access data in any namespace as long as the following conditions are satisfied.
> > > > > > > >
> > > > > > > > 1. The namespace the client is trying to access exists.
> > > > > > > > 2. The client has sufficient permissions on the namespace for the type of operation the client is trying to perform on a topic within that namespace.
> > > > > > > > 3. The client has sufficient permissions on the topic for the type of operation the client is trying to perform on that topic.
> > > > > > > >
> > > > > > > > If we choose to go with what you suggested earlier, that we just have a hierarchy of directories, then step 3 will actually be covered in step 2.
> > > > > > > >
> > > > > > > > In the current proposal, consumers will subscribe to a topic in a namespace by specifying <namespace>:<topic> as the topic name. They can subscribe to topics from multiple namespaces.
> > > > > > > >
> > > > > > > > Let me know if I totally missed your question.
> > > > > > > >
> > > > > > > > > Since the point of Kafka is sharing data I think it is really important that the grouping be just for convenience/permissions/config/etc and that it remain possible to access multiple directories/namespaces from the same client.
> > > > > > > > >
> > > > > > > > Totally agree with you.
> > > > > > > >
> > > > > > > > > -Jay
> > > > > > > > >
> > > > > > > > > On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh <asi...@cloudera.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hey Guys,
> > > > > > > > > >
> > > > > > > > > > I just created KIP-37 for adding namespaces to Kafka.
> > > > > > > > > >
> > > > > > > > > > KIP-37 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka> tracks the proposal.
> > > > > > > > > >
> > > > > > > > > > The idea is to make Kafka support multi-tenancy via namespaces.
> > > > > > > > > >
> > > > > > > > > > Feedback and comments are welcome.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Regards,
> > > > > > > > > > Ashish
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards,
> > > > > > > > Ashish
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Ashish
> > >
> > > --
> > > Regards,
> > > Ashish
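
For readers following the thread, the subscription semantics discussed above (a partition path such as /a/b/c/0, a topic path such as /a/b/c, a directory prefix such as /a/b, and a wildcard such as /*/my-event) can be sketched in a few lines of Python. This is only an illustration, not Kafka code and not part of KIP-37: the function name resolve_subscription, the in-memory list of partition paths, and the decision that "*" matches exactly one path segment are all assumptions made for the example. Relative paths (the "current working directory" idea) are not modelled.

```python
from fnmatch import fnmatchcase


def resolve_subscription(pattern, partition_paths):
    """Return the partition paths selected by a subscription path.

    Hypothetical helper: each partition is modelled as an absolute path
    whose last segment is the partition number, e.g. "/a/b/c/0".
    """
    want = [seg for seg in pattern.split("/") if seg]
    selected = []
    for path in partition_paths:
        have = [seg for seg in path.split("/") if seg]
        # A subscription selects every partition at or below the given path,
        # so the pattern only has to match a prefix of the partition path.
        if len(want) > len(have):
            continue
        # Assumption: "*" matches exactly one segment and never crosses "/".
        # fnmatchcase also gives "?" and "[...]" glob meaning within a segment.
        if all(fnmatchcase(h, w) for h, w in zip(have, want)):
            selected.append(path)
    return sorted(selected)


# Sample layout mirroring the two-datacenter example from the thread.
PARTITIONS = [
    "/chicago-datacenter/my-event/0",
    "/chicago-datacenter/my-event/1",
    "/chicago-datacenter/other-event/0",
    "/oregon-datacenter/my-event/0",
]

if __name__ == "__main__":
    for sub in ("/chicago-datacenter/my-event/0",
                "/chicago-datacenter/my-event",
                "/chicago-datacenter",
                "/*/my-event"):
        print(sub, "->", resolve_subscription(sub, PARTITIONS))
```

Under these assumptions, /*/my-event selects the my-event partitions from both datacenters, which is the aggregate-feed case Jay describes. It also shows Becket's point: once wildcards are allowed in any position, a subscription behaves like a per-segment filter rather than a strict walk down one branch of the hierarchy.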