Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-23 Thread Gwen Shapira
What about topic-level metrics? Are we going to report metrics at all
levels now? Or maybe just at partition level and use the monitoring app
to aggregate them at different levels (i.e. remove topic metrics
completely)?
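
A minimal sketch of the second option (report only partition-level metrics and let the monitoring side aggregate). The path layout "/dc/ns/topic/<partition>" and the function name `roll_up` are assumptions made for illustration; nothing here is an existing Kafka API.

```python
from collections import defaultdict


def roll_up(partition_metrics):
    """Sum a per-partition metric into every ancestor path.

    partition_metrics maps a hypothetical partition path such as
    "/dc/ns/topic/0" to a numeric value (e.g. bytes in per second).
    """
    totals = defaultdict(float)
    for path, value in partition_metrics.items():
        parts = path.strip("/").split("/")
        # Skip the last segment (the partition id) and credit each ancestor.
        for depth in range(1, len(parts)):
            totals["/" + "/".join(parts[:depth])] += value
    return dict(totals)
```

With this shape, a topic-level number is just one of the aggregated levels, so a separate topic metric would be redundant.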




On Wed, Oct 21, 2015 at 3:47 PM, Ashish Singh wrote:
> On Wed, Oct 21, 2015 at 2:22 PM, Jay Kreps  wrote:
>
>> Gwen, It's a good question of what the producer semantics are--would we
>> only allow you to produce to a partition or first level directory or would
>> we hash over whatever subtree you supply? Actually not sure which makes
>> more sense...
>>
>> Ashish, here are some thoughts:
>> 1. I think we can do this online. There is a question of what happens to
>> readers and writers but presumably it would the same thing as if that topic
>> weren't there. There would be no guarantee this would happen atomic over
>> different brokers or clients, though.
>> 2. ACLs should work like unix perms, right?
>
>
> Are you suggesting we should move allowed operations to R, W, X model of
> unix. Currently, we support these operations
> 
> .
>
>> I think configs would override
>> hierarchically, so we would have a full set of configs for each partition
>> computed by walking up the tree from the root and taking the first
>> override). I think this is what you're describing, right?
>>
>
> Yes.
>
>> 3. Totally agree no reason to have an arbitrary limit.
>> 4. I actually don't think the physical layout on disk should be at all
>> connected to the logical directory hierarchy we present.
>
>
> I think it will be useful to have that connection as that will enable users
> to encrypt different namespaces with different keys. Thus, one more step
> towards a completely multi tenant system.
>
>
>> That is, whether
>> you use RAID or not shouldn't impact the location of a topic in your
>> directory structure.
>
>
> Even if we make physical layout on disk representative of directory
> hierarchy,  I think this will not be a concern. Correct me, if I am missing
> something.
>
>> Not sure if this is what you are saying or not. This
>> does raise the question of how to do the disk layout. The simplest thing
>> would be to keep the flat data directories but make the names of the
>> partitions on disk just be logical inode numbers and then have a separate
>> mapping of these inodes to logical names stored in ZK with a cache. I think
>> this would make things like rename fast and atomic. The downside of this is
>> that the 'ls' command will no longer tell you much about the data on a
>> broker.
>>
>
> Enabling renaming of topics is definitely something that will be nice to
> have, however with the flat structure we won't be able to enable encrypting
> different directories/ namespaces with different keys. However, with
> directory hierarchy on disk can be achieved with logical names, each dir
> will need a logical name though.
>
>
>> -Jay
>>
>> On Wed, Oct 21, 2015 at 12:43 PM, Ashish Singh 
>> wrote:
>>
>> > In last KIP hangout following questions were raised.
>> >
>> >1.
>> >
>> >*Whether or not to support move command? If yes, how do we support
>> it.*
>> >I think *move* command will be essential, once we start supporting
>> >directories. However, implementation might be a bit convoluted. A few
>> >things required for it will be, ability to mark a topic unavailable
>> > during
>> >the move, update brokers’ metadata cache to reflect the move.
>> >2.
>> >
>> >*How will acls/ configs inheritance work?*
>> >Say we have /dc/ns/topic.
>> >dc has dc_acl and dc_config. Similarly for ns and topic.
>> >For being able to perform an action on /dc/ns/topic, the user must
>> have
>> >required perms on dc, ns and topic for that operation. For example,
>> > User1
>> >will need DESCRIBE permissions on dc, ns and topic to be able to
>> > describe
>> >/dc/ns/topic.
>> >For configs, configs for /dc/ns/topic will be topic_config +
>> ns_config +
>> >dc_config, in that order. So, if a config is specified for topic then
>> > that
>> >will be used, else it’s parent (ns) will be checked for that config,
>> and
>> >this goes on.
>> >3.
>> >
>> >*Will supporting n-deep hierarchy be a concern?*
>> >This can be a performance concern, however it sounds more of a
>> misusage
>> >of the functionality or bad organization of topics. We can have a
>> depth
>> >limit, but I am not sure if it is required.
>> >4.
>> >
>> >*Will we continue to support multi-directory on disk, that was
>> proposed
>> >in KAFKA-188?*
>> >Yes, we should be able to support that. It is within those
>> directories,
>> >namespaces will be created. The heuristics for choosing least loaded
>> >disc/dir will remain same.
>> >5.
>> >
>> >*Will it be required to move existing topics from default directory/
>> >namespace to a particular directory/ namespace 

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-21 Thread Ashish Singh
On Wed, Oct 21, 2015 at 2:22 PM, Jay Kreps wrote:

> Gwen, It's a good question of what the producer semantics are--would we
> only allow you to produce to a partition or first level directory or would
> we hash over whatever subtree you supply? Actually not sure which makes
> more sense...
>
> Ashish, here are some thoughts:
> 1. I think we can do this online. There is a question of what happens to
> readers and writers but presumably it would the same thing as if that topic
> weren't there. There would be no guarantee this would happen atomic over
> different brokers or clients, though.
> 2. ACLs should work like unix perms, right?


Are you suggesting we should move the allowed operations to the R, W, X
model of Unix? Currently, we support these operations

.

> I think configs would override
> hierarchically, so we would have a full set of configs for each partition
> computed by walking up the tree from the root and taking the first
> override). I think this is what you're describing, right?
>

Yes.

> 3. Totally agree no reason to have an arbitrary limit.
> 4. I actually don't think the physical layout on disk should be at all
> connected to the logical directory hierarchy we present.


I think it will be useful to have that connection, as it will enable users
to encrypt different namespaces with different keys. That is one more step
towards a completely multi-tenant system.


> That is, whether
> you use RAID or not shouldn't impact the location of a topic in your
> directory structure.


Even if we make the physical layout on disk mirror the directory
hierarchy, I think this will not be a concern. Correct me if I am missing
something.

> Not sure if this is what you are saying or not. This
> does raise the question of how to do the disk layout. The simplest thing
> would be to keep the flat data directories but make the names of the
> partitions on disk just be logical inode numbers and then have a separate
> mapping of these inodes to logical names stored in ZK with a cache. I think
> this would make things like rename fast and atomic. The downside of this is
> that the 'ls' command will no longer tell you much about the data on a
> broker.
>

Enabling renaming of topics is definitely something that will be nice to
have; however, with the flat structure we won't be able to encrypt
different directories/namespaces with different keys. With a directory
hierarchy on disk, renaming can still be achieved with logical names,
though each dir will need a logical name.


> -Jay
>
> On Wed, Oct 21, 2015 at 12:43 PM, Ashish Singh 
> wrote:
>
> > In last KIP hangout following questions were raised.
> >
> >1.
> >
> >*Whether or not to support move command? If yes, how do we support
> it.*
> >I think *move* command will be essential, once we start supporting
> >directories. However, implementation might be a bit convoluted. A few
> >things required for it will be, ability to mark a topic unavailable
> > during
> >the move, update brokers’ metadata cache to reflect the move.
> >2.
> >
> >*How will acls/ configs inheritance work?*
> >Say we have /dc/ns/topic.
> >dc has dc_acl and dc_config. Similarly for ns and topic.
> >For being able to perform an action on /dc/ns/topic, the user must
> have
> >required perms on dc, ns and topic for that operation. For example,
> > User1
> >will need DESCRIBE permissions on dc, ns and topic to be able to
> > describe
> >/dc/ns/topic.
> >For configs, configs for /dc/ns/topic will be topic_config +
> ns_config +
> >dc_config, in that order. So, if a config is specified for topic then
> > that
> >will be used, else it’s parent (ns) will be checked for that config,
> and
> >this goes on.
> >3.
> >
> >*Will supporting n-deep hierarchy be a concern?*
> >This can be a performance concern, however it sounds more of a
> misusage
> >of the functionality or bad organization of topics. We can have a
> depth
> >limit, but I am not sure if it is required.
> >4.
> >
> >*Will we continue to support multi-directory on disk, that was
> proposed
> >in KAFKA-188?*
> >Yes, we should be able to support that. It is within those
> directories,
> >namespaces will be created. The heuristics for choosing least loaded
> >disc/dir will remain same.
> >5.
> >
> >*Will it be required to move existing topics from default directory/
> >namespace to a particular directory/ namespace to enable mirror-maker
> >replicate topics in that directory/namespace?*
> >I do not think it will be required, as one can simple add /*/* to
> >mirror-maker’s blacklist and this will only capture topics that exist
> in
> >default namespace. @Joel, does this answer your question?
> >
> > ​
> >
> > On Fri, Oct 16, 2015 at 6:33 PM, Ashish Singh 
> wrote:
> >
> > > On Thu, Oct 15, 2015 at 1:30 PM, Jia

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-21 Thread Jay Kreps
Gwen, it's a good question of what the producer semantics are--would we
only allow you to produce to a partition or first level directory or would
we hash over whatever subtree you supply? Actually not sure which makes
more sense...

Ashish, here are some thoughts:
1. I think we can do this online. There is a question of what happens to
readers and writers but presumably it would be the same thing as if that topic
weren't there. There would be no guarantee this would happen atomically across
different brokers or clients, though.
2. ACLs should work like Unix perms, right? I think configs would override
hierarchically, so we would have a full set of configs for each partition
(computed by walking up the tree from the root and taking the first
override). I think this is what you're describing, right?
3. Totally agree no reason to have an arbitrary limit.
4. I actually don't think the physical layout on disk should be at all
connected to the logical directory hierarchy we present. That is, whether
you use RAID or not shouldn't impact the location of a topic in your
directory structure. Not sure if this is what you are saying or not. This
does raise the question of how to do the disk layout. The simplest thing
would be to keep the flat data directories but make the names of the
partitions on disk just be logical inode numbers and then have a separate
mapping of these inodes to logical names stored in ZK with a cache. I think
this would make things like rename fast and atomic. The downside of this is
that the 'ls' command will no longer tell you much about the data on a
broker.
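
A rough sketch of the layout idea in point 4, under stated assumptions: the class name `PartitionNamespace`, its methods, and the in-memory dict standing in for the ZK-backed mapping are all hypothetical. It only shows why rename becomes a metadata-only change when on-disk directory names are opaque ids.

```python
import itertools


class PartitionNamespace:
    """Maps logical partition paths to opaque on-disk directory ids."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._inode_by_path = {}  # stand-in for the mapping stored in ZK

    def create(self, path):
        if path in self._inode_by_path:
            raise KeyError(f"{path} already exists")
        inode = next(self._next_id)
        self._inode_by_path[path] = inode
        return inode

    def disk_dir(self, data_dir, path):
        # The directory on disk is named by the id, not by the logical path.
        return f"{data_dir}/{self._inode_by_path[path]}"

    def rename(self, old_prefix, new_prefix):
        """Rename a topic or directory by rewriting keys only; no data moves."""
        moved = {}
        for path, inode in self._inode_by_path.items():
            if path == old_prefix or path.startswith(old_prefix + "/"):
                moved[new_prefix + path[len(old_prefix):]] = inode
            else:
                moved[path] = inode
        # Swapping in the new mapping in one step models the atomic rename.
        self._inode_by_path = moved
```

The cost mentioned above is visible here too: listing the data directory shows only ids, so the mapping is needed to learn what lives where.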

-Jay

On Wed, Oct 21, 2015 at 12:43 PM, Ashish Singh wrote:

> In last KIP hangout following questions were raised.
>
>1.
>
>*Whether or not to support move command? If yes, how do we support it.*
>I think *move* command will be essential, once we start supporting
>directories. However, implementation might be a bit convoluted. A few
>things required for it will be, ability to mark a topic unavailable
> during
>the move, update brokers’ metadata cache to reflect the move.
>2.
>
>*How will acls/ configs inheritance work?*
>Say we have /dc/ns/topic.
>dc has dc_acl and dc_config. Similarly for ns and topic.
>For being able to perform an action on /dc/ns/topic, the user must have
>required perms on dc, ns and topic for that operation. For example,
> User1
>will need DESCRIBE permissions on dc, ns and topic to be able to
> describe
>/dc/ns/topic.
>For configs, configs for /dc/ns/topic will be topic_config + ns_config +
>dc_config, in that order. So, if a config is specified for topic then
> that
>will be used, else it’s parent (ns) will be checked for that config, and
>this goes on.
>3.
>
>*Will supporting n-deep hierarchy be a concern?*
>This can be a performance concern, however it sounds more of a misusage
>of the functionality or bad organization of topics. We can have a depth
>limit, but I am not sure if it is required.
>4.
>
>*Will we continue to support multi-directory on disk, that was proposed
>in KAFKA-188?*
>Yes, we should be able to support that. It is within those directories,
>namespaces will be created. The heuristics for choosing least loaded
>disc/dir will remain same.
>5.
>
>*Will it be required to move existing topics from default directory/
>namespace to a particular directory/ namespace to enable mirror-maker
>replicate topics in that directory/namespace?*
>I do not think it will be required, as one can simple add /*/* to
>mirror-maker’s blacklist and this will only capture topics that exist in
>default namespace. @Joel, does this answer your question?
>
> ​
>
> On Fri, Oct 16, 2015 at 6:33 PM, Ashish Singh  wrote:
>
> > On Thu, Oct 15, 2015 at 1:30 PM, Jiangjie Qin  >
> > wrote:
> >
> >> Hey Jay,
> >>
> >> If we allow consumer to subscribe to /*/my-event, does that mean we
> allow
> >> consumer to consume cross namespaces?
> >
> > That is the idea. If a user has permissions then yes, he should be able
> to
> > consume from as many namespaces as he wants.
> >
> >
> >> In that case it seems not
> >> "hierarchical" but more like a name field filtering. i.e. user can
> choose
> >> to consume from topic where datacenter={x,y},
> >> topic_name={my-topic1,mytopic2}. Am I understanding right?
> >>
> > I think it is still hierarchical, however with possible filtering (as you
> > said).
> >
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps  wrote:
> >>
> >> > Hey Jason,
> >> >
> >> > I actually think this is one of the advantages. The problem we have
> >> today
> >> > is that you can't really do bidirectional replication between clusters
> >> > because it would actually be a feedback loop.
> >> >
> >> > So the intended use would be that you would have a structure where the
> >> > top-level directory was DIFFERENT but the topic names were the sam

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-21 Thread Ashish Singh
In the last KIP hangout, the following questions were raised.

   1. *Whether or not to support a move command? If yes, how do we support it?*
   I think a *move* command will be essential once we start supporting
   directories. However, the implementation might be a bit convoluted. A few
   things it will require are the ability to mark a topic unavailable during
   the move, and updating the brokers’ metadata cache to reflect the move.

   2. *How will ACL/config inheritance work?*
   Say we have /dc/ns/topic.
   dc has dc_acl and dc_config. Similarly for ns and topic.
   To perform an action on /dc/ns/topic, the user must have the
   required perms on dc, ns and topic for that operation. For example, User1
   will need DESCRIBE permissions on dc, ns and topic to be able to describe
   /dc/ns/topic.
   For configs, the configs for /dc/ns/topic will be topic_config + ns_config +
   dc_config, in that order. So, if a config is specified for the topic then
   that will be used; otherwise its parent (ns) will be checked for that
   config, and so on up the tree.

   3. *Will supporting n-deep hierarchy be a concern?*
   This can be a performance concern; however, it sounds more like a misuse
   of the functionality or bad organization of topics. We can have a depth
   limit, but I am not sure if it is required.

   4. *Will we continue to support multi-directory on disk, that was proposed
   in KAFKA-188?*
   Yes, we should be able to support that. Namespaces will be created within
   those directories. The heuristics for choosing the least loaded
   disk/dir will remain the same.

   5. *Will it be required to move existing topics from default directory/
   namespace to a particular directory/namespace to enable mirror-maker to
   replicate topics in that directory/namespace?*
   I do not think it will be required, as one can simply add /*/* to
   mirror-maker’s blacklist so that it only captures topics that exist in the
   default namespace. @Joel, does this answer your question?
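
The inheritance rules in item 2 can be sketched roughly as follows. This is an illustration under assumptions, not the KIP's implementation: the dict shapes for `acls` and `configs` and the function names are made up for the example.

```python
def ancestors(path):
    """'/dc/ns/topic' -> ['/dc', '/dc/ns', '/dc/ns/topic']"""
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def is_allowed(acls, user, operation, path):
    # The user needs the operation on every level: dc, ns and topic.
    return all(
        operation in acls.get(node, {}).get(user, set())
        for node in ancestors(path)
    )


def effective_config(configs, path):
    merged = {}
    # Apply root first so that deeper levels override their parents,
    # i.e. topic_config wins over ns_config, which wins over dc_config.
    for node in ancestors(path):
        merged.update(configs.get(node, {}))
    return merged
```

Both helpers walk one node per level, so their cost grows with the depth of the hierarchy, which is the performance concern raised in item 3.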

On Fri, Oct 16, 2015 at 6:33 PM, Ashish Singh wrote:

> On Thu, Oct 15, 2015 at 1:30 PM, Jiangjie Qin 
> wrote:
>
>> Hey Jay,
>>
>> If we allow consumer to subscribe to /*/my-event, does that mean we allow
>> consumer to consume cross namespaces?
>
> That is the idea. If a user has permissions then yes, he should be able to
> consume from as many namespaces as he wants.
>
>
>> In that case it seems not
>> "hierarchical" but more like a name field filtering. i.e. user can choose
>> to consume from topic where datacenter={x,y},
>> topic_name={my-topic1,mytopic2}. Am I understanding right?
>>
> I think it is still hierarchical, however with possible filtering (as you
> said).
>
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps  wrote:
>>
>> > Hey Jason,
>> >
>> > I actually think this is one of the advantages. The problem we have
>> today
>> > is that you can't really do bidirectional replication between clusters
>> > because it would actually be a feedback loop.
>> >
>> > So the intended use would be that you would have a structure where the
>> > top-level directory was DIFFERENT but the topic names were the same, so
>> if
>> > you maintain
>> >   /chicago-datacenter/actual-topics
>> >   /oregon-datacenter/actual topics
>> >   etc.
>> > Then you replicate
>> >   /chicago-datacenter/* => /oregon-datacenter
>> > and
>> >   /oregon-datacenter/* => /chicago-datacenter
>> >
>> > People who want the aggregate feed subscribe to /*/my-event.
>> >
>> > The nice thing about this is it gives a unified namespace across all
>> > locations.
>> >
>> > Basically exactly what we do now but you no longer need to add new
>> clusters
>> > to get the namespacing.
>> >
>> > -Jay
>> >
>> >
>> > On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson 
>> > wrote:
>> >
>> > > Hey Ashish, thanks for the write-up. I think having a namespace
>> > capability
>> > > is a useful feature for Kafka, in particular with the addition of the
>> > > authorization layer. I probably prefer Jay's hierarchical approach if
>> > we're
>> > > going to embed the namespace in the topic name since it seems more
>> > general.
>> > > That said, one advantage of having a namespace independent of the
>> topic
>> > > name is that it simplifies replication between namespaces a bit since
>> you
>> > > don't have to parse and rewrite topic names. Assuming that
>> hierarchical
>> > > topics will happen eventually anyway, I imagine a common pattern
>> would be
>> > > to preserve the same directory structure in multiple namespaces, so
>> > having
>> > > an easy mechanism for applications to switch between them would be
>> nice.
>> > > The namespace is kind of analogous to a chroot in this case. Of course
>> > you
>> > > can achieve the same thing by having a configurable topic prefix, just
>> > you
>> > > have to do all the topic rewriting, which I'm guessing will be a
>> little
>> > > annoying to implement in all of the clients and tools. However, the
>> > > tradeof

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-16 Thread Ashish Singh
On Thu, Oct 15, 2015 at 1:30 PM, Jiangjie Qin wrote:

> Hey Jay,
>
> If we allow consumer to subscribe to /*/my-event, does that mean we allow
> consumer to consume cross namespaces?

That is the idea. If a user has permissions then yes, he should be able to
consume from as many namespaces as he wants.


> In that case it seems not
> "hierarchical" but more like a name field filtering. i.e. user can choose
> to consume from topic where datacenter={x,y},
> topic_name={my-topic1,mytopic2}. Am I understanding right?
>
I think it is still hierarchical, however with possible filtering (as you
said).

>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps  wrote:
>
> > Hey Jason,
> >
> > I actually think this is one of the advantages. The problem we have today
> > is that you can't really do bidirectional replication between clusters
> > because it would actually be a feedback loop.
> >
> > So the intended use would be that you would have a structure where the
> > top-level directory was DIFFERENT but the topic names were the same, so
> if
> > you maintain
> >   /chicago-datacenter/actual-topics
> >   /oregon-datacenter/actual topics
> >   etc.
> > Then you replicate
> >   /chicago-datacenter/* => /oregon-datacenter
> > and
> >   /oregon-datacenter/* => /chicago-datacenter
> >
> > People who want the aggregate feed subscribe to /*/my-event.
> >
> > The nice thing about this is it gives a unified namespace across all
> > locations.
> >
> > Basically exactly what we do now but you no longer need to add new
> clusters
> > to get the namespacing.
> >
> > -Jay
> >
> >
> > On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson 
> > wrote:
> >
> > > Hey Ashish, thanks for the write-up. I think having a namespace
> > capability
> > > is a useful feature for Kafka, in particular with the addition of the
> > > authorization layer. I probably prefer Jay's hierarchical approach if
> > we're
> > > going to embed the namespace in the topic name since it seems more
> > general.
> > > That said, one advantage of having a namespace independent of the topic
> > > name is that it simplifies replication between namespaces a bit since
> you
> > > don't have to parse and rewrite topic names. Assuming that hierarchical
> > > topics will happen eventually anyway, I imagine a common pattern would
> be
> > > to preserve the same directory structure in multiple namespaces, so
> > having
> > > an easy mechanism for applications to switch between them would be
> nice.
> > > The namespace is kind of analogous to a chroot in this case. Of course
> > you
> > > can achieve the same thing by having a configurable topic prefix, just
> > you
> > > have to do all the topic rewriting, which I'm guessing will be a little
> > > annoying to implement in all of the clients and tools. However, the
> > > tradeoff (as you mention in the KIP) is that all request schemas have
> to
> > be
> > > updated, which is also annoying.
> > >
> > > -Jason
> > >
> > > On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh 
> > > wrote:
> > >
> > > > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira 
> > wrote:
> > > >
> > > > > This works really nicely from the consumer side, but what about the
> > > > > producer? If there are no more topics,do we allow producing to a
> > > > directory
> > > > > and have the Partitioner hash-partition messages between all
> > partitions
> > > > in
> > > > > the multiple levels in a directory?
> > > > >
> > > > Good point.
> > > >
> > > > I am personally in favor of maintaining current behavior for
> producer,
> > > > i.e., letting users to only produce to a topic. This is different for
> > > > consumers, the suggested behavior is inline with current behavior.
> One
> > > can
> > > > use regex subscription to achieve the same even today.
> > > >
> > > > >
> > > > > Also, I think we want to preserve the consumer terminology of
> > > "subscribe"
> > > > > to topics / directories, but "assign" partitions - since the
> consumer
> > > > > behavior is different in those cases.
> > > > >
> > > > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps 
> wrote:
> > > > >
> > > > > > Okay this is similar to what I think we have talked about before.
> > Let
> > > > me
> > > > > > elaborate on the idea that I think has been floating around--it's
> > > > pretty
> > > > > > similar with a few differences.
> > > > > >
> > > > > > I think what you are calling the "default namespace" is basically
> > > what
> > > > I
> > > > > > would call the "current working directory" with paths not
> beginning
> > > > with
> > > > > > '/' being interpreted relative to this directory as in the fs.
> > > > > >
> > > > > > One thing you have to work out is what levels in this hierarchy
> you
> > > can
> > > > > > actually subscribe to. I think you are assuming only what we
> > > currently
> > > > > > consider a "topic", i.e. the first level of directories but not
> the
> > > > > > partitions or parent dirs, would be subscribable. If you think
> > about
> > > > it,

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-15 Thread Jiangjie Qin
Hey Jay,

If we allow a consumer to subscribe to /*/my-event, does that mean we allow
the consumer to consume across namespaces? In that case it seems not
"hierarchical" but more like name-field filtering, i.e. the user can choose
to consume from topics where datacenter={x,y},
topic_name={my-topic1,mytopic2}. Am I understanding this right?

Thanks,

Jiangjie (Becket) Qin

On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps wrote:

> Hey Jason,
>
> I actually think this is one of the advantages. The problem we have today
> is that you can't really do bidirectional replication between clusters
> because it would actually be a feedback loop.
>
> So the intended use would be that you would have a structure where the
> top-level directory was DIFFERENT but the topic names were the same, so if
> you maintain
>   /chicago-datacenter/actual-topics
>   /oregon-datacenter/actual topics
>   etc.
> Then you replicate
>   /chicago-datacenter/* => /oregon-datacenter
> and
>   /oregon-datacenter/* => /chicago-datacenter
>
> People who want the aggregate feed subscribe to /*/my-event.
>
> The nice thing about this is it gives a unified namespace across all
> locations.
>
> Basically exactly what we do now but you no longer need to add new clusters
> to get the namespacing.
>
> -Jay
>
>
> On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson 
> wrote:
>
> > Hey Ashish, thanks for the write-up. I think having a namespace
> capability
> > is a useful feature for Kafka, in particular with the addition of the
> > authorization layer. I probably prefer Jay's hierarchical approach if
> we're
> > going to embed the namespace in the topic name since it seems more
> general.
> > That said, one advantage of having a namespace independent of the topic
> > name is that it simplifies replication between namespaces a bit since you
> > don't have to parse and rewrite topic names. Assuming that hierarchical
> > topics will happen eventually anyway, I imagine a common pattern would be
> > to preserve the same directory structure in multiple namespaces, so
> having
> > an easy mechanism for applications to switch between them would be nice.
> > The namespace is kind of analogous to a chroot in this case. Of course
> you
> > can achieve the same thing by having a configurable topic prefix, just
> you
> > have to do all the topic rewriting, which I'm guessing will be a little
> > annoying to implement in all of the clients and tools. However, the
> > tradeoff (as you mention in the KIP) is that all request schemas have to
> be
> > updated, which is also annoying.
> >
> > -Jason
> >
> > On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh 
> > wrote:
> >
> > > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira 
> wrote:
> > >
> > > > This works really nicely from the consumer side, but what about the
> > > > producer? If there are no more topics,do we allow producing to a
> > > directory
> > > > and have the Partitioner hash-partition messages between all
> partitions
> > > in
> > > > the multiple levels in a directory?
> > > >
> > > Good point.
> > >
> > > I am personally in favor of maintaining current behavior for producer,
> > > i.e., letting users to only produce to a topic. This is different for
> > > consumers, the suggested behavior is inline with current behavior. One
> > can
> > > use regex subscription to achieve the same even today.
> > >
> > > >
> > > > Also, I think we want to preserve the consumer terminology of
> > "subscribe"
> > > > to topics / directories, but "assign" partitions - since the consumer
> > > > behavior is different in those cases.
> > > >
> > > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps  wrote:
> > > >
> > > > > Okay this is similar to what I think we have talked about before.
> Let
> > > me
> > > > > elaborate on the idea that I think has been floating around--it's
> > > pretty
> > > > > similar with a few differences.
> > > > >
> > > > > I think what you are calling the "default namespace" is basically
> > what
> > > I
> > > > > would call the "current working directory" with paths not beginning
> > > with
> > > > > '/' being interpreted relative to this directory as in the fs.
> > > > >
> > > > > One thing you have to work out is what levels in this hierarchy you
> > can
> > > > > actually subscribe to. I think you are assuming only what we
> > currently
> > > > > consider a "topic", i.e. the first level of directories but not the
> > > > > partitions or parent dirs, would be subscribable. If you think
> about
> > > it,
> > > > > though, that constraint is a bit arbitrary.
> > > > >
> > > > > I'd propose instead the semantics that:
> > > > > - Subscribing to /a/b/c/0 means subscribing to the 0th partition of
> > > topic
> > > > > "c" in directory /a/b
> > > > > - Subscribing to /a/b/c means subscribing to all partitions in
> > > > > topic/directory "c"
> > > > > - Subscribing to /a/b means subscribing to all partitions in all
> > > > > topics/subdirectories under a/b recursively
> > > > >
> > > > > Effectively the concept of topics goes away

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-14 Thread Jay Kreps
Hey Jason,

I actually think this is one of the advantages. The problem we have today
is that you can't really do bidirectional replication between clusters
because it would actually be a feedback loop.

So the intended use would be that you would have a structure where the
top-level directory was DIFFERENT but the topic names were the same, so if
you maintain
  /chicago-datacenter/actual-topics
  /oregon-datacenter/actual-topics
  etc.
Then you replicate
  /chicago-datacenter/* => /oregon-datacenter
and
  /oregon-datacenter/* => /chicago-datacenter

People who want the aggregate feed subscribe to /*/my-event.

The nice thing about this is it gives a unified namespace across all
locations.

Basically exactly what we do now but you no longer need to add new clusters
to get the namespacing.
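
A small sketch of how a subscription such as /*/my-event could be resolved against hierarchical topic paths. It assumes each `*` matches exactly one path segment; the thread does not pin that down, and the function names are hypothetical. Python's `fnmatch` is applied per segment because on a whole path its `*` would also match `/`.

```python
from fnmatch import fnmatchcase


def matches(pattern, topic_path):
    """Match a path pattern segment by segment ('*' stays within a segment)."""
    pat_parts = pattern.strip("/").split("/")
    topic_parts = topic_path.strip("/").split("/")
    return len(pat_parts) == len(topic_parts) and all(
        fnmatchcase(segment, pat)
        for pat, segment in zip(pat_parts, topic_parts)
    )


def subscribe(pattern, topics):
    """Return the sorted topic paths selected by the pattern."""
    return sorted(t for t in topics if matches(pattern, t))
```

Given topics under /chicago-datacenter and /oregon-datacenter, `subscribe("/*/my-event", topics)` selects the my-event topic from every datacenter directory, which is the aggregate feed described above.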

-Jay


On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson wrote:

> Hey Ashish, thanks for the write-up. I think having a namespace capability
> is a useful feature for Kafka, in particular with the addition of the
> authorization layer. I probably prefer Jay's hierarchical approach if we're
> going to embed the namespace in the topic name since it seems more general.
> That said, one advantage of having a namespace independent of the topic
> name is that it simplifies replication between namespaces a bit since you
> don't have to parse and rewrite topic names. Assuming that hierarchical
> topics will happen eventually anyway, I imagine a common pattern would be
> to preserve the same directory structure in multiple namespaces, so having
> an easy mechanism for applications to switch between them would be nice.
> The namespace is kind of analogous to a chroot in this case. Of course you
> can achieve the same thing by having a configurable topic prefix, just you
> have to do all the topic rewriting, which I'm guessing will be a little
> annoying to implement in all of the clients and tools. However, the
> tradeoff (as you mention in the KIP) is that all request schemas have to be
> updated, which is also annoying.
>
> -Jason
>
> On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh 
> wrote:
>
> > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira  wrote:
> >
> > > This works really nicely from the consumer side, but what about the
> > > producer? If there are no more topics, do we allow producing to a
> > directory
> > > and have the Partitioner hash-partition messages between all partitions
> > in
> > > the multiple levels in a directory?
> > >
> > Good point.
> >
> > I am personally in favor of maintaining current behavior for producer,
> > i.e., letting users produce only to a topic. This is different for
> > consumers, the suggested behavior is in line with current behavior. One
> can
> > use regex subscription to achieve the same even today.
> >
> > >
> > > Also, I think we want to preserve the consumer terminology of
> "subscribe"
> > > to topics / directories, but "assign" partitions - since the consumer
> > > behavior is different in those cases.
> > >
> > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps  wrote:
> > >
> > > > Okay this is similar to what I think we have talked about before. Let
> > me
> > > > elaborate on the idea that I think has been floating around--it's
> > pretty
> > > > similar with a few differences.
> > > >
> > > > I think what you are calling the "default namespace" is basically
> what
> > I
> > > > would call the "current working directory" with paths not beginning
> > with
> > > > '/' being interpreted relative to this directory as in the fs.
> > > >
> > > > One thing you have to work out is what levels in this hierarchy you
> can
> > > > actually subscribe to. I think you are assuming only what we
> currently
> > > > consider a "topic", i.e. the first level of directories but not the
> > > > partitions or parent dirs, would be subscribable. If you think about
> > it,
> > > > though, that constraint is a bit arbitrary.
> > > >
> > > > I'd propose instead the semantics that:
> > > > - Subscribing to /a/b/c/0 means subscribing to the 0th partition of
> > topic
> > > > "c" in directory /a/b
> > > > - Subscribing to /a/b/c means subscribing to all partitions in
> > > > topic/directory "c"
> > > > - Subscribing to /a/b means subscribing to all partitions in all
> > > > topics/subdirectories under a/b recursively
> > > >
> > > > Effectively the concept of topics goes away entirely--you just have
> > > > partitions/logs and directories. In this respect rather than adding
> new
> > > > concepts this new feature would actually just generalize what we
> have
> > > > (which I think is a good thing).
> > > >
> > > > -Jay
> > > >
> > > > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh 
> > > wrote:
> > > >
> > > > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps 
> wrote:
> > > > >
> > > > > > Great. I definitely would strongly favor carrying over user's
> > > intuition
> > > > > > from FS unless we think we need a very different model. The minor
> > > > details
> > > > > > like the separator and namespace term will 

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-14 Thread Jason Gustafson
Hey Ashish, thanks for the write-up. I think having a namespace capability
is a useful feature for Kafka, in particular with the addition of the
authorization layer. I probably prefer Jay's hierarchical approach if we're
going to embed the namespace in the topic name since it seems more general.
That said, one advantage of having a namespace independent of the topic
name is that it simplifies replication between namespaces a bit since you
don't have to parse and rewrite topic names. Assuming that hierarchical
topics will happen eventually anyway, I imagine a common pattern would be
to preserve the same directory structure in multiple namespaces, so having
an easy mechanism for applications to switch between them would be nice.
The namespace is kind of analogous to a chroot in this case. Of course you
can achieve the same thing by having a configurable topic prefix, just you
have to do all the topic rewriting, which I'm guessing will be a little
annoying to implement in all of the clients and tools. However, the
tradeoff (as you mention in the KIP) is that all request schemas have to be
updated, which is also annoying.

-Jason

On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh  wrote:

> On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira  wrote:
>
> > This works really nicely from the consumer side, but what about the
> > producer? If there are no more topics, do we allow producing to a
> directory
> > and have the Partitioner hash-partition messages between all partitions
> in
> > the multiple levels in a directory?
> >
> Good point.
>
> I am personally in favor of maintaining current behavior for producer,
> i.e., letting users produce only to a topic. This is different for
> consumers, the suggested behavior is in line with current behavior. One can
> use regex subscription to achieve the same even today.
>
> >
> > Also, I think we want to preserve the consumer terminology of "subscribe"
> > to topics / directories, but "assign" partitions - since the consumer
> > behavior is different in those cases.
> >
> > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps  wrote:
> >
> > > Okay this is similar to what I think we have talked about before. Let
> me
> > > elaborate on the idea that I think has been floating around--it's
> pretty
> > > similar with a few differences.
> > >
> > > I think what you are calling the "default namespace" is basically what
> I
> > > would call the "current working directory" with paths not beginning
> with
> > > '/' being interpreted relative to this directory as in the fs.
> > >
> > > One thing you have to work out is what levels in this hierarchy you can
> > > actually subscribe to. I think you are assuming only what we currently
> > > consider a "topic", i.e. the first level of directories but not the
> > > partitions or parent dirs, would be subscribable. If you think about
> it,
> > > though, that constraint is a bit arbitrary.
> > >
> > > I'd propose instead the semantics that:
> > > - Subscribing to /a/b/c/0 means subscribing to the 0th partition of
> topic
> > > "c" in directory /a/b
> > > - Subscribing to /a/b/c means subscribing to all partitions in
> > > topic/directory "c"
> > > - Subscribing to /a/b means subscribing to all partitions in all
> > > topics/subdirectories under a/b recursively
> > >
> > > Effectively the concept of topics goes away entirely--you just have
> > > partitions/logs and directories. In this respect rather than adding new
> > > concepts this new feature would actually just generalize what we have
> > > (which I think is a good thing).
> > >
> > > -Jay
> > >
> > > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh 
> > wrote:
> > >
> > > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps  wrote:
> > > >
> > > > > Great. I definitely would strongly favor carrying over user's
> > intuition
> > > > > from FS unless we think we need a very different model. The minor
> > > details
> > > > > like the separator and namespace term will help with that.
> > > > >
> > > > > Follow-up question, say I have a layout like
> > > > >/chicago-datacenter/user-events/pageviews
> > > > > Can I subscribe to
> > > > >/chicago-datacenter/user-events
> > > > >
> > > > Yes, however they will need a regex like
> > > > /chicago-datacenter/user-events/*
> > > >
> > > > > to get the full firehose of user events from chicago? Can I
> subscribe
> > > to
> > > > >/*/user-events
> > > > > to get user events originating from all datacenters?
> > > > >
> > > > Yes, however they will need a regex like
> > > > /chicago-datacenter/user-events/*
> > > > Yes
> > > >
> > > > >
> > > > > (Assuming, for now, that these are all in the same cluster...)
> > > > >
> > > > > Also, just to confirm, it sounds from the proposal like config
> > > overrides
> > > > > would become fully hierarchical so you can override config at any
> > > > directory
> > > > > point. This will add complexity in implementation but I think will
> > > likely
> > > > > be much more operator friendly.
> > > > >
> > > > Ye

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-14 Thread Ashish Singh
On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira  wrote:

> This works really nicely from the consumer side, but what about the
> producer? If there are no more topics, do we allow producing to a directory
> and have the Partitioner hash-partition messages between all partitions in
> the multiple levels in a directory?
>
Good point.

I am personally in favor of maintaining the current behavior for the
producer, i.e., letting users produce only to a topic. Consumers are
different: the suggested behavior is in line with current behavior, since
one can use a regex subscription to achieve the same thing even today.

>
> Also, I think we want to preserve the consumer terminology of "subscribe"
> to topics / directories, but "assign" partitions - since the consumer
> behavior is different in those cases.
>
> On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps  wrote:
>
> > Okay this is similar to what I think we have talked about before. Let me
> > elaborate on the idea that I think has been floating around--it's pretty
> > similar with a few differences.
> >
> > I think what you are calling the "default namespace" is basically what I
> > would call the "current working directory" with paths not beginning with
> > '/' being interpreted relative to this directory as in the fs.
> >
> > One thing you have to work out is what levels in this hierarchy you can
> > actually subscribe to. I think you are assuming only what we currently
> > consider a "topic", i.e. the first level of directories but not the
> > partitions or parent dirs, would be subscribable. If you think about it,
> > though, that constraint is a bit arbitrary.
> >
> > I'd propose instead the semantics that:
> > - Subscribing to /a/b/c/0 means subscribing to the 0th partition of topic
> > "c" in directory /a/b
> > - Subscribing to /a/b/c means subscribing to all partitions in
> > topic/directory "c"
> > - Subscribing to /a/b means subscribing to all partitions in all
> > topics/subdirectories under a/b recursively
> >
> > Effectively the concept of topics goes away entirely--you just have
> > partitions/logs and directories. In this respect rather than adding new
> > concepts this new feature would actually just generalize what we have
> > (which I think is a good thing).
> >
> > -Jay
> >
> > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh 
> wrote:
> >
> > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps  wrote:
> > >
> > > > Great. I definitely would strongly favor carrying over user's
> intuition
> > > > from FS unless we think we need a very different model. The minor
> > details
> > > > like the separator and namespace term will help with that.
> > > >
> > > > Follow-up question, say I have a layout like
> > > >/chicago-datacenter/user-events/pageviews
> > > > Can I subscribe to
> > > >/chicago-datacenter/user-events
> > > >
> > > Yes, however they will need a regex like
> > > /chicago-datacenter/user-events/*
> > >
> > > > to get the full firehose of user events from chicago? Can I subscribe
> > to
> > > >/*/user-events
> > > > to get user events originating from all datacenters?
> > > >
> > > Yes, however they will need a regex like
> > > /chicago-datacenter/user-events/*
> > > Yes
> > >
> > > >
> > > > (Assuming, for now, that these are all in the same cluster...)
> > > >
> > > > Also, just to confirm, it sounds from the proposal like config
> > overrides
> > > > would become fully hierarchical so you can override config at any
> > > directory
> > > > point. This will add complexity in implementation but I think will
> > likely
> > > > be much more operator friendly.
> > > >
> > > Yes, that is the idea.
> > >
> > > >
> > > > There are about a thousand details to discuss in terms of how this
> > would
> > > > impact the metadata request, various zk entries, and various other
> > > aspects,
> > > > but probably it makes sense to first agree on how we would want it to
> > > work
> > > > and then start to dive into how to implement that.
> > > >
> > > Agreed.
> > >
> > > >
> > > > -Jay
> > > >
> > > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh 
> > > wrote:
> > > >
> > > > > Hey Jay, thanks for reviewing the proposal. Answers inline.
> > > > >
> > > > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps 
> > wrote:
> > > > >
> > > > > > Hey guys,
> > > > > >
> > > > > > I think this is an important feature and one we've talked about
> > for a
> > > > > > while. I really think trying to invent a new nomenclature is
> going
> > to
> > > > > make
> > > > > > it hard for people to understand, though. As such I recommend we
> > call
> > > > > > namespaces "directories" and denote them with '/'--this will make
> > the
> > > > > > feature 1000x more understandable to people.
> > > > >
> > > > > Essentially you are suggesting two things here.
> > > > > 1. Use "Directory" instead of "Namespace" as it is more intuitive.
> I
> > > > agree.
> > > > > 2. Make '/' as delimiter instead of ':'. Fine with me and I agree
> if
> > we
> > > > > call these directories, '/' is the 

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-13 Thread Ashish Singh
On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps  wrote:

> Okay this is similar to what I think we have talked about before. Let me
> elaborate on the idea that I think has been floating around--it's pretty
> similar with a few differences.
>
> I think what you are calling the "default namespace" is basically what I
> would call the "current working directory" with paths not beginning with
> '/' being interpreted relative to this directory as in the fs.
>
> One thing you have to work out is what levels in this hierarchy you can
> actually subscribe to. I think you are assuming only what we currently
> consider a "topic", i.e. the first level of directories but not the
> partitions or parent dirs, would be subscribable. If you think about it,
> though, that constraint is a bit arbitrary.
>
> I'd propose instead the semantics that:
> - Subscribing to /a/b/c/0 means subscribing to the 0th partition of topic
> "c" in directory /a/b
> - Subscribing to /a/b/c means subscribing to all partitions in
> topic/directory "c"
> - Subscribing to /a/b means subscribing to all partitions in all
> topics/subdirectories under a/b recursively
>
Seems reasonable.

>
> Effectively the concept of topics goes away entirely--you just have
> partitions/logs and directories. In this respect rather than adding new
> concepts this new feature would actually just generalize what we have
> (which I think is a good thing).
>
> -Jay
>
> On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh  wrote:
>
> > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps  wrote:
> >
> > > Great. I definitely would strongly favor carrying over user's intuition
> > > from FS unless we think we need a very different model. The minor
> details
> > > like the separator and namespace term will help with that.
> > >
> > > Follow-up question, say I have a layout like
> > >/chicago-datacenter/user-events/pageviews
> > > Can I subscribe to
> > >/chicago-datacenter/user-events
> > >
> > Yes, however they will need a regex like
> > /chicago-datacenter/user-events/*
> >
> > > to get the full firehose of user events from chicago? Can I subscribe
> to
> > >/*/user-events
> > > to get user events originating from all datacenters?
> > >
> > Yes, however they will need a regex like
> > /chicago-datacenter/user-events/*
> > Yes
> >
> > >
> > > (Assuming, for now, that these are all in the same cluster...)
> > >
> > > Also, just to confirm, it sounds from the proposal like config
> overrides
> > > would become fully hierarchical so you can override config at any
> > directory
> > > point. This will add complexity in implementation but I think will
> likely
> > > be much more operator friendly.
> > >
> > Yes, that is the idea.
> >
> > >
> > > There are about a thousand details to discuss in terms of how this
> would
> > > impact the metadata request, various zk entries, and various other
> > aspects,
> > > but probably it makes sense to first agree on how we would want it to
> > work
> > > and then start to dive into how to implement that.
> > >
> > Agreed.
> >
> > >
> > > -Jay
> > >
> > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh 
> > wrote:
> > >
> > > > Hey Jay, thanks for reviewing the proposal. Answers inline.
> > > >
> > > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps 
> wrote:
> > > >
> > > > > Hey guys,
> > > > >
> > > > > I think this is an important feature and one we've talked about
> for a
> > > > > while. I really think trying to invent a new nomenclature is going
> to
> > > > make
> > > > > it hard for people to understand, though. As such I recommend we
> call
> > > > > namespaces "directories" and denote them with '/'--this will make
> the
> > > > > feature 1000x more understandable to people.
> > > >
> > > > Essentially you are suggesting two things here.
> > > > 1. Use "Directory" instead of "Namespace" as it is more intuitive. I
> > > agree.
> > > > 2. Make '/' as delimiter instead of ':'. Fine with me and I agree if
> we
> > > > call these directories, '/' is the way to go.
> > > >
> > > > I think we should inherit the
> > > > > semantics of normal unix fs in so far as it makes sense.
> > > > >
> > > > > In this approach we get rid of topics entirely, instead we really
> > just
> > > > have
> > > > > partitions which are the equivalent of a file and retain their
> > numeric
> > > > > names, and the existing topic concept is just the first directory
> > level
> > > > but
> > > > > we generalize to allow arbitrarily many more levels of nesting.
> This
> > > > allows
> > > > > categorization of data, such as
> /datacenter1/user-events/page-views/3
> > > and
> > > > > you can subscribe, apply configs or permissions at any level of the
> > > > > hierarchy.
> > > > >
> > > > +1. This actually requires just a minor change to existing proposal,
> > > i.e.,
> > > > "some:namespace:topic" becomes "some/namespace/topic".
> > > >
> > > > >
> > > > > I'm actually not 100% sure what the semantics of accessing data in
> > > > > differing namespaces is in the curren

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-12 Thread Gwen Shapira
This works really nicely from the consumer side, but what about the
producer? If there are no more topics, do we allow producing to a directory
and have the Partitioner hash-partition messages between all partitions in
the multiple levels in a directory?

Also, I think we want to preserve the consumer terminology of "subscribe"
to topics / directories, but "assign" partitions - since the consumer
behavior is different in those cases.

On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps  wrote:

> Okay this is similar to what I think we have talked about before. Let me
> elaborate on the idea that I think has been floating around--it's pretty
> similar with a few differences.
>
> I think what you are calling the "default namespace" is basically what I
> would call the "current working directory" with paths not beginning with
> '/' being interpreted relative to this directory as in the fs.
>
> One thing you have to work out is what levels in this hierarchy you can
> actually subscribe to. I think you are assuming only what we currently
> consider a "topic", i.e. the first level of directories but not the
> partitions or parent dirs, would be subscribable. If you think about it,
> though, that constraint is a bit arbitrary.
>
> I'd propose instead the semantics that:
> - Subscribing to /a/b/c/0 means subscribing to the 0th partition of topic
> "c" in directory /a/b
> - Subscribing to /a/b/c means subscribing to all partitions in
> topic/directory "c"
> - Subscribing to /a/b means subscribing to all partitions in all
> topics/subdirectories under a/b recursively
>
> Effectively the concept of topics goes away entirely--you just have
> partitions/logs and directories. In this respect rather than adding new
> concepts this new feature would actually just generalize what we have
> (which I think is a good thing).
>
> -Jay
>
> On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh  wrote:
>
> > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps  wrote:
> >
> > > Great. I definitely would strongly favor carrying over user's intuition
> > > from FS unless we think we need a very different model. The minor
> details
> > > like the separator and namespace term will help with that.
> > >
> > > Follow-up question, say I have a layout like
> > >/chicago-datacenter/user-events/pageviews
> > > Can I subscribe to
> > >/chicago-datacenter/user-events
> > >
> > Yes, however they will need a regex like
> > /chicago-datacenter/user-events/*
> >
> > > to get the full firehose of user events from chicago? Can I subscribe
> to
> > >/*/user-events
> > > to get user events originating from all datacenters?
> > >
> > Yes, however they will need a regex like
> > /chicago-datacenter/user-events/*
> > Yes
> >
> > >
> > > (Assuming, for now, that these are all in the same cluster...)
> > >
> > > Also, just to confirm, it sounds from the proposal like config
> overrides
> > > would become fully hierarchical so you can override config at any
> > directory
> > > point. This will add complexity in implementation but I think will
> likely
> > > be much more operator friendly.
> > >
> > Yes, that is the idea.
> >
> > >
> > > There are about a thousand details to discuss in terms of how this
> would
> > > impact the metadata request, various zk entries, and various other
> > aspects,
> > > but probably it makes sense to first agree on how we would want it to
> > work
> > > and then start to dive into how to implement that.
> > >
> > Agreed.
> >
> > >
> > > -Jay
> > >
> > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh 
> > wrote:
> > >
> > > > Hey Jay, thanks for reviewing the proposal. Answers inline.
> > > >
> > > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps 
> wrote:
> > > >
> > > > > Hey guys,
> > > > >
> > > > > I think this is an important feature and one we've talked about
> for a
> > > > > while. I really think trying to invent a new nomenclature is going
> to
> > > > make
> > > > > it hard for people to understand, though. As such I recommend we
> call
> > > > > namespaces "directories" and denote them with '/'--this will make
> the
> > > > > feature 1000x more understandable to people.
> > > >
> > > > Essentially you are suggesting two things here.
> > > > 1. Use "Directory" instead of "Namespace" as it is more intuitive. I
> > > agree.
> > > > 2. Make '/' as delimiter instead of ':'. Fine with me and I agree if
> we
> > > > call these directories, '/' is the way to go.
> > > >
> > > > I think we should inherit the
> > > > > semantics of normal unix fs in so far as it makes sense.
> > > > >
> > > > > In this approach we get rid of topics entirely, instead we really
> > just
> > > > have
> > > > > partitions which are the equivalent of a file and retain their
> > numeric
> > > > > names, and the existing topic concept is just the first directory
> > level
> > > > but
> > > > > we generalize to allow arbitrarily many more levels of nesting.
> This
> > > > allows
> > > > > categorization of data, such as
> /datacenter1/user-events/page-view

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-12 Thread Jay Kreps
Okay this is similar to what I think we have talked about before. Let me
elaborate on the idea that I think has been floating around--it's pretty
similar with a few differences.

I think what you are calling the "default namespace" is basically what I
would call the "current working directory" with paths not beginning with
'/' being interpreted relative to this directory as in the fs.
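
As a small illustration of the "current working directory" idea, the sketch below resolves a topic path against a client-side default directory using ordinary POSIX path rules. The function name and the notion of a per-client cwd setting are assumptions made for the example, not part of the KIP.

```python
import posixpath

def resolve(path: str, cwd: str = "/") -> str:
    """Resolve a topic path the way a shell resolves a file path.

    Paths beginning with '/' are taken as absolute; anything else is
    interpreted relative to the (hypothetical) client setting `cwd`.
    """
    # posixpath.join discards `cwd` when `path` is already absolute.
    return posixpath.normpath(posixpath.join(cwd, path))

relative = resolve("user-events/pageviews", cwd="/chicago-datacenter")
absolute = resolve("/oregon-datacenter/user-events", cwd="/chicago-datacenter")
```

With this rule, existing clients that use bare topic names would keep working unchanged, since a bare name is just a relative path under the default directory.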

One thing you have to work out is what levels in this hierarchy you can
actually subscribe to. I think you are assuming only what we currently
consider a "topic", i.e. the first level of directories but not the
partitions or parent dirs, would be subscribable. If you think about it,
though, that constraint is a bit arbitrary.

I'd propose instead the semantics that:
- Subscribing to /a/b/c/0 means subscribing to the 0th partition of topic
"c" in directory /a/b
- Subscribing to /a/b/c means subscribing to all partitions in
topic/directory "c"
- Subscribing to /a/b means subscribing to all partitions in all
topics/subdirectories under a/b recursively
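
The three rules above amount to a prefix match on path boundaries. Here is a minimal sketch under that assumption, with partitions written as full paths ending in the partition number; the function name and the sample layout are invented for illustration.

```python
def subscribed_partitions(subscription, partitions):
    """Return every partition at or below the subscribed path."""
    prefix = subscription.rstrip("/")
    # Either the path names a partition exactly, or it names a directory
    # and we take everything underneath it, recursively.
    return sorted(
        p for p in partitions
        if p == prefix or p.startswith(prefix + "/")
    )

# Hypothetical cluster layout: two partitions of "c", one of "d", one elsewhere.
partitions = ["/a/b/c/0", "/a/b/c/1", "/a/b/d/0", "/a/x/y/0"]

one_partition = subscribed_partitions("/a/b/c/0", partitions)
one_topic = subscribed_partitions("/a/b/c", partitions)
whole_dir = subscribed_partitions("/a/b", partitions)
```

Appending '/' before the prefix test keeps a subscription to /a/b from accidentally picking up a sibling such as /a/bc.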

Effectively the concept of topics goes away entirely--you just have
partitions/logs and directories. In this respect rather than adding new
concepts this new feature would actually just generalize what we have
(which I think is a good thing).

-Jay

On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh  wrote:

> On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps  wrote:
>
> > Great. I definitely would strongly favor carrying over user's intuition
> > from FS unless we think we need a very different model. The minor details
> > like the separator and namespace term will help with that.
> >
> > Follow-up question, say I have a layout like
> >/chicago-datacenter/user-events/pageviews
> > Can I subscribe to
> >/chicago-datacenter/user-events
> >
> Yes, however they will need a regex like
> /chicago-datacenter/user-events/*
>
> > to get the full firehose of user events from chicago? Can I subscribe to
> >/*/user-events
> > to get user events originating from all datacenters?
> >
> Yes, however they will need a regex like
> /chicago-datacenter/user-events/*
> Yes
>
> >
> > (Assuming, for now, that these are all in the same cluster...)
> >
> > Also, just to confirm, it sounds from the proposal like config overrides
> > would become fully hierarchical so you can override config at any
> directory
> > point. This will add complexity in implementation but I think will likely
> > be much more operator friendly.
> >
> Yes, that is the idea.
>
> >
> > There are about a thousand details to discuss in terms of how this would
> > impact the metadata request, various zk entries, and various other
> aspects,
> > but probably it makes sense to first agree on how we would want it to
> work
> > and then start to dive into how to implement that.
> >
> Agreed.
>
> >
> > -Jay
> >
> > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh 
> wrote:
> >
> > > Hey Jay, thanks for reviewing the proposal. Answers inline.
> > >
> > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps  wrote:
> > >
> > > > Hey guys,
> > > >
> > > > I think this is an important feature and one we've talked about for a
> > > > while. I really think trying to invent a new nomenclature is going to
> > > make
> > > > it hard for people to understand, though. As such I recommend we call
> > > > namespaces "directories" and denote them with '/'--this will make the
> > > > feature 1000x more understandable to people.
> > >
> > > Essentially you are suggesting two things here.
> > > 1. Use "Directory" instead of "Namespace" as it is more intuitive. I
> > agree.
> > > 2. Make '/' as delimiter instead of ':'. Fine with me and I agree if we
> > > call these directories, '/' is the way to go.
> > >
> > > I think we should inherit the
> > > > semantics of normal unix fs in so far as it makes sense.
> > > >
> > > > In this approach we get rid of topics entirely, instead we really
> just
> > > have
> > > > partitions which are the equivalent of a file and retain their
> numeric
> > > > names, and the existing topic concept is just the first directory
> level
> > > but
> > > > we generalize to allow arbitrarily many more levels of nesting. This
> > > allows
> > > > categorization of data, such as /datacenter1/user-events/page-views/3
> > and
> > > > you can subscribe, apply configs or permissions at any level of the
> > > > hierarchy.
> > > >
> > > +1. This actually requires just a minor change to existing proposal,
> > i.e.,
> > > "some:namespace:topic" becomes "some/namespace/topic".
> > >
> > > >
> > > > I'm actually not 100% sure what the semantics of accessing data in
> > > > differing namespaces is in the current proposal, maybe you can
> clarify
> > > > Ashish?
> > >
> > > I will add more info to KIP on this, however I think a client should be
> > > able to access data in any namespace as long as following conditions
> are
> > > satisfied.
> > >
> > > 1. Namespace, the client is trying to access, exists.
> > > 2. The client has sufficient permi

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-12 Thread Ashish Singh
On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps  wrote:

> Great. I definitely would strongly favor carrying over user's intuition
> from FS unless we think we need a very different model. The minor details
> like the separator and namespace term will help with that.
>
> Follow-up question, say I have a layout like
>/chicago-datacenter/user-events/pageviews
> Can I subscribe to
>/chicago-datacenter/user-events
>
Yes, however they will need a regex like
/chicago-datacenter/user-events/*

> to get the full firehose of user events from chicago? Can I subscribe to
>/*/user-events
> to get user events originating from all datacenters?
>
Yes, however they will need a regex like
/chicago-datacenter/user-events/*
Yes

>
> (Assuming, for now, that these are all in the same cluster...)
>
> Also, just to confirm, it sounds from the proposal like config overrides
> would become fully hierarchical so you can override config at any directory
> point. This will add complexity in implementation but I think will likely
> be much more operator friendly.
>
Yes, that is the idea.
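
To sketch what fully hierarchical overrides could look like: start from the broker defaults, then apply overrides from the root down to the partition, so the override nearest to the partition wins. The function name, the override table, and its storage format are assumptions for illustration; only the config keys are existing topic-level settings.

```python
import posixpath

def effective_config(path, overrides, defaults):
    """Compute the config for `path`; the nearest ancestor's override wins."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    # Build the chain path -> parent -> ... -> "/".
    chain = []
    node = posixpath.normpath(path)
    while True:
        chain.append(node)
        if node == "/":
            break
        node = posixpath.dirname(node)
    # Apply from the root downwards so deeper levels overwrite shallower ones.
    config = dict(defaults)
    for node in reversed(chain):
        config.update(overrides.get(node, {}))
    return config

defaults = {"retention.ms": 604800000, "cleanup.policy": "delete"}
# Hypothetical per-directory overrides.
overrides = {
    "/chicago-datacenter": {"retention.ms": 86400000},
    "/chicago-datacenter/user-events/pageviews": {"cleanup.policy": "compact"},
}

pageviews_cfg = effective_config(
    "/chicago-datacenter/user-events/pageviews/3", overrides, defaults
)
```

Walking up from the partition and stopping at the first override for each key would give the same result; applying root-first is just the shorter way to write it.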

>
> There are about a thousand details to discuss in terms of how this would
> impact the metadata request, various zk entries, and various other aspects,
> but probably it makes sense to first agree on how we would want it to work
> and then start to dive into how to implement that.
>
Agreed.

>
> -Jay
>
> On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh  wrote:
>
> > Hey Jay, thanks for reviewing the proposal. Answers inline.
> >
> > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps  wrote:
> >
> > > Hey guys,
> > >
> > > I think this is an important feature and one we've talked about for a
> > > while. I really think trying to invent a new nomenclature is going to
> > make
> > > it hard for people to understand, though. As such I recommend we call
> > > namespaces "directories" and denote them with '/'--this will make the
> > > feature 1000x more understandable to people.
> >
> > Essentially you are suggesting two things here.
> > 1. Use "Directory" instead of "Namespace" as it is more intuitive. I
> agree.
> > 2. Make '/' as delimiter instead of ':'. Fine with me and I agree if we
> > call these directories, '/' is the way to go.
> >
> > I think we should inherit the
> > > semantics of normal unix fs in so far as it makes sense.
> > >
> > > In this approach we get rid of topics entirely, instead we really just
> > have
> > > partitions which are the equivalent of a file and retain their numeric
> > > names, and the existing topic concept is just the first directory level
> > but
> > > we generalize to allow arbitrarily many more levels of nesting. This
> > allows
> > > categorization of data, such as /datacenter1/user-events/page-views/3
> and
> > > you can subscribe, apply configs or permissions at any level of the
> > > hierarchy.
> > >
> > +1. This actually requires just a minor change to existing proposal,
> i.e.,
> > "some:namespace:topic" becomes "some/namespace/topic".
> >
> > >
> > > I'm actually not 100% sure what the semantics of accessing data in
> > > differing namespaces is in the current proposal, maybe you can clarify
> > > Ashish?
> >
> > I will add more info to KIP on this, however I think a client should be
> > able to access data in any namespace as long as following conditions are
> > satisfied.
> >
> > 1. Namespace, the client is trying to access, exists.
> > 2. The client has sufficient permissions on the namespace for type of
> > operation the client is trying to perform on a topic within that
> namespace.
> > 3. The client has sufficient permissions on the topic for type of
> operation
> > the client is trying to perform on that topic.
> >
> > If we choose to go with what you suggested earlier that just have
> hierarchy
> > of directories, then step 3 will actually be covered in step 2.
> >
> > In the current proposal, consumers will subscribe to a topic in a
> namespace
> > by specifying : as the topic name. They can subscribe
> to
> > topics from multiple namespaces.
> >
> > Let me know if I totally missed your question.
> >
> > Since the point of Kafka is sharing data I think it is really
> > > important that the grouping be just for
> > convenience/permissions/config/etc
> > > and that it remain possible to access multiple directories/namespaces
> > from
> > > the same client.
> > >
> > Totally agree with you.
> >
> > >
> > > -Jay
> > >
> > > On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh 
> > wrote:
> > >
> > > > Hey Guys,
> > > >
> > > > I just created KIP-37 for adding namespaces to Kafka.
> > > >
> > > > KIP-37
> > > > <
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka
> > > > >
> > > > tracks the proposal.
> > > >
> > > > The idea is to make Kafka support multi-tenancy via namespaces.
> > > >
> > > > Feedback and comments are welcome.
> > > >
> > > > --
> > > >
> > > > Regards,
> > > > Ashish
> > > >
> > >
> >
> >
> >
> > --
> >
> > Regards,
> > Ashish

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-12 Thread Jay Kreps
Great. I would strongly favor carrying over users' intuition from the
FS unless we think we need a very different model. The minor details
like the separator and namespace term will help with that.

Follow-up question, say I have a layout like
   /chicago-datacenter/user-events/pageviews
Can I subscribe to
   /chicago-datacenter/user-events
to get the full firehose of user events from chicago? Can I subscribe to
   /*/user-events
to get user events originating from all datacenters?

(Assuming, for now, that these are all in the same cluster...)
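
To make the question concrete, here is a minimal sketch of one possible
matching rule. The semantics are an assumption, since nothing in this
thread settles them: '*' matches exactly one directory level, and
subscribing to a directory includes everything beneath it. The function
name and sample paths are hypothetical.

```python
def matches(subscription: str, topic_path: str) -> bool:
    """Return True if topic_path falls under the subscribed directory.

    Assumed semantics (not decided in this thread): '*' matches exactly
    one path segment, and a directory subscription covers its subtree.
    """
    sub = [s for s in subscription.split("/") if s]
    path = [s for s in topic_path.split("/") if s]
    if len(sub) > len(path):
        return False
    # zip stops at the end of the subscription, so deeper levels match.
    return all(s == "*" or s == p for s, p in zip(sub, path))


topics = [
    "/chicago-datacenter/user-events/pageviews",
    "/chicago-datacenter/system-logs/gc",
    "/london-datacenter/user-events/clicks",
]

chicago_events = [t for t in topics if matches("/chicago-datacenter/user-events", t)]
all_user_events = [t for t in topics if matches("/*/user-events", t)]
```

Under these assumed rules the first subscription selects only the
Chicago user events, and the second selects user events from both
datacenters.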

Also, just to confirm, it sounds from the proposal like config overrides
would become fully hierarchical so you can override config at any directory
point. This will add complexity in implementation but I think will likely
be much more operator friendly.
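
A minimal sketch of what fully hierarchical overrides could look like:
walk from the most specific directory up to the root and take the first
override found. The function name, the dictionary layout and the sample
values are assumptions for illustration, not anything from the KIP.

```python
def resolve_config(path, key, overrides, defaults):
    """Resolve one config key for a directory path.

    overrides maps a directory (e.g. "/a/b") to a dict of config overrides.
    The lookup walks from the deepest directory up to "/" and returns the
    first override it finds, falling back to cluster-wide defaults.
    """
    parts = [p for p in path.split("/") if p]
    for depth in range(len(parts), -1, -1):
        directory = "/" + "/".join(parts[:depth])
        value = overrides.get(directory, {}).get(key)
        if value is not None:
            return value
    return defaults[key]


# Hypothetical sample data.
defaults = {"retention.ms": 604800000, "cleanup.policy": "delete"}
overrides = {
    "/chicago-datacenter": {"retention.ms": 259200000},
    "/chicago-datacenter/user-events": {"retention.ms": 86400000},
}

pageviews_retention = resolve_config(
    "/chicago-datacenter/user-events/pageviews", "retention.ms", overrides, defaults
)
logs_retention = resolve_config(
    "/chicago-datacenter/system-logs", "retention.ms", overrides, defaults
)
```

The nearest override wins, so an operator can set a value once on a
directory and have everything below it inherit that value.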

There are about a thousand details to discuss in terms of how this would
impact the metadata request, various zk entries, and various other aspects,
but probably it makes sense to first agree on how we would want it to work
and then start to dive into how to implement that.

-Jay

On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh  wrote:

> Hey Jay, thanks for reviewing the proposal. Answers inline.
>
> On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps  wrote:
>
> > Hey guys,
> >
> > I think this is an important feature and one we've talked about for a
> > while. I really think trying to invent a new nomenclature is going to
> make
> > it hard for people to understand, though. As such I recommend we call
> > namespaces "directories" and denote them with '/'--this will make the
> > feature 1000x more understandable to people.
>
> Essentially you are suggesting two things here.
> 1. Use "Directory" instead of "Namespace" as it is more intuitive. I agree.
> 2. Make '/' as delimiter instead of ':'. Fine with me and I agree if we
> call these directories, '/' is the way to go.
>
> I think we should inherit the
> > semantics of normal unix fs in so far as it makes sense.
> >
> > In this approach we get rid of topics entirely, instead we really just
> have
> > partitions which are the equivalent of a file and retain their numeric
> > names, and the existing topic concept is just the first directory level
> but
> > we generalize to allow arbitrarily many more levels of nesting. This
> allows
> > categorization of data, such as /datacenter1/user-events/page-views/3 and
> > you can subscribe, apply configs or permissions at any level of the
> > hierarchy.
> >
> +1. This actually requires just a minor change to existing proposal, i.e.,
> "some:namespace:topic" becomes "some/namespace/topic".
>
> >
> > I'm actually not 100% sure what the semantics of accessing data in
> > differing namespaces is in the current proposal, maybe you can clarify
> > Ashish?
>
> I will add more info to KIP on this, however I think a client should be
> able to access data in any namespace as long as following conditions are
> satisfied.
>
> 1. Namespace, the client is trying to access, exists.
> 2. The client has sufficient permissions on the namespace for type of
> operation the client is trying to perform on a topic within that namespace.
> 3. The client has sufficient permissions on the topic for type of operation
> the client is trying to perform on that topic.
>
> If we choose to go with what you suggested earlier that just have hierarchy
> of directories, then step 3 will actually be covered in step 2.
>
> In the current proposal, consumers will subscribe to a topic in a namespace
> by specifying : as the topic name. They can subscribe to
> topics from multiple namespaces.
>
> Let me know if I totally missed your question.
>
> Since the point of Kafka is sharing data I think it is really
> > important that the grouping be just for
> convenience/permissions/config/etc
> > and that it remain possible to access multiple directories/namespaces
> from
> > the same client.
> >
> Totally agree with you.
>
> >
> > -Jay
> >
> > On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh 
> wrote:
> >
> > > Hey Guys,
> > >
> > > I just created KIP-37 for adding namespaces to Kafka.
> > >
> > > KIP-37
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka
> > > >
> > > tracks the proposal.
> > >
> > > The idea is to make Kafka support multi-tenancy via namespaces.
> > >
> > > Feedback and comments are welcome.
> > >
> > > --
> > >
> > > Regards,
> > > Ashish
> > >
> >
>
>
>
> --
>
> Regards,
> Ashish
>


Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-12 Thread Ashish Singh
Hey Jay, thanks for reviewing the proposal. Answers inline.

On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps  wrote:

> Hey guys,
>
> I think this is an important feature and one we've talked about for a
> while. I really think trying to invent a new nomenclature is going to make
> it hard for people to understand, though. As such I recommend we call
> namespaces "directories" and denote them with '/'--this will make the
> feature 1000x more understandable to people.

Essentially you are suggesting two things here.
1. Use "Directory" instead of "Namespace" as it is more intuitive. I agree.
2. Make '/' as delimiter instead of ':'. Fine with me and I agree if we
call these directories, '/' is the way to go.

I think we should inherit the
> semantics of normal unix fs in so far as it makes sense.
>
> In this approach we get rid of topics entirely, instead we really just have
> partitions which are the equivalent of a file and retain their numeric
> names, and the existing topic concept is just the first directory level but
> we generalize to allow arbitrarily many more levels of nesting. This allows
> categorization of data, such as /datacenter1/user-events/page-views/3 and
> you can subscribe, apply configs or permissions at any level of the
> hierarchy.
>
+1. This actually requires just a minor change to existing proposal, i.e.,
"some:namespace:topic" becomes "some/namespace/topic".

>
> I'm actually not 100% sure what the semantics of accessing data in
> differing namespaces is in the current proposal, maybe you can clarify
> Ashish?

I will add more info to the KIP on this. However, I think a client should
be able to access data in any namespace as long as the following conditions
are satisfied.

1. The namespace the client is trying to access exists.
2. The client has sufficient permissions on the namespace for the type of
operation it is trying to perform on a topic within that namespace.
3. The client has sufficient permissions on the topic for the type of
operation it is trying to perform on that topic.

If we go with what you suggested earlier and just have a hierarchy of
directories, then step 3 will actually be covered by step 2.
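
The three conditions above can be sketched as a single check. This is
only an illustration: the ACL layout (a dict keyed by principal and
resource), the operation names and the sample data are all hypothetical.

```python
def can_access(principal, namespace, topic, operation, namespaces, acls):
    """Apply the three conditions in order.

    namespaces is the set of existing namespaces; acls maps
    (principal, resource) to the set of operations allowed on it.
    """
    # 1. The namespace must exist.
    if namespace not in namespaces:
        return False
    # 2. The client needs permission on the namespace for this operation.
    if operation not in acls.get((principal, namespace), set()):
        return False
    # 3. The client needs permission on the topic for this operation.
    return operation in acls.get((principal, f"{namespace}:{topic}"), set())


# Hypothetical sample data, using ':' as the delimiter.
namespaces = {"analytics"}
acls = {
    ("alice", "analytics"): {"READ", "WRITE"},
    ("alice", "analytics:pageviews"): {"READ"},
}

alice_can_read = can_access("alice", "analytics", "pageviews", "READ", namespaces, acls)
alice_can_write = can_access("alice", "analytics", "pageviews", "WRITE", namespaces, acls)
```

With a pure directory hierarchy the third check would become another
application of the second one at a deeper level.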

In the current proposal, consumers will subscribe to a topic in a namespace
by specifying : as the topic name. They can subscribe to
topics from multiple namespaces.

Let me know if I totally missed your question.

Since the point of Kafka is sharing data I think it is really
> important that the grouping be just for convenience/permissions/config/etc
> and that it remain possible to access multiple directories/namespaces from
> the same client.
>
Totally agree with you.

>
> -Jay
>
> On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh  wrote:
>
> > Hey Guys,
> >
> > I just created KIP-37 for adding namespaces to Kafka.
> >
> > KIP-37
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka
> > >
> > tracks the proposal.
> >
> > The idea is to make Kafka support multi-tenancy via namespaces.
> >
> > Feedback and comments are welcome.
> >
> > --
> >
> > Regards,
> > Ashish
> >
>



-- 

Regards,
Ashish


Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-12 Thread Ashish Singh
Rajini, thanks for the review. Answers inline.

On Mon, Oct 12, 2015 at 8:01 AM, Rajini Sivaram <
rajinisiva...@googlemail.com> wrote:

> Ashish,
>
> Thank you for doing this writeup and starting the discussion around
> namespaces and multi-tenancy.
>
> We have implemented a namespace solution on top of Kafka trunk and our
> motivation to do so matches your description in KIP-37 to a large extent.
> But there are some differences in our approach. The main difference is that
> we are not exposing namespaces to clients. Our goal is to provide a
> multi-tenant interface where each tenant has a view of Kafka that looks and
> feels like a single-user Kafka cluster. This is a quick summary of our
> approach. We are waiting for the SASL implementation in Kafka to complete
> this work and will be happy to share the code once we have integrated with
> SASL.
>
As Jay later suggested, and I agree, clients should be able to talk to any
namespace as long as they have the permission to access that namespace and
the namespace exists.

I think below is what a client should be able to do.

1. Get a list of namespaces it has permissions to.
2. Get a list of topics in namespaces it has permissions to.
3. Produce/ Consume to any topic in any namespace as long as that namespace
exists, the client has permissions to the namespace and the topic.

I think as long as we satisfy the above-mentioned reqs, your requirement
should be met. Let me know if I am missing something.

>
> *Current implementation:*
>
> *Goal*: Provide a secure multi-tenant Kafka service that works with
> out-of-the-box Kafka 0.9.0.0 clients. We have a requirement to isolate
> tenants so that tenants do not have any awareness of other tenants.
>
I think one of the advantages of the current proposal is that existing and
previous clients will also be able to work seamlessly with the proposed
changes.

>
> *Implementation*:
>
>- We use a modified version of kafka.server.KafkaApis.scala which
>enables a pluggable interceptor for requests and responses.
>- Our interceptor plugin adds a tenant-specific prefix to topics and
>consumer groups in requests and removes the prefix in responses. At the
>moment, we are using valid topic/group characters in prefixes so that no
>other changes are required in Kafka.
>- Tenants are identified and authenticated based on their security
>principal. We are using clientid to hold an API key temporarily while we
>wait for SASL implementation in Kafka.
>- External clients which connect using SSL+SASL only see the topics
>within their namespace. They have no access to Zookeeper.
>- Our internal clients use PLAINTEXT and do see all topics and also have
>access to Zookeeper.
>
> *Longer term goals:*
>
>- Even though we started our prototype with hierarchical topics, we
>decided against it for our initial release to avoid changes to clients. But
>for the longer term, it will definitely be useful to use hierarchical
>topics to provide namespaces. In particular, we would like the option to
>use different encryption keys to encrypt data at rest for different
>tenants. If/when hierarchical topics are supported in Kafka, we would use
>the first level of the hierarchy as a tenant prefix.
>- We are currently not using quotas for tenants, but we do have a
>requirement to support quotas.
>
> I think we can make your proposed solution work for us. But I wasn't clear
> on why namespaces would be a preferred approach to full-fledged
> hierarchical topics.

What do you mean by full-fledged hierarchical topics? I think full-fledged
hierarchical topics are one way to implement namespaces. Let me know if I am
missing something.


> There is no mention of consumer groups which also
> requires namespaces to support multi-tenancy.

Not sure what you mean by namespaces for consumer groups. As long as the
consumers in a consumer group have permissions to interact with a namespace,
they should be able to work the way they work right now. Let me know if I am
missing something.

We prefer to have
> multi-tenant applications which aren't aware of their namespace as we have
> today, but if that is something specific to our environment and the general
> consensus is to make tenants namespace-aware, we may be able to work around
> it.
>
> Thoughts?
>
>
> Regards,
>
> Rajini
>
> On Sat, Oct 10, 2015 at 10:03 AM, Magnus Edenhill 
> wrote:
>
> > Good write-up Ashish.
> >
> > Looking at one of the rejected alternatives got me thinking:
> > "Modify request/ response formats to take namespace specifically.
> >
> > Solves the issue of delimiting string required in proposed approach and
> the
> > issue with existing regex consumers.
> > This definitely is the cleanest approach. However, will require lots of
> API
> > and protocol changes. This is listed in rejected alternatives, but we
> > should totally consider this as an option."
> >
> >
> > I think we could actually achieve this without any protocol change at

Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-12 Thread Ashish Singh
Magnus, thanks for taking a look at the proposal. Answers inline.

On Sat, Oct 10, 2015 at 2:03 AM, Magnus Edenhill  wrote:

> Good write-up Ashish.
>
> Looking at one of the rejected alternatives got me thinking:
> "Modify request/ response formats to take namespace specifically.
>
> Solves the issue of delimiting string required in proposed approach and the
> issue with existing regex consumers.
> This definitely is the cleanest approach. However, will require lots of API
> and protocol changes. This is listed in rejected alternatives, but we
> should totally consider this as an option."
>
>
> I think we could actually achieve this without any protocol change at all
> by moving the namespace token to the common request header's ClientId
> field.
> E.g.: .. RequestHeader { ..., ClientId = "myclient@mynamespace", ... }
>
> That would provide the desired per-request namespacing and in theory the
> existing clients don't even need to be updated as long as the clientId is
> configurable (which it should be).
>
Interesting approach.

Pros:
1. As you mentioned, it does not require a delimiting string.

Cons:
1. A client can only interact with one namespace.

As Jay later suggested, and I agree, clients should be able to talk to any
namespace as long as they have the permission to access that namespace and
the namespace exists. Makes sense?

>
>
> My two cents,
> Magnus
>
>
> 2015-10-10 3:32 GMT+02:00 Ashish Singh :
>
> > Hey Guys,
> >
> > I just created KIP-37 for adding namespaces to Kafka.
> >
> > KIP-37
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka
> > >
> > tracks the proposal.
> >
> > The idea is to make Kafka support multi-tenancy via namespaces.
> >
> > Feedback and comments are welcome.
> >
> > --
> >
> > Regards,
> > Ashish
> >
>



-- 

Regards,
Ashish


Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-12 Thread Jay Kreps
Hey guys,

I think this is an important feature and one we've talked about for a
while. I really think trying to invent a new nomenclature is going to make
it hard for people to understand, though. As such I recommend we call
namespaces "directories" and denote them with '/'--this will make the
feature 1000x more understandable to people. I think we should inherit the
semantics of normal unix fs in so far as it makes sense.

In this approach we get rid of topics entirely, instead we really just have
partitions which are the equivalent of a file and retain their numeric
names, and the existing topic concept is just the first directory level but
we generalize to allow arbitrarily many more levels of nesting. This allows
categorization of data, such as /datacenter1/user-events/page-views/3 and
you can subscribe, apply configs or permissions at any level of the
hierarchy.
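
As an illustration of this layout, a path such as
/datacenter1/user-events/page-views/3 can be split into its directory
levels and a trailing numeric partition. This is only a sketch; the
type and function names are hypothetical.

```python
from typing import NamedTuple, Tuple


class PartitionPath(NamedTuple):
    directories: Tuple[str, ...]  # every level above the partition
    partition: int                # numeric "file" name at the leaf


def parse_partition_path(path: str) -> PartitionPath:
    """Split a path into directory levels and a numeric partition."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or not parts[-1].isdigit():
        raise ValueError(f"expected /<dir>/.../<partition number>, got {path!r}")
    return PartitionPath(tuple(parts[:-1]), int(parts[-1]))


parsed = parse_partition_path("/datacenter1/user-events/page-views/3")
```

Here the first directory level plays the role of today's topic, and any
deeper levels are the generalized nesting.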

I'm actually not 100% sure what the semantics of accessing data in
differing namespaces is in the current proposal, maybe you can clarify
Ashish? Since the point of Kafka is sharing data I think it is really
important that the grouping be just for convenience/permissions/config/etc
and that it remain possible to access multiple directories/namespaces from
the same client.

-Jay

On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh  wrote:

> Hey Guys,
>
> I just created KIP-37 for adding namespaces to Kafka.
>
> KIP-37
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka
> >
> tracks the proposal.
>
> The idea is to make Kafka support multi-tenancy via namespaces.
>
> Feedback and comments are welcome.
>
> --
>
> Regards,
> Ashish
>


Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-12 Thread Rajini Sivaram
Ashish,

Thank you for doing this writeup and starting the discussion around
namespaces and multi-tenancy.

We have implemented a namespace solution on top of Kafka trunk and our
motivation to do so matches your description in KIP-37 to a large extent.
But there are some differences in our approach. The main difference is that
we are not exposing namespaces to clients. Our goal is to provide a
multi-tenant interface where each tenant has a view of Kafka that looks and
feels like a single-user Kafka cluster. This is a quick summary of our
approach. We are waiting for the SASL implementation in Kafka to complete
this work and will be happy to share the code once we have integrated with
SASL.

*Current implementation:*

*Goal*: Provide a secure multi-tenant Kafka service that works with
out-of-the-box Kafka 0.9.0.0 clients. We have a requirement to isolate
tenants so that tenants do not have any awareness of other tenants.

*Implementation*:

   - We use a modified version of kafka.server.KafkaApis.scala which
   enables a pluggable interceptor for requests and responses.
   - Our interceptor plugin adds a tenant-specific prefix to topics and
   consumer groups in requests and removes the prefix in responses. At the
   moment, we are using valid topic/group characters in prefixes so that no
   other changes are required in Kafka.
   - Tenants are identified and authenticated based on their security
   principal. We are using clientid to hold an API key temporarily while we
   wait for SASL implementation in Kafka.
   - External clients which connect using SSL+SASL only see the topics
   within their namespace. They have no access to Zookeeper.
   - Our internal clients use PLAINTEXT and do see all topics and also have
   access to Zookeeper.
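
A minimal sketch of the prefixing idea described in the list above. It
is not our actual interceptor, which lives in the broker code; the class
name, the method names and the '_' separator are assumptions made for
the example.

```python
class TenantPrefixInterceptor:
    """Adds a tenant prefix on the way in and strips it on the way out."""

    def __init__(self, separator="_"):
        # Assumed separator; it only needs to be a valid topic character.
        self.separator = separator

    def _prefix(self, tenant):
        return f"{tenant}{self.separator}"

    def on_request(self, tenant, names):
        """Rewrite topic or group names in a request to their stored form."""
        return [self._prefix(tenant) + name for name in names]

    def on_response(self, tenant, names):
        """Keep only this tenant's names and strip the prefix again."""
        prefix = self._prefix(tenant)
        return [n[len(prefix):] for n in names if n.startswith(prefix)]


interceptor = TenantPrefixInterceptor()
stored = interceptor.on_request("tenantA", ["orders", "payments"])
visible = interceptor.on_response(
    "tenantA", ["tenantA_orders", "tenantB_orders", "tenantA_payments"]
)
```

Note that a plain string prefix is ambiguous if one tenant id is itself
a prefix of another, so a real deployment would need tenant ids or a
separator chosen to rule that out.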

*Longer term goals:*

   - Even though we started our prototype with hierarchical topics, we
   decided against it for our initial release to avoid changes to clients. But
   for the longer term, it will definitely be useful to use hierarchical
   topics to provide namespaces. In particular, we would like the option to
   use different encryption keys to encrypt data at rest for different
   tenants. If/when hierarchical topics are supported in Kafka, we would use
   the first level of the hierarchy as a tenant prefix.
   - We are currently not using quotas for tenants, but we do have a
   requirement to support quotas.

I think we can make your proposed solution work for us. But I wasn't clear
on why namespaces would be a preferred approach to full-fledged
hierarchical topics. There is no mention of consumer groups which also
requires namespaces to support multi-tenancy. We prefer to have
multi-tenant applications which aren't aware of their namespace as we have
today, but if that is something specific to our environment and the general
consensus is to make tenants namespace-aware, we may be able to work around
it.

Thoughts?


Regards,

Rajini

On Sat, Oct 10, 2015 at 10:03 AM, Magnus Edenhill 
wrote:

> Good write-up Ashish.
>
> Looking at one of the rejected alternatives got me thinking:
> "Modify request/ response formats to take namespace specifically.
>
> Solves the issue of delimiting string required in proposed approach and the
> issue with existing regex consumers.
> This definitely is the cleanest approach. However, will require lots of API
> and protocol changes. This is listed in rejected alternatives, but we
> should totally consider this as an option."
>
>
> I think we could actually achieve this without any protocol change at all
> by moving the namespace token to the common request header's ClientId
> field.
> E.g.: .. RequestHeader { ..., ClientId = "myclient@mynamespace", ... }
>
> That would provide the desired per-request namespacing and in theory the
> existing clients don't even need to be updated as long as the clientId is
> configurable (which it should be).
>
>
> My two cents,
> Magnus
>
>
> 2015-10-10 3:32 GMT+02:00 Ashish Singh :
>
> > Hey Guys,
> >
> > I just created KIP-37 for adding namespaces to Kafka.
> >
> > KIP-37
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka
> > >
> > tracks the proposal.
> >
> > The idea is to make Kafka support multi-tenancy via namespaces.
> >
> > Feedback and comments are welcome.
> >
> > --
> >
> > Regards,
> > Ashish
> >
>


Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-10 Thread Magnus Edenhill
Good write-up Ashish.

Looking at one of the rejected alternatives got me thinking:
"Modify request/ response formats to take namespace specifically.

Solves the issue of delimiting string required in proposed approach and the
issue with existing regex consumers.
This definitely is the cleanest approach. However, will require lots of API
and protocol changes. This is listed in rejected alternatives, but we
should totally consider this as an option."


I think we could actually achieve this without any protocol change at all
by moving the namespace token to the common request header's ClientId field.
E.g.: .. RequestHeader { ..., ClientId = "myclient@mynamespace", ... }

That would provide the desired per-request namespacing and in theory the
existing clients don't even need to be updated as long as the clientId is
configurable (which it should be).
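
A minimal sketch of how a broker-side component might split such a
ClientId. The function name and the fallback behaviour for ids without
an '@' are assumptions, not part of the Kafka protocol.

```python
def split_client_id(client_id, default_namespace=""):
    """Split "client@namespace" into (client, namespace).

    Ids without an '@' are treated as belonging to default_namespace,
    so existing clients would keep working unchanged.
    """
    name, sep, namespace = client_id.rpartition("@")
    if not sep:
        return client_id, default_namespace
    return name, namespace


namespaced = split_client_id("myclient@mynamespace")
legacy = split_client_id("legacy-client")
```

Splitting on the last '@' keeps client names that already contain an
'@' intact, at the cost of forbidding that character in namespace names.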


My two cents,
Magnus


2015-10-10 3:32 GMT+02:00 Ashish Singh :

> Hey Guys,
>
> I just created KIP-37 for adding namespaces to Kafka.
>
> KIP-37
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka
> >
> tracks the proposal.
>
> The idea is to make Kafka support multi-tenancy via namespaces.
>
> Feedback and comments are welcome.
>
> --
>
> Regards,
> Ashish
>


[DISCUSS] KIP-37 - Add namespaces in Kafka

2015-10-09 Thread Ashish Singh
Hey Guys,

I just created KIP-37 for adding namespaces to Kafka.

KIP-37
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka>
tracks the proposal.

The idea is to make Kafka support multi-tenancy via namespaces.

Feedback and comments are welcome.

-- 

Regards,
Ashish