Re: Two open issues on Kafka security

2014-10-02 Thread Don Bosco Durai
I agree, username+IP would be sufficient. I assume that when authentication is 
turned off or absent but the authorization plugin is enabled, the username 
would be empty or passed as “nobody”, with a valid IP (if available).

> The name “context” is probably not the right one. The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.
+1. Makes the design scalable.

Thanks

Bosco


> 
> 
> - Original message -
> From: Jarek Jarcec Cecho 
> To: dev@kafka.apache.org
> Subject: Re: Two open issues on Kafka security
> Date: Thu, 2 Oct 2014 08:33:45 -0700
> 
> Thanks for getting back Jay!
> 
> For the interface - Looking at Sentry and other authorization libraries
> in the Hadoop ecosystem it seems that “username” is primarily used to
> perform authorization these days, and then IP for auditing. Hence I feel
> that username+IP would be sufficient, at least for now. However I would
> assume that in the future we might need more than just those two, so
> what about defining the API in a way that we can easily extend in the
> future, something like:
> 
> authorize(Context, Entity, Action), where
> 
> * Action - the action that the user is trying to perform (write to a topic,
> read from a topic, create a topic, …)
> * Entity - the entity that the user is trying to perform that action on
> (topic, …)
> * Context - container with user/session information - user name, IP
> address, or perhaps the entire certificate, as was suggested earlier in the
> email thread.
> 
> The name “context” is probably not the right one. The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.
> 
> The hierarchy is an interesting topic - I’m not familiar enough with Kafka
> internals so I can’t really talk about how much more complex it would
> be. I can speak about Sentry and the way we designed the security model for
> Hive and Search, where introducing the hierarchy wasn’t complex at all
> and actually led to a cleaner model. The biggest user-visible benefit
> is that you don’t have to deal with special rules such as “give user
> jarcec READ privilege on ALL topics”. If you have a singleton parent
> entity (service or whatever name seems more accurate), you can easily
> say that a user has READ access on this root entity and then all
> topics will simply inherit that.
> 
> Jarcec
> 
> On Oct 1, 2014, at 9:33 PM, Jay Kreps  wrote:
> 
>> Hey Jarek,
>> 
>> I agree with the importance of separating authentication and
>> authorization. The question is what concept of identity is sufficient
>> to pass through to the authorization layer? Just a "user name"? Or
>> perhaps you also need the IP the request originated from? Whatever
>> these would be, it would be nice to enumerate them so the authz portion
>> can be written in a way that ignores the authn part.
>> 
>> So if no one else proposes anything different, maybe we can just say
>> user name + IP?
>> 
>> With respect to hierarchy, it would be nice to have topic hierarchies,
>> but we don't have them now, so it seems overkill to try to think them
>> through wrt security now, right?
>> 
>> -Jay
>> 
>> 
>> 
>> On Wed, Oct 1, 2014 at 1:13 PM, Jarek Jarcec Cecho  wrote:
>>> I’m following the security proposal wiki page [1] and this discussion and I 
>>> would like to jump in with a few points if I might :)  Let me start by saying 
>>> that I like the material and the discussion here, good work!
>>> 
>>> I was part of the team who originally designed and worked on Sentry and I 
>>> wanted to share a few points to see how they resonate with people.  My first 
>>> and probably biggest point would be to treat authorization and 
>>> authentication as two separate systems. I believe that Jao has already 
>>> stressed that in the email thread, but I wanted to reiterate that point. 
>>> In my experience users don’t care that much about how the user has been 
>>> authenticated if they trust that mechanism; what they care more about is 
>>> that the authorization model is consistent and behaves the same way. E.g. 
>>> if I configure that user jarcec can write into topic “logs”, he should be 
>>> able to do that no matter where the connection came from - whether he has 
>>> been authenticated via Kerberos as he is directly exploring the data from 
>>> his computer, via a delegation token because he is running MapReduce jobs 
>>> calculating statistics, or via an SSL certificate because … (well I’m 
>>> missing a good example here, but you’re probably following my point).

Re: Two open issues on Kafka security

2014-10-02 Thread Jay Kreps
Hey Michael,

Cool. Yeah, I think in practice there isn't a huge difference since
Kafka requests are just length-prefixed packets; the only difference is
the presence or absence of the header fields. Having them there will
make life simpler and more consistent for client implementations, since
this will just be one more request type they can choose to implement
and it will look like all the other request types.

So let's do that.

-Jay

On Thu, Oct 2, 2014 at 9:01 AM, Michael Herstine
 wrote:
> Hi Jay,
>
> Yup― in both SASL & (non-blocking) SSL the runtime libs provide an
> “engine” abstraction that just takes in & produces buffers of bytes
> containing the authentication messages. The application is responsible for
> transmitting them… somehow. I was picturing a simple length-prefixed
> packet.
>
> Thanks for the pointer to the ZK code― I spent yesterday morning reading
> the server side & seeing how it’s being done (interesting side note: SASL is
> only used for Kerberos― other authentication schemes go through a
> different mechanism).
>
> I’m all for going with the original proposal & not introducing a second
> (albeit trivial) protocol… I was laboring under the impression that we
> wanted to avoid adding new request/response types, that’s all.
>
> On 10/1/14, 9:52 PM, "Jay Kreps"  wrote:
>
>>Here is the client side in ZK:
>>https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java
>>
>>Note how they have a special Zookeeper request API that is used to
>>send the SASL bytes (e.g. see ZooKeeperSaslClient.sendSaslPacket).
>>
>>This API follows the same protocol and rpc mechanism all their other
>>request/response types follow but it just has a simple byte[] entry
>>for the SASL token in both the request and response.
>>
>>-Jay
>>
>>On Wed, Oct 1, 2014 at 9:46 PM, Jay Kreps  wrote:
>>> Hey Michael,
>>>
>>> WRT question 2, I think for SASL you do need the mechanism information
>>> but what I was talking about was the challenge/response byte[] that is
>>> sent back and forth from the client to the server. My understanding is
>>> that SASL gives you an api for the client and server to use to produce
>>> these byte[]'s but doesn't actually specify any way of exchanging them
>>> (that is protocol specific). I could be wrong here since my knowledge
>>> of this stuff is pretty weak. But according to my understanding you
>>> must be imagining some protocol for exchanging challenge/response
>>> information. This protocol would have to be clearly documented for
>>> client implementors. What is that protocol?
>>>
>>> -Jay
>>>
>>> On Wed, Oct 1, 2014 at 2:36 PM, Michael Herstine
>>>  wrote:
>>>> Regarding question #1, I’m not sure I follow you, Joe: you’re proposing (I
>>>> think) that the API take a byte[], but what will be in that array? A
>>>> serialized certificate if the client authenticated via SSL and the
>>>> principal name (perhaps normalized) if the client authenticated via
>>>> Kerberos?
>>>>
>>>> Regarding question #2, I think I was unclear in the meeting yesterday: I
>>>> was proposing a separate port for each authentication method (including
>>>> none). That is, if a client wants no authentication, then they would
>>>> connect to port N on the broker. If they wanted to talk over SSL, then
>>>> they connect to port N+1 (say). Kerberos: N+2. This would remove the need
>>>> for a new request, since the authentication type would be implicit in the
>>>> port on which the client connected (and it was my understanding that it
>>>> was desirable to not introduce any new messages).
>>>>
>>>> Perhaps the confusion comes from the fact, correctly pointed out by Jay,
>>>> that when you want to use SASL on a single port, there does of course need
>>>> to be a way for the incoming client to signal which mechanism it wants to
>>>> use, and that’s out of scope of the SASL spec. I didn’t see there being a
>>>> desire to add new SASL mechanisms going forward, but perhaps I was
>>>> incorrect?
>>>>
>>>> In any event, I’d like to suggest we keep the “open” or “no auth” port
>>>> separate, both to make it easy for admins to force the use of security (by
>>>> shutting down that port) and to avoid downgrade attacks (where an attacker
>>>> intercepts the opening packet from a client requesting security & alters
>>>> it to request none).
>>>>
>>>> I’ll update the Wiki with my notes from yesterday’s meeting this afternoon.
>>>>
>>>> Thanks,
>>>>
>>>> On 10/1/14, 9:35 AM, "Jonathan Creasy"  wrote:
>>>>
>>>>> This is not nearly as deep as the discussion so far, but I did want to
>>>>> throw this idea out there to make sure we’ve thought about it.
>>>>>
>>>>> The Kafka project should make sure that when deployed alongside a Hadoop
>>>>> cluster from any major distribution it can tie seamlessly into the
>>>>> authentication and authorization used within that cluster. For example,
>>>>> Apache Sentry.
>>>>>
>>>>> This may present additional difficulties that mean a decision is made
>>>>> not to do that, or alternatively the Kerberos authentication and the
>>>>> authorization schemes we are already working on may be sufficient.

Re: Two open issues on Kafka security

2014-10-02 Thread Michael Herstine
Hi Jay,

Yup― in both SASL & (non-blocking) SSL the runtime libs provide an
“engine” abstraction that just takes in & produces buffers of bytes
containing the authentication messages. The application is responsible for
transmitting them… somehow. I was picturing a simple length-prefixed
packet.
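
For illustration, a minimal sketch of the client-side loop such an engine
implies, assuming javax.security.sasl and a hypothetical length-prefixed
PacketTransport (the mechanism, protocol, and server names are placeholders):

import java.util.Collections;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;

public final class SaslEngineSketch {

    // Hypothetical transport: length-prefixed packets over the connection.
    interface PacketTransport {
        void sendPacket(byte[] payload) throws Exception;
        byte[] receivePacket() throws Exception;
    }

    static void authenticate(CallbackHandler handler, PacketTransport transport)
            throws Exception {
        SaslClient client = Sasl.createSaslClient(
                new String[] {"GSSAPI"},  // Kerberos mechanism
                null,                     // no explicit authorization id
                "kafka",                  // protocol name (placeholder)
                "broker.example.com",     // server name (placeholder)
                Collections.<String, Object>emptyMap(),
                handler);
        // Some mechanisms produce an initial token before seeing any challenge.
        byte[] token = client.hasInitialResponse()
                ? client.evaluateChallenge(new byte[0])
                : new byte[0];
        while (!client.isComplete()) {
            transport.sendPacket(token);                  // engine -> wire
            byte[] challenge = transport.receivePacket(); // wire -> engine
            token = client.evaluateChallenge(challenge);  // engine consumes challenge
        }
    }
}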

Thanks for the pointer to the ZK code― I spent yesterday morning reading
the server side & seeing how it’s being done (interesting side note: SASL is
only used for Kerberos― other authentication schemes go through a
different mechanism).

I’m all for going with the original proposal & not introducing a second
(albeit trivial) protocol… I was laboring under the impression that we
wanted to avoid adding new request/response types, that’s all.

On 10/1/14, 9:52 PM, "Jay Kreps"  wrote:

>Here is the client side in ZK:
>https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java
>
>Note how they have a special Zookeeper request API that is used to
>send the SASL bytes (e.g. see ZooKeeperSaslClient.sendSaslPacket).
>
>This API follows the same protocol and rpc mechanism all their other
>request/response types follow but it just has a simple byte[] entry
>for the SASL token in both the request and response.
>
>-Jay
>
>On Wed, Oct 1, 2014 at 9:46 PM, Jay Kreps  wrote:
>> Hey Michael,
>>
>> WRT question 2, I think for SASL you do need the mechanism information
>> but what I was talking about was the challenge/response byte[] that is
>> sent back and forth from the client to the server. My understanding is
>> that SASL gives you an api for the client and server to use to produce
>> these byte[]'s but doesn't actually specify any way of exchanging them
>> (that is protocol specific). I could be wrong here since my knowledge
>> of this stuff is pretty weak. But according to my understanding you
>> must be imagining some protocol for exchanging challenge/response
>> information. This protocol would have to be clearly documented for
>> client implementors. What is that protocol?
>>
>> -Jay
>>
>> On Wed, Oct 1, 2014 at 2:36 PM, Michael Herstine
>>  wrote:
>>> Regarding question #1, I’m not sure I follow you, Joe: you’re
>>>proposing (I
>>> think) that the API take a byte[], but what will be in that array? A
>>> serialized certificate if the client authenticated via SSL and the
>>> principal name (perhaps normalized) if the client authenticated via
>>> Kerberos?
>>>
>>> Regarding question #2, I think I was unclear in the meeting yesterday:
>>>I
>>> was proposing a separate port for each authentication method (including
>>> none). That is, if a client wants no authentication, then they would
>>> connect to port N on the broker. If they wanted to talk over SSL, then
>>> they connect to port N+1 (say). Kerberos: N+2. This would remove the
>>>need
>>> for a new request, since the authentication type would be implicit in
>>>the
>>> port on which the client connected (and it was my understanding that it
>>> was desirable to not introduce any new messages).
>>>
>>> Perhaps the confusion comes from the fact, correctly pointed out by
>>>Jay,
>>> that when you want to use SASL on a single port, there does of course
>>>need
>>> to be a way for the incoming client to signal which mechanism it wants
>>>to
>>> use, and that’s out of scope of the SASL spec. I didn’t see there
>>>being a
>>> desire to add new SASL mechanisms going forward, but perhaps I was
>>> incorrect?
>>>
>>> In any event, I’d like to suggest we keep the “open” or “no auth” port
>>> separate, both to make it easy for admins to force the use of security
>>>(by
>>> shutting down that port) and to avoid downgrade attacks (where an
>>>attacker
>>> intercepts the opening packet from a client requesting security &
>>>alters
>>> it to request none).
>>>
>>> I’ll update the Wiki with my notes from yesterday’s meeting this
>>>afternoon.
>>>
>>> Thanks,
>>>
>>> On 10/1/14, 9:35 AM, "Jonathan Creasy" 
>>>wrote:
>>>
>>>> This is not nearly as deep as the discussion so far, but I did want to
>>>> throw this idea out there to make sure we’ve thought about it.
>>>>
>>>> The Kafka project should make sure that when deployed alongside a Hadoop
>>>> cluster from any major distribution it can tie seamlessly into the
>>>> authentication and authorization used within that cluster. For example,
>>>> Apache Sentry.
>>>>
>>>> This may present additional difficulties that mean a decision is made
>>>> not to do that, or alternatively the Kerberos authentication and the
>>>> authorization schemes we are already working on may be sufficient.
>>>>
>>>> I’m not sure that anything I’ve read so far in this discussion actually
>>>> poses a problem, but I’m an Ops guy, and being able to more easily
>>>> integrate more things makes my life better. :)
>>>>
>>>> -Jonathan
>>>>
>>>> On 9/30/14, 11:26 PM, "Joe Stein"  wrote:
>>>>
>>>>> inline
>>>>>
>>>>> On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps  wrote:
>>>>>
>>>>>> Hey Joe,
>>>>>>
>>>>>> For (1) what are you thinking for the PermissionManager api?

Re: Two open issues on Kafka security

2014-10-02 Thread Jarek Jarcec Cecho
Thanks for getting back Jay!

For the interface - Looking at Sentry and other authorization libraries in the 
Hadoop ecosystem it seems that “username” is primarily used to perform 
authorization these days, and then IP for auditing. Hence I feel that 
username+IP would be sufficient, at least for now. However I would assume that 
in the future we might need more than just those two, so what about defining 
the API in a way that we can easily extend in the future, something like:

authorize(Context, Entity, Action), where

* Action - the action that the user is trying to perform (write to a topic, 
read from a topic, create a topic, …)
* Entity - the entity that the user is trying to perform that action on (topic, …)
* Context - container with user/session information - user name, IP address, or 
perhaps the entire certificate, as was suggested earlier in the email thread.

The name “context” is probably not the right one. The idea is to have an object 
into which we can easily add additional properties in the future to support 
additional authorization libraries without breaking backward compatibility with 
existing ones.
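
A minimal sketch of that shape in Java (a sketch only; the field choices are
illustrative assumptions drawn from this thread, not a settled API):

import java.net.InetAddress;

// The pluggable check itself.
interface Authorizer {
    boolean authorize(Context context, Entity entity, Action action);
}

// What the user is trying to do.
enum Action { READ, WRITE, CREATE }

// What the action targets; "type" leaves room for more than topics later.
final class Entity {
    final String type;  // e.g. "topic"
    final String name;  // e.g. "logs"
    Entity(String type, String name) { this.type = type; this.name = name; }
}

// Extensible user/session container, so new fields can be added later
// without breaking existing authorization plugins.
final class Context {
    final String username;            // may be "nobody" when authentication is off
    final InetAddress clientAddress;  // primarily for auditing
    final byte[] rawCredential;       // e.g. the entire certificate, if available
    Context(String username, InetAddress clientAddress, byte[] rawCredential) {
        this.username = username;
        this.clientAddress = clientAddress;
        this.rawCredential = rawCredential;
    }
}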

The hierarchy is an interesting topic - I’m not familiar enough with Kafka 
internals so I can’t really talk about how much more complex it would be. I can 
speak about Sentry and the way we designed the security model for Hive and 
Search, where introducing the hierarchy wasn’t complex at all and actually led 
to a cleaner model. The biggest user-visible benefit is that you don’t have to 
deal with special rules such as “give user jarcec READ privilege on ALL topics”. 
If you have a singleton parent entity (service or whatever name seems more 
accurate), you can easily say that a user has READ access on this root entity 
and then all topics will simply inherit that.
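
For the two-level case, the inheritance could be as simple as checking the
root entity before the topic (a sketch reusing the assumed names above; the
"service"/"topic" type strings are placeholders):

// A privilege granted on the singleton "service" root covers every topic.
static boolean authorizeTopic(Authorizer authz, Context ctx,
                              String topic, Action action) {
    return authz.authorize(ctx, new Entity("service", "kafka"), action)  // inherited
        || authz.authorize(ctx, new Entity("topic", topic), action);     // direct grant
}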

Jarcec

On Oct 1, 2014, at 9:33 PM, Jay Kreps  wrote:

> Hey Jarek,
> 
> I agree with the importance of separating authentication and
> authorization. The question is what concept of identity is sufficient
> to pass through to the authorization layer? Just a "user name"? Or
> perhaps you also need the IP the request originated from? Whatever
> these would be, it would be nice to enumerate them so the authz portion
> can be written in a way that ignores the authn part.
> 
> So if no one else proposes anything different, maybe we can just say
> user name + IP?
> 
> With respect to hierarchy, it would be nice to have topic hierarchies,
> but we don't have them now, so it seems overkill to try to think them
> through wrt security now, right?
> 
> -Jay
> 
> 
> 
> On Wed, Oct 1, 2014 at 1:13 PM, Jarek Jarcec Cecho  wrote:
>> I’m following the security proposal wiki page [1] and this discussion and I 
>> would like to jump in with a few points if I might :)  Let me start by saying 
>> that I like the material and the discussion here, good work!
>> 
>> I was part of the team who originally designed and worked on Sentry and I 
>> wanted to share a few points to see how they resonate with people.  My first 
>> and probably biggest point would be to treat authorization and authentication 
>> as two separate systems. I believe that Jao has already stressed that in the 
>> email thread, but I wanted to reiterate that point. In my experience 
>> users don’t care that much about how the user has been authenticated if they 
>> trust that mechanism; what they care more about is that the authorization 
>> model is consistent and behaves the same way. E.g. if I configure that user 
>> jarcec can write into topic “logs”, he should be able to do that no matter 
>> where the connection came from - whether he has been authenticated via 
>> Kerberos as he is directly exploring the data from his computer, via a 
>> delegation token because he is running MapReduce jobs calculating statistics, 
>> or via an SSL certificate because … (well I’m missing a good example here, 
>> but you’re probably following my point).
>> 
>> I’ve also noticed that we are planning to have no hierarchy in the authz 
>> object model per the wiki [1], with the reasoning that Kafka does not support 
>> topic hierarchy. I see that point, but at the same time it got me thinking - 
>> are we sure that Kafka will never have hierarchic topics? It seems like a nice 
>> feature that might be usable for some use cases and something that we might 
>> want to add in the future. But regardless of that I would suggest introducing 
>> a hierarchy anyway, even if it were just two levels. In Sentry (for Hive) 
>> we’ve introduced the concept of a “Service” where all the databases are 
>> children of the service. In Kafka I would imagine that we would have a 
>> “service” with “topics” as the children. This makes it much easier to model 
>> general privileges where you need to grant access to all topics - you just 
>> grant access to the entire service and all topics will get “inherited”.
>> 
>> I’m wondering what other people’s thoughts are.
>> 
>> Jarcec
>> 
>> Links:
>> 1: https://cwiki.apache.org/confluence/display/KAFKA/Security

Re: Two open issues on Kafka security

2014-10-01 Thread Jay Kreps
Here is the client side in ZK:
https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java

Note how they have a special Zookeeper request API that is used to
send the SASL bytes (e.g. see ZooKeeperSaslClient.sendSaslPacket).

This API follows the same protocol and rpc mechanism all their other
request/response types follow but it just has a simple byte[] entry
for the SASL token in both the request and response.
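
If Kafka took the same route, the new pair could be this small (a sketch;
SaslTokenRequest/SaslTokenResponse and the field layout are assumptions for
illustration, not an agreed protocol change):

// Standard header fields plus one opaque byte[] for the SASL token, so it
// parses like every other Kafka request type.
final class SaslTokenRequest {
    final short apiKey;        // hypothetical new API key for the SASL exchange
    final short apiVersion;
    final int correlationId;
    final String clientId;
    final byte[] token;        // challenge/response bytes from the SASL engine

    SaslTokenRequest(short apiKey, short apiVersion, int correlationId,
                     String clientId, byte[] token) {
        this.apiKey = apiKey;
        this.apiVersion = apiVersion;
        this.correlationId = correlationId;
        this.clientId = clientId;
        this.token = token;
    }
}

final class SaslTokenResponse {
    final int correlationId;
    final short errorCode;     // non-zero if the exchange failed
    final byte[] token;        // server's next challenge; empty when complete

    SaslTokenResponse(int correlationId, short errorCode, byte[] token) {
        this.correlationId = correlationId;
        this.errorCode = errorCode;
        this.token = token;
    }
}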

-Jay

On Wed, Oct 1, 2014 at 9:46 PM, Jay Kreps  wrote:
> Hey Michael,
>
> WRT question 2, I think for SASL you do need the mechanism information
> but what I was talking about was the challenge/response byte[] that is
> sent back and forth from the client to the server. My understanding is
> that SASL gives you an api for the client and server to use to produce
> these byte[]'s but doesn't actually specify any way of exchanging them
> (that is protocol specific). I could be wrong here since my knowledge
> of this stuff is pretty weak. But according to my understanding you
> must be imagining some protocol for exchanging challenge/response
> information. This protocol would have to be clearly documented for
> client implementors. What is that protocol?
>
> -Jay
>
> On Wed, Oct 1, 2014 at 2:36 PM, Michael Herstine
>  wrote:
>> Regarding question #1, I’m not sure I follow you, Joe: you’re proposing (I
>> think) that the API take a byte[], but what will be in that array? A
>> serialized certificate if the client authenticated via SSL and the
>> principal name (perhaps normalized) if the client authenticated via
>> Kerberos?
>>
>> Regarding question #2, I think I was unclear in the meeting yesterday: I
>> was proposing a separate port for each authentication method (including
>> none). That is, if a client wants no authentication, then they would
>> connect to port N on the broker. If they wanted to talk over SSL, then
>> they connect to port N+1 (say). Kerberos: N+2. This would remove the need
>> for a new request, since the authentication type would be implicit in the
>> port on which the client connected (and it was my understanding that it
>> was desirable to not introduce any new messages).
>>
>> Perhaps the confusion comes from the fact, correctly pointed out by Jay,
>> that when you want to use SASL on a single port, there does of course need
>> to be a way for the incoming client to signal which mechanism it wants to
>> use, and that’s out of scope of the SASL spec. I didn’t see there being a
>> desire to add new SASL mechanisms going forward, but perhaps I was
>> incorrect?
>>
>> In any event, I’d like to suggest we keep the “open” or “no auth” port
>> separate, both to make it easy for admins to force the use of security (by
>> shutting down that port) and to avoid downgrade attacks (where an attacker
>> intercepts the opening packet from a client requesting security & alters
>> it to request none).
>>
>> I’ll update the Wiki with my notes from yesterday’s meeting this afternoon.
>>
>> Thanks,
>>
>> On 10/1/14, 9:35 AM, "Jonathan Creasy"  wrote:
>>
>>>This is not nearly as deep as the discussion so far, but I did want to
>>>throw this idea out there to make sure we’ve thought about it.
>>>
>>>The Kafka project should make sure that when deployed alongside a Hadoop
>>>cluster from any major distribution it can tie seamlessly into the
>>>authentication and authorization used within that cluster. For example,
>>>Apache Sentry.
>>>
>>>This may present additional difficulties that mean a decision is made
>>>not to do that, or alternatively the Kerberos authentication and the
>>>authorization schemes we are already working on may be sufficient.
>>>
>>>I’m not sure that anything I’ve read so far in this discussion actually
>>>poses a problem, but I’m an Ops guy, and being able to more easily
>>>integrate more things makes my life better. :)
>>>
>>>-Jonathan
>>>
>>>On 9/30/14, 11:26 PM, "Joe Stein"  wrote:
>>>
>>>> inline
>>>>
>>>> On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps  wrote:
>>>>
>>>>> Hey Joe,
>>>>>
>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>
>>>>> The way I see it, the first question we have to answer is whether it
>>>>> is possible to make authentication and authorization independent. What
>>>>> I mean by that is whether I can write an authorization library that
>>>>> will work the same whether you authenticate with ssl or kerberos.
>>>>
>>>>
>>>> To me that is a requirement. We can't tie them together.  We have to
>>>> provide the ability for authorization to work regardless of the
>>>> authentication.  One *VERY* important use case is level of trust in
>>>> authentication from the authorization perspective, e.g. I authorize an
>>>> "identity" based on how it authenticated: Alice is able to view
>>>> topic X if Alice authenticated over Kerberos.  Bob isn't allowed to view
>>>> topic X no matter what. Alice can authenticate over something other than
>>>> Kerberos (there are use cases for that) and in that case Alice wouldn't
>>>> see topic X.

Re: Two open issues on Kafka security

2014-10-01 Thread Jay Kreps
Hey Michael,

WRT question 2, I think for SASL you do need the mechanism information
but what I was talking about was the challenge/response byte[] that is
sent back and forth from the client to the server. My understanding is
that SASL gives you an api for the client and server to use to produce
these byte[]'s but doesn't actually specify any way of exchanging them
(that is protocol specific). I could be wrong here since my knowledge
of this stuff is pretty weak. But according to my understanding you
must be imagining some protocol for exchanging challenge/response
information. This protocol would have to be clearly documented for
client implementors. What is that protocol?

-Jay

On Wed, Oct 1, 2014 at 2:36 PM, Michael Herstine
 wrote:
> Regarding question #1, I’m not sure I follow you, Joe: you’re proposing (I
> think) that the API take a byte[], but what will be in that array? A
> serialized certificate if the client authenticated via SSL and the
> principal name (perhaps normalized) if the client authenticated via
> Kerberos?
>
> Regarding question #2, I think I was unclear in the meeting yesterday: I
> was proposing a separate port for each authentication method (including
> none). That is, if a client wants no authentication, then they would
> connect to port N on the broker. If they wanted to talk over SSL, then
> they connect to port N+1 (say). Kerberos: N+2. This would remove the need
> for a new request, since the authentication type would be implicit in the
> port on which the client connected (and it was my understanding that it
> was desirable to not introduce any new messages).
>
> Perhaps the confusion comes from the fact, correctly pointed out by Jay,
> that when you want to use SASL on a single port, there does of course need
> to be a way for the incoming client to signal which mechanism it wants to
> use, and that’s out of scope of the SASL spec. I didn’t see there being a
> desire to add new SASL mechanisms going forward, but perhaps I was
> incorrect?
>
> In any event, I’d like to suggest we keep the “open” or “no auth” port
> separate, both to make it easy for admins to force the use of security (by
> shutting down that port) and to avoid downgrade attacks (where an attacker
> intercepts the opening packet from a client requesting security & alters
> it to request none).
>
> I’ll update the Wiki with my notes from yesterday’s meeting this afternoon.
>
> Thanks,
>
> On 10/1/14, 9:35 AM, "Jonathan Creasy"  wrote:
>
>>This is not nearly as deep as the discussion so far, but I did want to
>>throw this idea out there to make sure we’ve thought about it.
>>
>>The Kafka project should make sure that when deployed alongside a Hadoop
>>cluster from any major distribution it can tie seamlessly into the
>>authentication and authorization used within that cluster. For example,
>>Apache Sentry.
>>
>>This may present additional difficulties that mean a decision is made
>>not to do that, or alternatively the Kerberos authentication and the
>>authorization schemes we are already working on may be sufficient.
>>
>>I’m not sure that anything I’ve read so far in this discussion actually
>>poses a problem, but I’m an Ops guy, and being able to more easily
>>integrate more things makes my life better. :)
>>
>>-Jonathan
>>
>>On 9/30/14, 11:26 PM, "Joe Stein"  wrote:
>>
>>>inline
>>>
>>>On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps  wrote:
>>>
>>>> Hey Joe,
>>>>
>>>> For (1) what are you thinking for the PermissionManager api?
>>>>
>>>> The way I see it, the first question we have to answer is whether it
>>>> is possible to make authentication and authorization independent. What
>>>> I mean by that is whether I can write an authorization library that
>>>> will work the same whether you authenticate with ssl or kerberos.
>>>
>>>
>>> To me that is a requirement. We can't tie them together.  We have to
>>> provide the ability for authorization to work regardless of the
>>> authentication.  One *VERY* important use case is level of trust in
>>> authentication from the authorization perspective, e.g. I authorize an
>>> "identity" based on how it authenticated: Alice is able to view
>>> topic X if Alice authenticated over Kerberos.  Bob isn't allowed to view
>>> topic X no matter what. Alice can authenticate over something other than
>>> Kerberos (there are use cases for that) and in that case Alice wouldn't
>>> see topic X.  A concrete use case for this with Kafka would be a third
>>> party bank consuming data from a broker.  The service provider would have
>>> some local Kerberos auth for that bank to do backups, which would also
>>> have access to other topics related to that bank's data; the bank itself,
>>> over SSL, wants a stream of events (some specific topic), and that bank's
>>> identity only sees that topic.  It is important to not confuse identity,
>>> authentication and authorization.
>>>
>>>
>>>> If
>>>> so then we need to pick some subset of identity information that we
>>>> can extract from both and have this constitute the identity we pass
>>>> into the authorization interface.

Re: Two open issues on Kafka security

2014-10-01 Thread Jay Kreps
Hey Jarek,

I agree with the importance of separating authentication and
authorization. The question is what concept of identity is sufficient
to pass through to the authorization layer? Just a "user name"? Or
perhaps you also need the IP the request originated from? Whatever
these would be, it would be nice to enumerate them so the authz portion
can be written in a way that ignores the authn part.

So if no one else proposes anything different, maybe we can just say
user name + IP?

With respect to hierarchy, it would be nice to have topic hierarchies,
but we don't have them now, so it seems overkill to try to think them
through wrt security now, right?

-Jay



On Wed, Oct 1, 2014 at 1:13 PM, Jarek Jarcec Cecho  wrote:
> I’m following the security proposal wiki page [1] and this discussion and I 
> would like to jump in with a few points if I might :)  Let me start by saying 
> that I like the material and the discussion here, good work!
>
> I was part of the team who originally designed and worked on Sentry and I 
> wanted to share a few points to see how they resonate with people.  My first 
> and probably biggest point would be to treat authorization and authentication 
> as two separate systems. I believe that Jao has already stressed that in the 
> email thread, but I wanted to reiterate that point. In my experience users 
> don’t care that much about how the user has been authenticated if they trust 
> that mechanism; what they care more about is that the authorization model is 
> consistent and behaves the same way. E.g. if I configure that user jarcec 
> can write into topic “logs”, he should be able to do that no matter where the 
> connection came from - whether he has been authenticated via Kerberos as he is 
> directly exploring the data from his computer, via a delegation token because 
> he is running MapReduce jobs calculating statistics, or via an SSL certificate 
> because … (well I’m missing a good example here, but you’re probably following 
> my point).
>
> I’ve also noticed that we are planning to have no hierarchy in the authz 
> object model per the wiki [1], with the reasoning that Kafka does not support 
> topic hierarchy. I see that point, but at the same time it got me thinking - 
> are we sure that Kafka will never have hierarchic topics? It seems like a nice 
> feature that might be usable for some use cases and something that we might 
> want to add in the future. But regardless of that I would suggest introducing 
> a hierarchy anyway, even if it were just two levels. In Sentry (for Hive) 
> we’ve introduced the concept of a “Service” where all the databases are 
> children of the service. In Kafka I would imagine that we would have a 
> “service” with “topics” as the children. This makes it much easier to model 
> general privileges where you need to grant access to all topics - you just 
> grant access to the entire service and all topics will get “inherited”.
>
> I’m wondering what other people’s thoughts are.
>
> Jarcec
>
> Links:
> 1: https://cwiki.apache.org/confluence/display/KAFKA/Security
>
> On Oct 1, 2014, at 9:44 AM, Joe Stein  wrote:
>
>> Hi Jonathan,
>>
>> "Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks
>> running in the Hadoop environment to access Kafka"
>> https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list,
>> yup!
>>
>> /***
>> Joe Stein
>> Founder, Principal Consultant
>> Big Data Open Source Security LLC
>> http://www.stealth.ly
>> Twitter: @allthingshadoop 
>> /
>>
>> On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy 
>> wrote:
>>
>> This is not nearly as deep as the discussion so far, but I did want to
>> throw this idea out there to make sure we’ve thought about it.
>>
>> The Kafka project should make sure that when deployed alongside a Hadoop
>> cluster from any major distribution it can tie seamlessly into the
>> authentication and authorization used within that cluster. For example,
>> Apache Sentry.
>>
>> This may present additional difficulties that mean a decision is made
>> not to do that, or alternatively the Kerberos authentication and the
>> authorization schemes we are already working on may be sufficient.
>>
>> I’m not sure that anything I’ve read so far in this discussion actually
>> poses a problem, but I’m an Ops guy, and being able to more easily
>> integrate more things makes my life better. :)
>>>
>>> -Jonathan
>>>
>>> On 9/30/14, 11:26 PM, "Joe Stein"  wrote:
>>>
>>>> inline
>>>>
>>>> On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps  wrote:
>>>>
>>>>> Hey Joe,
>>>>>
>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>
>>>>> The way I see it, the first question we have to answer is whether it
>>>>> is possible to make authentication and authorization independent. What
>>>>> I mean by that is whether I can write an authorization library that
>>>>> will work the same whether you authenticate with ssl or kerberos.

Re: Two open issues on Kafka security

2014-10-01 Thread Michael Herstine
Regarding question #1, I’m not sure I follow you, Joe: you’re proposing (I
think) that the API take a byte[], but what will be in that array? A
serialized certificate if the client authenticated via SSL and the
principal name (perhaps normalized) if the client authenticated via
Kerberos?

Regarding question #2, I think I was unclear in the meeting yesterday: I
was proposing a separate port for each authentication method (including
none). That is, if a client wants no authentication, then they would
connect to port N on the broker. If they wanted to talk over SSL, then
they connect to port N+1 (say). Kerberos: N+2. This would remove the need
for a new request, since the authentication type would be implicit in the
port on which the client connected (and it was my understanding that it
was desirable to not introduce any new messages).
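
As a sketch, the broker-side mapping could be nothing more than this (the
port numbers and the AuthMethod enum are made up for illustration):

import java.util.HashMap;
import java.util.Map;

final class ListenerPorts {
    enum AuthMethod { NONE, SSL, KERBEROS }

    // One listener per authentication method; the auth type is implicit in
    // the port a client connects to, so no new request type is needed.
    // Closing the NONE port forces every client to authenticate.
    static Map<Integer, AuthMethod> listeners() {
        Map<Integer, AuthMethod> ports = new HashMap<>();
        ports.put(9092, AuthMethod.NONE);     // the "open" port N
        ports.put(9093, AuthMethod.SSL);      // port N+1
        ports.put(9094, AuthMethod.KERBEROS); // port N+2
        return ports;
    }
}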

Perhaps the confusion comes from the fact, correctly pointed out by Jay,
that when you want to use SASL on a single port, there does of course need
to be a way for the incoming client to signal which mechanism it wants to
use, and that’s out of scope of the SASL spec. I didn’t see there being a
desire to add new SASL mechanisms going forward, but perhaps I was
incorrect?

In any event, I’d like to suggest we keep the “open” or “no auth” port
separate, both to make it easy for admins to force the use of security (by
shutting down that port) and to avoid downgrade attacks (where an attacker
intercepts the opening packet from a client requesting security & alters
it to request none).

I’ll update the Wiki with my notes from yesterday’s meeting this afternoon.

Thanks,

On 10/1/14, 9:35 AM, "Jonathan Creasy"  wrote:

>This is not nearly as deep as the discussion so far, but I did want to
>throw this idea out there to make sure we’ve thought about it.
>
>The Kafka project should make sure that when deployed alongside a Hadoop
>cluster from any major distribution it can tie seamlessly into the
>authentication and authorization used within that cluster. For example,
>Apache Sentry.
>
>This may present additional difficulties that mean a decision is made
>not to do that, or alternatively the Kerberos authentication and the
>authorization schemes we are already working on may be sufficient.
>
>I’m not sure that anything I’ve read so far in this discussion actually
>poses a problem, but I’m an Ops guy, and being able to more easily
>integrate more things makes my life better. :)
>
>-Jonathan
>
>On 9/30/14, 11:26 PM, "Joe Stein"  wrote:
>
>>inline
>>
>>On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps  wrote:
>>
>>> Hey Joe,
>>>
>>> For (1) what are you thinking for the PermissionManager api?
>>>
>>> The way I see it, the first question we have to answer is whether it
>>> is possible to make authentication and authorization independent. What
>>> I mean by that is whether I can write an authorization library that
>>> will work the same whether you authenticate with ssl or kerberos.
>>
>>
>>To me that is a requirement. We can't tie them together.  We have to
>>provide the ability for authorization to work regardless of the
>>authentication.  One *VERY* important use case is level of trust in
>>authentication from the authorization perspective, e.g. I authorize an
>>"identity" based on how it authenticated: Alice is able to view
>>topic X if Alice authenticated over Kerberos.  Bob isn't allowed to view
>>topic X no matter what. Alice can authenticate over something other than
>>Kerberos (there are use cases for that) and in that case Alice wouldn't
>>see topic X.  A concrete use case for this with Kafka would be a third
>>party bank consuming data from a broker.  The service provider would have
>>some local Kerberos auth for that bank to do backups, which would also
>>have access to other topics related to that bank's data; the bank itself,
>>over SSL, wants a stream of events (some specific topic), and that bank's
>>identity only sees that topic.  It is important to not confuse identity,
>>authentication and authorization.
>>
>>
>>> If
>>> so then we need to pick some subset of identity information that we
>>> can extract from both and have this constitute the identity we pass
>>> into the authorization interface. The original proposal had just the
>>> username/subject. But maybe we should add the ip address as well as
>>> that is useful. What I would prefer not to do is add everything in the
>>> certificate. I think the assumption is that you are generating these
>>> certificates for Kafka so you can put whatever identity info you want
>>> in the Subject Alternative Name. If that is true then just using that
>>> should be okay, right?
>>>
>>
>>I think we should just push the byte[] and let the plugin deal with it.
>>So, if we have a certificate object then pass that along with whatever
>>other metadata (e.g. IP address of client) we can.  I don't think we
>>should do any parsing whatsoever; let the plugin deal with that.  Any
>>parsing we do on the identity information for the "security object" forces
>>us into specific implementations and I don't see any reason to do that...

Re: Two open issues on Kafka security

2014-10-01 Thread Jarek Jarcec Cecho
I’m following the security proposal wiki page [1] and this discussion and I 
would like to jump in with a few points if I might :)  Let me start by saying 
that I like the material and the discussion here, good work!

I was part of the team who originally designed and worked on Sentry and I 
wanted to share a few points to see how they resonate with people.  My first and 
probably biggest point would be to treat authorization and authentication as 
two separate systems. I believe that Jao has already stressed that in the email 
thread, but I wanted to reiterate that point. In my experience users don’t 
care that much about how the user has been authenticated if they trust that 
mechanism; what they care more about is that the authorization model is 
consistent and behaves the same way. E.g. if I configure that user jarcec can 
write into topic “logs”, he should be able to do that no matter where the 
connection came from - whether he has been authenticated via Kerberos as he is 
directly exploring the data from his computer, via a delegation token because 
he is running MapReduce jobs calculating statistics, or via an SSL certificate 
because … (well I’m missing a good example here, but you’re probably following 
my point).

I’ve also noticed that we are planning to have no hierarchy in the authz object 
model per the wiki [1], with the reasoning that Kafka does not support topic 
hierarchy. I see that point, but at the same time it got me thinking - are we 
sure that Kafka will never have hierarchic topics? It seems like a nice feature 
that might be usable for some use cases and something that we might want to add 
in the future. But regardless of that I would suggest introducing a hierarchy 
anyway, even if it were just two levels. In Sentry (for Hive) we’ve introduced 
the concept of a “Service” where all the databases are children of the service. 
In Kafka I would imagine that we would have a “service” with “topics” as the 
children. This makes it much easier to model general privileges where you need 
to grant access to all topics - you just grant access to the entire service and 
all topics will get “inherited”.

I’m wondering what other people’s thoughts are.

Jarcec

Links:
1: https://cwiki.apache.org/confluence/display/KAFKA/Security

On Oct 1, 2014, at 9:44 AM, Joe Stein  wrote:

> Hi Jonathan,
> 
> "Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks
> running in the Hadoop environment to access Kafka"
> https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list,
> yup!
> 
> /***
> Joe Stein
> Founder, Principal Consultant
> Big Data Open Source Security LLC
> http://www.stealth.ly
> Twitter: @allthingshadoop 
> /
> 
> On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy 
> wrote:
> 
>> This is not nearly as deep as the discussion so far, but I did want to
>> throw this idea out there to make sure we’ve thought about it.
>> 
>> The Kafka project should make sure that when deployed alongside a Hadoop
>> cluster from any major distribution it can tie seamlessly into the
>> authentication and authorization used within that cluster. For example,
>> Apache Sentry.
>> 
>> This may present additional difficulties that mean a decision is made
>> not to do that, or alternatively the Kerberos authentication and the
>> authorization schemes we are already working on may be sufficient.
>> 
>> I’m not sure that anything I’ve read so far in this discussion actually
>> poses a problem, but I’m an Ops guy, and being able to more easily
>> integrate more things makes my life better. :)
>> 
>> -Jonathan
>> 
>> On 9/30/14, 11:26 PM, "Joe Stein"  wrote:
>> 
>>> inline
>>> 
>>> On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps  wrote:
>>> 
>>>> Hey Joe,
>>>> 
>>>> For (1) what are you thinking for the PermissionManager api?
>>>> 
>>>> The way I see it, the first question we have to answer is whether it
>>>> is possible to make authentication and authorization independent. What
>>>> I mean by that is whether I can write an authorization library that
>>>> will work the same whether you authenticate with ssl or kerberos.
>>> 
>>> 
>>> To me that is a requirement. We can't tie them together.  We have to
>>> provide the ability for authorization to work regardless of the
>>> authentication.  One *VERY* important use case is level of trust in
>>> authentication from the authorization perspective, e.g. I authorize an
>>> "identity" based on how it authenticated: Alice is able to view
>>> topic X if Alice authenticated over Kerberos.  Bob isn't allowed to view
>>> topic X no matter what. Alice can authenticate over something other than
>>> Kerberos (there are use cases for that) and in that case Alice wouldn't
>>> see topic X.  A concrete use case for this with Kafka would be a third
>>> party bank consuming data from a broker.  The service provider would have
>>> some local Kerberos auth for that bank to do backups.

Re: Two open issues on Kafka security

2014-10-01 Thread Joe Stein
Hi Jonathan,

"Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks
running in the Hadoop environment to access Kafka"
https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list,
yup!

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/

On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy 
wrote:

> This is not nearly as deep as the discussion so far, but I did want to
> throw this idea out there to make sure we’ve thought about it.
>
> The Kafka project should make sure that when deployed alongside a Hadoop
> cluster from any major distribution it can tie seamlessly into the
> authentication and authorization used within that cluster. For example,
> Apache Sentry.
>
> This may present additional difficulties that mean a decision is made
> not to do that, or alternatively the Kerberos authentication and the
> authorization schemes we are already working on may be sufficient.
>
> I’m not sure that anything I’ve read so far in this discussion actually
> poses a problem, but I’m an Ops guy, and being able to more easily
> integrate more things makes my life better. :)
>
> -Jonathan
>
> On 9/30/14, 11:26 PM, "Joe Stein"  wrote:
>
> >inline
> >
> >On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps  wrote:
> >
> >> Hey Joe,
> >>
> >> For (1) what are you thinking for the PermissionManager api?
> >>
> >> The way I see it, the first question we have to answer is whether it
> >> is possible to make authentication and authorization independent. What
> >> I mean by that is whether I can write an authorization library that
> >> will work the same whether you authenticate with ssl or kerberos.
> >
> >
> >To me that is a requirement. We can't tie them together.  We have to
> >provide the ability for authorization to work regardless of the
> >authentication.  One *VERY* important use case is level of trust in
> >authentication from the authorization perspective, e.g. I authorize an
> >"identity" based on how it authenticated: Alice is able to view
> >topic X if Alice authenticated over Kerberos.  Bob isn't allowed to view
> >topic X no matter what. Alice can authenticate over something other than
> >Kerberos (there are use cases for that) and in that case Alice wouldn't
> >see topic X.  A concrete use case for this with Kafka would be a third
> >party bank consuming data from a broker.  The service provider would have
> >some local Kerberos auth for that bank to do backups, which would also
> >have access to other topics related to that bank's data; the bank itself,
> >over SSL, wants a stream of events (some specific topic), and that bank's
> >identity only sees that topic.  It is important to not confuse identity,
> >authentication and authorization.
> >
> >
> >> If
> >> so then we need to pick some subset of identity information that we
> >> can extract from both and have this constitute the identity we pass
> >> into the authorization interface. The original proposal had just the
> >> username/subject. But maybe we should add the ip address as well as
> >> that is useful. What I would prefer not to do is add everything in the
> >> certificate. I think the assumption is that you are generating these
> >> certificates for Kafka so you can put whatever identity info you want
> >> in the Subject Alternative Name. If that is true then just using that
> >> should be okay, right?
> >>
> >
> >I think we should just push the byte[] and let the plugin deal with it.
> >So, if we have a certificate object then pass that along with whatever
> >other metadata (e.g. IP address of client) we can.  I don't think we
> >should do any parsing whatsoever; let the plugin deal with that.  Any
> >parsing we do on the identity information for the "security object" forces
> >us into specific implementations and I don't see any reason to do that...
> >If plug-ins want an "easier" time to deal with certs and parsing and blah
> >blah blah then we can implement some way they can do this without much
> >fuss. We also need to make sure that the crypto library is pluggable too
> >(so we can expose an API for them to call) so that an HSM can be easily
> >dropped in without Kafka caring... so in the plugin we could provide an
> >identity.getAlternativeAttribute() and then that use case is solved (and
> >we can use Bouncy Castle or whatever to parse it for them to make it
> >easier) and always give them raw bytes so they could do it themselves.
> >
> >
> >>
> >> -Jay
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein  wrote:
> >> > 1) We need to support the most flexibility we can and make this
> >> > transparent to kafka (to use Gwen's term).  Any specific implementation
> >> > is going to make it not work with some solution stopping people from
> >> > using Kafka.  That is a reality because everyone just does it slightly
> >> > differently enough.

Re: Two open issues on Kafka security

2014-10-01 Thread Jonathan Creasy
This is not nearly as deep as the discussion so far, but I did want to
throw this idea out there to make sure we’ve thought about it.

The Kafka project should make sure that when deployed alongside a Hadoop
cluster from any major distribution it can tie seamlessly into the
authentication and authorization used within that cluster. For example,
Apache Sentry.

This may present additional difficulties that mean a decision is made
not to do that, or alternatively the Kerberos authentication and the
authorization schemes we are already working on may be sufficient.

I’m not sure that anything I’ve read so far in this discussion actually
poses a problem, but I’m an Ops guy, and being able to more easily
integrate more things makes my life better. :)

-Jonathan

On 9/30/14, 11:26 PM, "Joe Stein"  wrote:

>inline
>
>On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps  wrote:
>
>> Hey Joe,
>>
>> For (1) what are you thinking for the PermissionManager api?
>>
>> The way I see it, the first question we have to answer is whether it
>> is possible to make authentication and authorization independent. What
>> I mean by that is whether I can write an authorization library that
>> will work the same whether you authenticate with ssl or kerberos.
>
>
>To me that is a requirement. We can't tie them together.  We have to
>provide the ability for authorization to work regardless of the
>authentication.  One *VERY* important use case is level of trust in
>authentication from the authorization perspective, e.g. I authorize an
>"identity" based on how it authenticated: Alice is able to view
>topic X if Alice authenticated over Kerberos.  Bob isn't allowed to view
>topic X no matter what. Alice can authenticate over something other than
>Kerberos (there are use cases for that) and in that case Alice wouldn't
>see topic X.  A concrete use case for this with Kafka would be a third
>party bank consuming data from a broker.  The service provider would have
>some local Kerberos auth for that bank to do backups, which would also
>have access to other topics related to that bank's data; the bank itself,
>over SSL, wants a stream of events (some specific topic), and that bank's
>identity only sees that topic.  It is important to not confuse identity,
>authentication and authorization.
>
>
>> If
>> so then we need to pick some subset of identity information that we
>> can extract from both and have this constitute the identity we pass
>> into the authorization interface. The original proposal had just the
>> username/subject. But maybe we should add the ip address as well as
>> that is useful. What I would prefer not to do is add everything in the
>> certificate. I think the assumption is that you are generating these
>> certificates for Kafka so you can put whatever identity info you want
>> in the Subject Alternative Name. If that is true then just using that
>> should be okay, right?
>>
>
>I think we should just push the byte[] and let the plugin deal with it.
>So, if we have a certificate object then pass that along with whatever
>other metadata (e.g. IP address of client) we can.  I don't think we
>should do any parsing whatsoever; let the plugin deal with that.  Any
>parsing we do on the identity information for the "security object" forces
>us into specific implementations and I don't see any reason to do that...
>If plug-ins want an "easier" time to deal with certs and parsing and blah
>blah blah then we can implement some way they can do this without much
>fuss. We also need to make sure that the crypto library is pluggable too
>(so we can expose an API for them to call) so that an HSM can be easily
>dropped in without Kafka caring... so in the plugin we could provide an
>identity.getAlternativeAttribute() and then that use case is solved (and
>we can use Bouncy Castle or whatever to parse it for them to make it
>easier) and always give them raw bytes so they could do it themselves.
>
>
>>
>> -Jay
>>
>>
>>
>>
>>
>> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein  wrote:
>> > 1) We need to support the most flexibility we can and make this
>> > transparent to kafka (to use Gwen's term).  Any specific implementation
>> > is going to make it not work with some solution stopping people from
>> > using Kafka.  That is a reality because everyone just does it slightly
>> > differently enough.  If we have an "identity" byte structure (let's not
>> > use string because some security objects are bytes) this should just
>> > fall through to the implementor.  For certs this is the entire x509
>> > object (not just the certificate part as it could contain an ASN.1
>> > timestamp) and inside you parse and do what you want with it.
>> >
>> > 2) While I think there are many benefits to just the handshake approach
>> > I don't think it outweighs the cons Jay expressed. a) We can't lead the
>> > client libraries down a new path of interacting with Kafka.  By
>> > incrementally adding to the wire protocol we are directing a very clear
>> > and expected approach.

Re: Two open issues on Kafka security

2014-09-30 Thread Joe Stein
inline

On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps  wrote:

> Hey Joe,
>
> For (1) what are you thinking for the PermissionManager api?
>
> The way I see it, the first question we have to answer is whether it
> is possible to make authentication and authorization independent. What
> I mean by that is whether I can write an authorization library that
> will work the same whether you authenticate with ssl or kerberos.


To me that is a requirement. We can't tie them together.  We have to
provide the ability for authorization to work regardless of the
authentication.  One *VERY* important use case is level of trust in
authentication from the authorization perspective, e.g. I authorize an
"identity" based on how it authenticated: Alice is able to view
topic X if Alice authenticated over Kerberos.  Bob isn't allowed to view
topic X no matter what. Alice can authenticate over something other than
Kerberos (there are use cases for that) and in that case Alice wouldn't see
topic X.  A concrete use case for this with Kafka would be a third party
bank consuming data from a broker.  The service provider would have some
local Kerberos auth for that bank to do backups, which would also have
access to other topics related to that bank's data; the bank itself, over
SSL, wants a stream of events (some specific topic), and that bank's
identity only sees that topic.  It is important to not confuse identity,
authentication and authorization.


> If
> so then we need to pick some subset of identity information that we
> can extract from both and have this constitute the identity we pass
> into the authorization interface. The original proposal had just the
> username/subject. But maybe we should add the ip address as well as
> that is useful. What I would prefer not to do is add everything in the
> certificate. I think the assumption is that you are generating these
> certificates for Kafka so you can put whatever identity info you want
> in the Subject Alternative Name. If that is true then just using that
> should be okay, right?
>

I think we should just push the byte[] and let the plugin deal with it.
So, if we have a certificate object then pass that along with whatever
other metadata (e.g. the IP address of the client) we can.  I don't think
we should do any parsing whatsoever; let the plugin deal with that.  Any
parsing we do on the identity information for the "security object" forces
us into specific implementations and I don't see any reason to do that...
If plug-ins want an "easier" time dealing with certs and parsing and blah
blah blah then we can implement some way they can do this without much
fuss.  We also need to make sure that the crypto library is pluggable too
(so we can expose an API for them to call) so that an HSM can be easily
dropped in without Kafka caring... so in the plugin we could provide an
identity.getAlternativeAttribute() and then that use case is solved (and
we can use bouncy castle or whatever to parse it for them to make it
easier), and always give them raw bytes so they could do it themselves.
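
A sketch of what "push the byte[] and let the plugin deal with it" could
look like, in Java.  The names are hypothetical, and the X.509 helper is
just the optional convenience described above, not something Kafka itself
would depend on:

    import java.io.ByteArrayInputStream;
    import java.security.cert.CertificateException;
    import java.security.cert.CertificateFactory;
    import java.security.cert.X509Certificate;

    final class Identity {
        private final byte[] raw;       // opaque security object, e.g. a DER-encoded cert
        private final String clientIp;  // metadata the broker already has

        Identity(byte[] raw, String clientIp) {
            this.raw = raw;
            this.clientIp = clientIp;
        }

        byte[] rawBytes() { return raw; }   // plugins can always get the raw bytes
        String clientIp() { return clientIp; }

        // Optional convenience for cert-based identities: parse on demand, so
        // Kafka itself never commits to one interpretation of the bytes.
        X509Certificate asX509() throws CertificateException {
            CertificateFactory cf = CertificateFactory.getInstance("X.509");
            return (X509Certificate) cf.generateCertificate(new ByteArrayInputStream(raw));
        }
    }

    interface Authorizer {
        // The plugin decides what the bytes mean; Kafka just hands them over.
        boolean authorize(Identity identity, String resource, String action);
    }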


>
> -Jay
>
>
>
>
>
> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein  wrote:
> > 1) We need to support the most flexibility we can and make this
> > transparent to kafka (to use Gwen's term).  Any specific implementation
> > is going to make it not work with some solution, stopping people from
> > using Kafka.  That is a reality because everyone just does it slightly
> > differently enough.  If we have an "identity" byte structure (let's not
> > use a string because some security objects are bytes) this should just
> > fall through to the implementor.  For certs this is the entire x509
> > object (not just the certificate part as it could contain an ASN.1
> > timestamp) and inside you parse and do what you want with it.
> >
> > 2) While I think there are many benefits to just the handshake approach
> > I don't think it outweighs the cons Jay expressed. a) We can't lead the
> > client libraries down a new path of interacting with Kafka.  By
> > incrementally adding to the wire protocol we are directing a very clear
> > and expected approach.  We already have issues with implementation even
> > with the wire protocol in place and are trying to improve that aspect of
> > the community as a whole.  Let's not take a step backwards with this
> > there... also we need to not add more/different hoops to
> > debugging/administering/monitoring kafka, so taking advantage (as Jay
> > says) of built-in logging (etc) is important... also for the client
> > library developers too :)
> >
> > On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira 
> wrote:
> >
> >> Re #1:
> >>
> >> Since the auth_to_local is a kerberos config, it's up to the admin to
> >> decide how he likes the user names and set it up properly (or leave
> >> empty) and make sure the ACLs match. Simplified names may be needed if
> >> the authorization system integrates with LDAP to get groups or
> >> something fancy like that.
> >>
> >> Note that it's completely transparent to Kafka - if

Re: Two open issues on Kafka security

2014-09-30 Thread Jay Kreps
Hey Gwen,

That makes sense.

I think this is one area where having pluggable authorization makes
the story a bit more complex since all the management of default
permissions or even how to ensure a user does or doesn't have a
permission is going to be specific to the authorization model a
particular authorization plugin supports.

I think this is a bit of a gap in the proposal we currently have. We
can add some server level configuration that overrides whatever the
authorization plugin does (e.g. secure=true means banning the "nobody"
user). But this is not ideal either since you would expect to be able
to grant or revoke permissions for the "nobody" user just like you would
anyone else on a per-topic basis.
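
A sketch of what such a server-level override might look like, wrapping
whatever authorizer plugin is configured.  The names here (including the
"secure" flag) are illustrative only:

    interface Authorizer {
        boolean authorize(String principal, String topic, String action);
    }

    final class SecureModeAuthorizer implements Authorizer {
        private final Authorizer delegate;  // the admin-configured plugin
        private final boolean secure;       // e.g. from a secure=true broker config

        SecureModeAuthorizer(Authorizer delegate, boolean secure) {
            this.delegate = delegate;
            this.secure = secure;
        }

        @Override
        public boolean authorize(String principal, String topic, String action) {
            // In secure mode the unauthenticated "nobody" user is banned
            // outright, regardless of what the plugin would have decided --
            // which is exactly the inflexibility described above.
            if (secure && "nobody".equals(principal)) {
                return false;
            }
            return delegate.authorize(principal, topic, action);
        }
    }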

-Jay

On Tue, Sep 30, 2014 at 4:25 PM, Gwen Shapira  wrote:
> Re #2:
>
> I don't object to the "late authentication" approach, but we need to
> make it easy for secured clusters to pass audits (SOX, PCI and
> friends).
> So, we need to be able to configure a cluster as "secured" and, with
> this config, switch the "nobody" user to zero privileges.
> I liked the multi-port approach because blocking a non-secured port is
> very definite and easy to audit, but a single "security=on" switch
> will work as well.
>
>
>
> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein  wrote:
>> 1) We need to support the most flexibility we can and make this transparent
>> to kafka (to use Gwen's term).  Any specific implementation is going to
>> make it not work with some solution stopping people from using Kafka.  That
>> is a reality because everyone just does it slightly differently enough. If
>> we have an "identity" byte structure (let's not use a string because some
>> security objects are bytes) this should just fall through to the
>> implementor.  For certs this is the entire x509 object (not just the
>> certificate part as it could contain an ASN.1 timestamp) and inside you
>> parse and do what you want with it.
>>
>> 2) While I think there are many benefits to just the handshake approach I
>> don't think it outweighs the cons Jay expressed. a) We can't lead the
>> client libraries down a new path of interacting with Kafka.  By
>> incrementally adding to the wire protocol we are directing a very clear and
>> expected approach.  We already have issues with implementation even with
>> the wire protocol in place and are trying to improve that aspect of the
>> community as a whole.  Let's not take a step backwards with this there...
>> also we need to not add more/different hoops to
>> debugging/administering/monitoring kafka, so taking advantage (as Jay says)
>> of built-in logging (etc) is important... also for the client library
>> developers too :)
>>
>> On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira  wrote:
>>
>>> Re #1:
>>>
>>> Since the auth_to_local is a kerberos config, it's up to the admin to
>>> decide how he likes the user names and set it up properly (or leave
>>> empty) and make sure the ACLs match. Simplified names may be needed if
>>> the authorization system integrates with LDAP to get groups or
>>> something fancy like that.
>>>
>>> Note that it's completely transparent to Kafka - if the admin sets up
>>> auth_to_local rules, we simply see a different principal name. No need
>>> to do anything different.
>>>
>>> Gwen
>>>
>>> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps  wrote:
>>> > Current proposal is here:
>>> >
>>> > https://cwiki.apache.org/confluence/display/KAFKA/Security
>>> >
>>> > Here are the two open questions I am aware of:
>>> >
>>> > 1. We want to separate authentication and authorization. This means
>>> > permissions will be assigned to some user-like subject/entity/person
>>> > string that is independent of the authorization mechanism. It sounds
>>> > like we agreed this could be done and we had in mind some krb-specific
>>> > mangling that Gwen knew about and I think the plan was to use whatever
>>> > the user chose to put in the Subject Alternative Name of the cert for
>>> > ssl. So in both cases these would translate to a string denoting the
>>> > entity whom we are granting permissions to in the authorization layer.
>>> > We should document these in the wiki to get feedback on them.
>>> >
>>> > The Hadoop approach to extraction was something like this:
>>> >
>>> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>>> >
>>> > But actually I'm not sure if just using the full kerberos principal is
>>> > so bad? I.e. having the user be jenni...@athena.mit.edu versus just
>>> > jennifer. Where this would make a difference would be in a case where
>>> > you wanted the same user/entity to be able to authenticate via
>>> > different mechanisms (Hadoop auth, kerberos, ssl) and have a single
>>> > set of permissions.
>>> >
>>> > 2. For SASL/Kerberos we need to figure out how the communication
>>> > between client and server will be handled to pass the
>>> > challenge/response byte[]. I.e.
>>> >
>>> >
>>> http://docs.oracle.com/javase/7/docs/api/javax/se

Re: Two open issues on Kafka security

2014-09-30 Thread Jay Kreps
Hey Joe,

For (1) what are you thinking for the PermissionManager api?

The way I see it, the first question we have to answer is whether it
is possible to make authentication and authorization independent. What
I mean by that is whether I can write an authorization library that
will work the same whether you authenticate with ssl or kerberos. If
so then we need to pick some subset of identity information that we
can extract from both and have this constitute the identity we pass
into the authorization interface. The original proposal had just the
username/subject. But maybe we should add the ip address as well, as
that is useful. What I would prefer not to do is add everything in the
certificate. I think the assumption is that you are generating these
certificates for Kafka so you can put whatever identity info you want
in the Subject Alternative Name. If that is true then just using that
should be okay, right?

-Jay





On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein  wrote:
> 1) We need to support the most flexibility we can and make this transparent
> to kafka (to use Gwen's term).  Any specific implementation is going to
> make it not work with some solution stopping people from using Kafka.  That
> is a reality because everyone just does it slightly differently enough. If
> we have an "identity" byte structure (let's not use a string because some
> security objects are bytes) this should just fall through to the
> implementor.  For certs this is the entire x509 object (not just the
> certificate part as it could contain an ASN.1 timestamp) and inside you
> parse and do what you want with it.
>
> 2) While I think there are many benefits to just the handshake approach I
> don't think it outweighs the cons Jay expressed. a) We can't lead the
> client libraries down a new path of interacting with Kafka.  By
> incrementally adding to the wire protocol we are directing a very clear and
> expected approach.  We already have issues with implementation even with
> the wire protocol in place and are trying to improve that aspect of the
> community as a whole.  Let's not take a step backwards with this there...
> also we need to not add more/different hoops to
> debugging/administering/monitoring kafka, so taking advantage (as Jay says)
> of built-in logging (etc) is important... also for the client library
> developers too :)
>
> On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira  wrote:
>
>> Re #1:
>>
>> Since the auth_to_local is a kerberos config, it's up to the admin to
>> decide how he likes the user names and set it up properly (or leave
>> empty) and make sure the ACLs match. Simplified names may be needed if
>> the authorization system integrates with LDAP to get groups or
>> something fancy like that.
>>
>> Note that it's completely transparent to Kafka - if the admin sets up
>> auth_to_local rules, we simply see a different principal name. No need
>> to do anything different.
>>
>> Gwen
>>
>> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps  wrote:
>> > Current proposal is here:
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/Security
>> >
>> > Here are the two open questions I am aware of:
>> >
>> > 1. We want to separate authentication and authorization. This means
>> > permissions will be assigned to some user-like subject/entity/person
>> > string that is independent of the authorization mechanism. It sounds
>> > like we agreed this could be done and we had in mind some krb-specific
>> > mangling that Gwen knew about and I think the plan was to use whatever
>> > the user chose to put in the Subject Alternative Name of the cert for
>> > ssl. So in both cases these would translate to a string denoting the
>> > entity whom we are granting permissions to in the authorization layer.
>> > We should document these in the wiki to get feedback on them.
>> >
>> > The Hadoop approach to extraction was something like this:
>> >
>> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>> >
>> > But actually I'm not sure if just using the full kerberos principal is
>> > so bad? I.e. having the user be jenni...@athena.mit.edu versus just
>> > jennifer. Where this would make a difference would be in a case where
>> > you wanted the same user/entity to be able to authenticate via
>> > different mechanisms (Hadoop auth, kerberos, ssl) and have a single
>> > set of permissions.
>> >
>> > 2. For SASL/Kerberos we need to figure out how the communication
>> > between client and server will be handled to pass the
>> > challenge/response byte[]. I.e.
>> >
>> >
>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>> >
>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>> >
>> > I am not super expert in this area but I will try to give my
>> > understanding and I'm sure someone can correct me if I am confused.
>> >
>> > Unlike SSL the transmission of this is actuall

Re: Two open issues on Kafka security

2014-09-30 Thread Joe Stein
<< we need to make it easy for secured clusters to pass audits (SOX, PCI
and friends)

I think this is the MVP for the security features for 0.9 and a guideline
for how we should be proceeding.

On Tue, Sep 30, 2014 at 7:25 PM, Gwen Shapira  wrote:

> Re #2:
>
> I don't object to the "late authentication" approach, but we need to
> make it easy for secured clusters to pass audits (SOX, PCI and
> friends).
> So, we need to be able to configure a cluster as "secured" and, with
> this config, switch the "nobody" user to zero privileges.
> I liked the multi-port approach because blocking a non-secured port is
> very definite and easy to audit, but a single "security=on" switch
> will work as well.
>
>
>
> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein  wrote:
> > 1) We need to support the most flexibility we can and make this
> > transparent to kafka (to use Gwen's term).  Any specific implementation
> > is going to make it not work with some solution, stopping people from
> > using Kafka.  That is a reality because everyone just does it slightly
> > differently enough.  If we have an "identity" byte structure (let's not
> > use a string because some security objects are bytes) this should just
> > fall through to the implementor.  For certs this is the entire x509
> > object (not just the certificate part as it could contain an ASN.1
> > timestamp) and inside you parse and do what you want with it.
> >
> > 2) While I think there are many benefits to just the handshake approach
> > I don't think it outweighs the cons Jay expressed. a) We can't lead the
> > client libraries down a new path of interacting with Kafka.  By
> > incrementally adding to the wire protocol we are directing a very clear
> > and expected approach.  We already have issues with implementation even
> > with the wire protocol in place and are trying to improve that aspect of
> > the community as a whole.  Let's not take a step backwards with this
> > there... also we need to not add more/different hoops to
> > debugging/administering/monitoring kafka, so taking advantage (as Jay
> > says) of built-in logging (etc) is important... also for the client
> > library developers too :)
> >
> > On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira 
> wrote:
> >
> >> Re #1:
> >>
> >> Since the auth_to_local is a kerberos config, it's up to the admin to
> >> decide how he likes the user names and set it up properly (or leave
> >> empty) and make sure the ACLs match. Simplified names may be needed if
> >> the authorization system integrates with LDAP to get groups or
> >> something fancy like that.
> >>
> >> Note that it's completely transparent to Kafka - if the admin sets up
> >> auth_to_local rules, we simply see a different principal name. No need
> >> to do anything different.
> >>
> >> Gwen
> >>
> >> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps  wrote:
> >> > Current proposal is here:
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/Security
> >> >
> >> > Here are the two open questions I am aware of:
> >> >
> >> > 1. We want to separate authentication and authorization. This means
> >> > permissions will be assigned to some user-like subject/entity/person
> >> > string that is independent of the authorization mechanism. It sounds
> >> > like we agreed this could be done and we had in mind some krb-specific
> >> > mangling that Gwen knew about and I think the plan was to use whatever
> >> > the user chose to put in the Subject Alternative Name of the cert for
> >> > ssl. So in both cases these would translate to a string denoting the
> >> > entity whom we are granting permissions to in the authorization layer.
> >> > We should document these in the wiki to get feedback on them.
> >> >
> >> > The Hadoop approach to extraction was something like this:
> >> >
> >>
> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
> >> >
> >> > But actually I'm not sure if just using the full kerberos principal is
> >> > so bad? I.e. having the user be jenni...@athena.mit.edu versus just
> >> > jennifer. Where this would make a difference would be in a case where
> >> > you wanted the same user/entity to be able to authenticate via
> >> > different mechanisms (Hadoop auth, kerberos, ssl) and have a single
> >> > set of permissions.
> >> >
> >> > 2. For SASL/Kerberos we need to figure out how the communication
> >> > between client and server will be handled to pass the
> >> > challenge/response byte[]. I.e.
> >> >
> >> >
> >>
> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
> >> >
> >>
> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
> >> >
> >> > I am not super expert in this area but I will try to give my
> >> > understanding and I'm sure someone can correct me if I am confused.
> >> >
> >> > Unlike SSL the transmission of this is actually outside the scope of
> >> > SASL so we have to specif

Re: Two open issues on Kafka security

2014-09-30 Thread Gwen Shapira
Re #2:

I don't object to the "late authentication" approach, but we need to
make it easy for secured clusters to pass audits (SOX, PCI and
friends).
So, we need to be able to configure a cluster as "secured" and, with
this config, switch the "nobody" user to zero privileges.
I liked the multi-port approach because blocking a non-secured port is
very definite and easy to audit, but a single "security=on" switch
will work as well.



On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein  wrote:
> 1) We need to support the most flexibility we can and make this transparent
> to kafka (to use Gwen's term).  Any specific implementation is going to
> make it not work with some solution stopping people from using Kafka.  That
> is a reality because everyone just does it slightly differently enough. If
> we have an "identity" byte structure (let's not use a string because some
> security objects are bytes) this should just fall through to the
> implementor.  For certs this is the entire x509 object (not just the
> certificate part as it could contain an ASN.1 timestamp) and inside you
> parse and do what you want with it.
>
> 2) While I think there are many benefits to just the handshake approach I
> don't think it outweighs the cons Jay expressed. a) We can't lead the
> client libraries down a new path of interacting with Kafka.  By
> incrementally adding to the wire protocol we are directing a very clear and
> expected approach.  We already have issues with implementation even with
> the wire protocol in place and are trying to improve that aspect of the
> community as a whole.  Let's not take a step backwards with this there...
> also we need to not add more/different hoops to
> debugging/administering/monitoring kafka, so taking advantage (as Jay says)
> of built-in logging (etc) is important... also for the client library
> developers too :)
>
> On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira  wrote:
>
>> Re #1:
>>
>> Since the auth_to_local is a kerberos config, it's up to the admin to
>> decide how he likes the user names and set it up properly (or leave
>> empty) and make sure the ACLs match. Simplified names may be needed if
>> the authorization system integrates with LDAP to get groups or
>> something fancy like that.
>>
>> Note that it's completely transparent to Kafka - if the admin sets up
>> auth_to_local rules, we simply see a different principal name. No need
>> to do anything different.
>>
>> Gwen
>>
>> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps  wrote:
>> > Current proposal is here:
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/Security
>> >
>> > Here are the two open questions I am aware of:
>> >
>> > 1. We want to separate authentication and authorization. This means
>> > permissions will be assigned to some user-like subject/entity/person
>> > string that is independent of the authorization mechanism. It sounds
>> > like we agreed this could be done and we had in mind some krb-specific
>> > mangling that Gwen knew about and I think the plan was to use whatever
>> > the user chose to put in the Subject Alternative Name of the cert for
>> > ssl. So in both cases these would translate to a string denoting the
>> > entity whom we are granting permissions to in the authorization layer.
>> > We should document these in the wiki to get feedback on them.
>> >
>> > The Hadoop approach to extraction was something like this:
>> >
>> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>> >
>> > But actually I'm not sure if just using the full kerberos principal is
>> > so bad? I.e. having the user be jenni...@athena.mit.edu versus just
>> > jennifer. Where this would make a difference would be in a case where
>> > you wanted the same user/entity to be able to authenticate via
>> > different mechanisms (Hadoop auth, kerberos, ssl) and have a single
>> > set of permissions.
>> >
>> > 2. For SASL/Kerberos we need to figure out how the communication
>> > between client and server will be handled to pass the
>> > challenge/response byte[]. I.e.
>> >
>> >
>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>> >
>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>> >
>> > I am not super expert in this area but I will try to give my
>> > understanding and I'm sure someone can correct me if I am confused.
>> >
>> > Unlike SSL the transmission of this is actually outside the scope of
>> > SASL so we have to specify this. Two proposals:
>> >
>> > Original Proposal: Add a new "authenticate" request/response
>> >
>> > The proposal in the original wiki was to add a new "authenticate"
>> > request/response to pass this information. This matches what was done
>> > in the kerberos implementation for zookeeper. The intention is that
>> > the client would send this request immediately after establishing a
>> > connection, in which case it acts much like a "handshake"

Re: Two open issues on Kafka security

2014-09-30 Thread Joe Stein
1) We need to support the most flexibility we can and make this transparent
to kafka (to use Gwen's term).  Any specific implementation is going to
make it not work with some solution stopping people from using Kafka.  That
is a reality because everyone just does it slightly differently enough. If
we have an "identity" byte structure (let's not use a string because some
security objects are bytes) this should just fall through to the
implementor.  For certs this is the entire x509 object (not just the
certificate part as it could contain an ASN.1 timestamp) and inside you
parse and do what you want with it.

2) While I think there are many benefits to just the handshake approach I
don't think it outweighs the cons Jay expressed. a) We can't lead the
client libraries down a new path of interacting with Kafka.  By
incrementally adding to the wire protocol we are directing a very clear and
expected approach.  We already have issues with implementation even with
the wire protocol in place and are trying to improve that aspect of the
community as a whole.  Let's not take a step backwards with this there...
also we need to not add more/different hoops to
debugging/administering/monitoring kafka, so taking advantage (as Jay says)
of built-in logging (etc) is important... also for the client library
developers too :)

On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira  wrote:

> Re #1:
>
> Since the auth_to_local is a kerberos config, it's up to the admin to
> decide how he likes the user names and set it up properly (or leave
> empty) and make sure the ACLs match. Simplified names may be needed if
> the authorization system integrates with LDAP to get groups or
> something fancy like that.
>
> Note that it's completely transparent to Kafka - if the admin sets up
> auth_to_local rules, we simply see a different principal name. No need
> to do anything different.
>
> Gwen
>
> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps  wrote:
> > Current proposal is here:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Security
> >
> > Here are the two open questions I am aware of:
> >
> > 1. We want to separate authentication and authorization. This means
> > permissions will be assigned to some user-like subject/entity/person
> > string that is independent of the authorization mechanism. It sounds
> > like we agreed this could be done and we had in mind some krb-specific
> > mangling that Gwen knew about and I think the plan was to use whatever
> > the user chose to put in the Subject Alternative Name of the cert for
> > ssl. So in both cases these would translate to a string denoting the
> > entity whom we are granting permissions to in the authorization layer.
> > We should document these in the wiki to get feedback on them.
> >
> > The Hadoop approach to extraction was something like this:
> >
> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
> >
> > But actually I'm not sure if just using the full kerberos principal is
> > so bad? I.e. having the user be jenni...@athena.mit.edu versus just
> > jennifer. Where this would make a difference would be in a case where
> > you wanted the same user/entity to be able to authenticate via
> > different mechanisms (Hadoop auth, kerberos, ssl) and have a single
> > set of permissions.
> >
> > 2. For SASL/Kerberos we need to figure out how the communication
> > between client and server will be handled to pass the
> > challenge/response byte[]. I.e.
> >
> >
> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
> >
> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
> >
> > I am not super expert in this area but I will try to give my
> > understanding and I'm sure someone can correct me if I am confused.
> >
> > Unlike SSL the transmission of this is actually outside the scope of
> > SASL so we have to specify this. Two proposals:
> >
> > Original Proposal: Add a new "authenticate" request/response
> >
> > The proposal in the original wiki was to add a new "authenticate"
> > request/response to pass this information. This matches what was done
> > in the kerberos implementation for zookeeper. The intention is that
> > the client would send this request immediately after establishing a
> > connection, in which case it acts much like a "handshake", however
> > there is no requirement that they do so.
> >
> > Whether the authentication happens via SSL or via Kerberos, the effect
> > will just be to set the username in their session. This will default
> > to the "anybody" user. So in the default non-secure case we will just
> > be defaulting "anybody" to have full permission. So to answer the
> > question about whether changing user is required or not, I don't think
> > it is but I think we kind of get it for free in this approach.
> >
> > In this approach there is no particular need or advantage to having a
> > separate port fo

Re: Two open issues on Kafka security

2014-09-30 Thread Gwen Shapira
Re #1:

Since the auth_to_local is a kerberos config, it's up to the admin to
decide how he likes the user names and set it up properly (or leave
empty) and make sure the ACLs match. Simplified names may be needed if
the authorization system integrates with LDAP to get groups or
something fancy like that.

Note that it's completely transparent to Kafka - if the admin sets up
auth_to_local rules, we simply see a different principal name. No need
to do anything different.
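
Hadoop-style auth_to_local rules (as in the doc Jay links below) look
roughly like this -- an illustrative core-site.xml fragment, not
copy-paste config:

    <!-- Map jennifer@ATHENA.MIT.EDU to the short name "jennifer";
         the DEFAULT rule handles principals in the default realm. -->
    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[1:$1@$0](.*@ATHENA\.MIT\.EDU)s/@.*//
        DEFAULT
      </value>
    </property>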

Gwen

On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps  wrote:
> Current proposal is here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Security
>
> Here are the two open questions I am aware of:
>
> 1. We want to separate authentication and authorization. This means
> permissions will be assigned to some user-like subject/entity/person
> string that is independent of the authorization mechanism. It sounds
> like we agreed this could be done and we had in mind some krb-specific
> mangling that Gwen knew about and I think the plan was to use whatever
> the user chose to put in the Subject Alternative Name of the cert for
> ssl. So in both cases these would translate to a string denoting the
> entity whom we are granting permissions to in the authorization layer.
> We should document these in the wiki to get feedback on them.
>
> The Hadoop approach to extraction was something like this:
> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>
> But actually I'm not sure if just using the full kerberos principal is
> so bad? I.e. having the user be jenni...@athena.mit.edu versus just
> jennifer. Where this would make a difference would be in a case where
> you wanted the same user/entity to be able to authenticate via
> different mechanisms (Hadoop auth, kerberos, ssl) and have a single
> set of permissions.
>
> 2. For SASL/Kerberos we need to figure out how the communication
> between client and server will be handled to pass the
> challenge/response byte[]. I.e.
>
> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>
> I am not super expert in this area but I will try to give my
> understanding and I'm sure someone can correct me if I am confused.
>
> Unlike SSL the transmission of this is actually outside the scope of
> SASL so we have to specify this. Two proposals:
>
> Original Proposal: Add a new "authenticate" request/response
>
> The proposal in the original wiki was to add a new "authenticate"
> request/response to pass this information. This matches what was done
> in the kerberos implementation for zookeeper. The intention is that
> the client would send this request immediately after establishing a
> connection, in which case it acts much like a "handshake", however
> there is no requirement that they do so.
>
> Whether the authentication happens via SSL or via Kerberos, the effect
> will just be to set the username in their session. This will default
> to the "anybody" user. So in the default non-secure case we will just
> be defaulting "anybody" to have full permission. So to answer the
> question about whether changing user is required or not, I don't think
> it is but I think we kind of get it for free in this approach.
>
> In this approach there is no particular need or advantage to having a
> separate port for kerberos I don't think.
>
> Alternate Proposal: Create a Handshake
>
> The alternative I think Michael was proposing was to create a
> handshake that would happen at connection time on connections coming
> in on the SASL port. This would require a separate port for SASL since
> otherwise you wouldn't be able to tell if the bytes you were getting
> were for SASL or were the first request of an unauthenticated
> connection.
>
> Michael it would be good to work out the details of how this works.
> Are we just sending size-delimited byte arrays back and forth until
> the challenge response terminates?
>
> My Take
>
> The pro I see for Michael's proposal is that it keeps the
> authentication logic more localized in the socket server.
>
> I see two cons:
> 1. Since the handshake won't go through the normal api layer it won't
> go through the normal logging (e.g. request log), jmx monitoring,
> client trace token, correlation id, etc that we get for other
> requests. This could make operations a little confusing and make
> debugging a little harder since the client will be blocking on network
> requests without the normal logging.
> 2. This part of the protocol will be inconsistent with the rest of the
> Kafka protocol so it will be a little odd for client implementors as
> this will effectively be a request/response that they will have to
> implement that will be different from all the other request/responses
> they implement.
>
> In practice these two alternatives are not very different except that
> in the o

Two open issues on Kafka security

2014-09-30 Thread Jay Kreps
Current proposal is here:

https://cwiki.apache.org/confluence/display/KAFKA/Security

Here are the two open questions I am aware of:

1. We want to separate authentication and authorization. This means
permissions will be assigned to some user-like subject/entity/person
string that is independent of the authorization mechanism. It sounds
like we agreed this could be done and we had in mind some krb-specific
mangling that Gwen knew about and I think the plan was to use whatever
the user chose to put in the Subject Alternative Name of the cert for
ssl. So in both cases these would translate to a string denoting the
entity whom we are granting permissions to in the authorization layer.
We should document these in the wiki to get feedback on them.

The Hadoop approach to extraction was something like this:
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html

But actually I'm not sure if just using the full kerberos principal is
so bad? I.e. having the user be jenni...@athena.mit.edu versus just
jennifer. Where this would make a difference would be in a case where
you wanted the same user/entity to be able to authenticate via
different mechanisms (Hadoop auth, kerberos, ssl) and have a single
set of permissions.

2. For SASL/Kerberos we need to figure out how the communication
between client and server will be handled to pass the
challenge/response byte[]. I.e.

http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])

I am not super expert in this area but I will try to give my
understanding and I'm sure someone can correct me if I am confused.

Unlike SSL the transmission of this is actually outside the scope of
SASL so we have to specify this. Two proposals:

Original Proposal: Add a new "authenticate" request/response

The proposal in the original wiki was to add a new "authenticate"
request/response to pass this information. This matches what was done
in the kerberos implementation for zookeeper. The intention is that
the client would send this request immediately after establishing a
connection, in which case it acts much like a "handshake", however
there is no requirement that they do so.

Whether the authentication happens via SSL or via Kerberos, the effect
will just be to set the username in their session. This will default
to the "anybody" user. So in the default non-secure case we will just
be defaulting "anybody" to have full permission. So to answer the
question about whether changing user is required or not, I don't think
it is but I think we kind of get it for free in this approach.
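
In code the session side of this is tiny -- something like the following,
with illustrative names only:

    // A per-connection session whose principal defaults to "anybody"
    // until authentication (SSL or the authenticate request) upgrades it.
    final class Session {
        private volatile String principal = "anybody";

        void authenticated(String principal) { this.principal = principal; }
        String principal() { return principal; }
    }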

In this approach there is no particular need or advantage to having a
separate port for kerberos I don't think.

Alternate Proposal: Create a Handshake

The alternative I think Michael was proposing was to create a
handshake that would happen at connection time on connections coming
in on the SASL port. This would require a separate port for SASL since
otherwise you wouldn't be able to tell if the bytes you were getting
were for SASL or were the first request of an unauthenticated
connection.

Michael it would be good to work out the details of how this works.
Are we just sending size-delimited byte arrays back and forth until
the challenge response terminates?
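
For what it's worth, with the JDK SASL classes above a size-delimited
client-side loop could look like the sketch below.  The 4-byte length
framing is an assumption, not a settled design:

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import javax.security.sasl.SaslClient;

    final class SaslHandshakeSketch {
        static void clientLoop(SaslClient sasl, DataInputStream in,
                               DataOutputStream out) throws IOException {
            // Some mechanisms (e.g. GSSAPI) produce an initial token.
            byte[] token = sasl.hasInitialResponse()
                    ? sasl.evaluateChallenge(new byte[0]) : null;
            while (!sasl.isComplete()) {
                if (token != null) {        // send our token, size-delimited
                    out.writeInt(token.length);
                    out.write(token);
                    out.flush();
                }
                int len = in.readInt();     // read the server's challenge
                byte[] challenge = new byte[len];
                in.readFully(challenge);
                token = sasl.evaluateChallenge(challenge);
            }
            if (token != null) {            // final token, if the mechanism produced one
                out.writeInt(token.length);
                out.write(token);
                out.flush();
            }
        }
    }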

My Take

The pro I see for Michael's proposal is that it keeps the
authentication logic more localized in the socket server.

I see two cons:
1. Since the handshake won't go through the normal api layer it won't
go through the normal logging (e.g. request log), jmx monitoring,
client trace token, correlation id, etc that we get for other
requests. This could make operations a little confusing and make
debugging a little harder since the client will be blocking on network
requests without the normal logging.
2. This part of the protocol will be inconsistent with the rest of the
Kafka protocol so it will be a little odd for client implementors as
this will effectively be a request/response that they will have to
implement that will be different from all the other request/responses
they implement.

In practice these two alternatives are not very different except that
in the original proposal the bytes you send are prefixed by the normal
request header fields such as the client id, correlation id, etc.
Overall I would prefer this as I think it is a bit more consistent
from the client's point of view.
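
Concretely, the "authenticate" request in the original proposal would
presumably look like any other Kafka request, with the SASL bytes as the
body.  Field names below are illustrative, not a spec:

    // Rides the normal request path, so it shows up in the request log,
    // jmx metrics, etc., with the usual header fields.
    final class AuthenticateRequest {
        short apiKey;        // a new API key for "authenticate"
        short apiVersion;
        int correlationId;   // normal header field
        String clientId;     // normal header field
        byte[] saslToken;    // opaque challenge/response bytes
    }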

Cheers,

-Jay