I agree, username+IP would be sufficient. I assume that when authentication is 
turned off or doesn’t exist but the authorization plugin is enabled, the username 
would be empty or passed as “nobody”, along with a valid IP (if available).

> The name “context" is probably not the right one. The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.
+1. Makes the design extensible.
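
Just to make sure we are picturing the same shape, here is a rough, purely 
illustrative sketch (all names hypothetical, assuming only username + IP in the 
context for now):

    import java.net.InetAddress;

    // Illustrative sketch only - not an agreed API.
    public interface Authorizer {

        enum Action { READ, WRITE, CREATE }            // what the user is trying to do

        final class Entity {                           // what the action applies to, e.g. a topic
            public final String type;                  // "topic" (later possibly a parent "service")
            public final String name;
            public Entity(String type, String name) { this.type = type; this.name = name; }
        }

        final class Context {                          // user/session info, extensible over time
            public final String username;              // e.g. "nobody" when authentication is disabled
            public final InetAddress clientAddress;    // may be null if not available
            public Context(String username, InetAddress clientAddress) {
                this.username = username;
                this.clientAddress = clientAddress;
            }
        }

        boolean authorize(Context context, Entity entity, Action action);
    }

The nice part is that new fields (certificate, etc.) can be added to Context 
later without ever touching the authorize() signature, so existing plugins keep 
working.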

Thanks

Bosco


> 
> 
> ----- Original message -----
> From: Jarek Jarcec Cecho <jar...@apache.org>
> To: dev@kafka.apache.org
> Subject: Re: Two open issues on Kafka security
> Date: Thu, 2 Oct 2014 08:33:45 -0700
> 
> Thanks for getting back Jay!
> 
> For the interface - Looking at Sentry and other authorization libraries
> in the Hadoop ecosystem it seems that “username” is primarily used to
> perform authorization these days, and then IP for auditing. Hence I feel
> that username+IP would be sufficient, at least for now. However I would
> assume that in the future we might need more than just those two, so
> what about defining the API in a way that we can easily extend in the
> future, something like this?
> 
> authorize(Context, Entity, Action), where
> 
> * Action - the action that the user is trying to perform (write to topic, read
> from topic, create topic, …)
> * Entity - the entity that the user is trying to perform that action on
> (topic, …)
> * Context - a container with user/session information - user name, IP
> address, or perhaps the entire certificate as was suggested earlier in the
> email thread.
> 
> The name “context" is probably not the right one. The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.
> 
> The hierarchy is an interesting topic - I’m not familiar enough with Kafka
> internals so I can’t really talk about how much more complex it would
> be. I can speak about Sentry and the way we designed the security model for
> Hive and Search, where introducing the hierarchy wasn’t complex at all
> and actually led to a cleaner model. The biggest user-visible benefit
> is that you don’t have to deal with special rules such as “give READ
> privilege to user jarcec on ALL topics”. If you have a singleton parent
> entity (service or whatever name seems more accurate), you can easily
> say that you have READ access on this root entity and then all
> topics will simply inherit that.
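
(Just to illustrate the inheritance point above with a minimal, hypothetical 
sketch - a grant on the parent "service" entity satisfies the check for every 
topic:)

    import java.util.Set;

    // Hypothetical sketch of the two-level hierarchy: a topic check falls back to its parent "service".
    final class HierarchyCheck {
        static boolean canRead(Set<String> grants, String user, String topic) {
            return grants.contains(user + ":READ:topic:" + topic)
                || grants.contains(user + ":READ:service:kafka");   // inherited from the root entity
        }
    }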
> 
> Jarcec
> 
> On Oct 1, 2014, at 9:33 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> 
>> Hey Jarek,
>> 
>> I agree with the importance of separating authentication and
>> authorization. The question is what concept of identity is sufficient
>> to pass through to the authorization layer? Just a "user name"? Or
>> perhaps you also need the ip the request originated from? Whatever
>> these would be it would be nice to enumerate them so the authz portion
>> can be written in a way that ignores the authn part.
>> 
>> So if no one else proposes anything different maybe we can just say
>> user name + ip?
>> 
>> With respect to hierarchy, it would be nice to have topic hierarchies
>> but we don't have them now so seems overkill to try to think them
>> through wrt security now, right?
>> 
>> -Jay
>> 
>> 
>> 
>> On Wed, Oct 1, 2014 at 1:13 PM, Jarek Jarcec Cecho <jar...@apache.org> wrote:
>>> I’m following the security proposal wiki page [1] and this discussion and I 
>>> would like to jump in with a few points if I might :)  Let me start by saying 
>>> that I like the material and the discussion here, good work!
>>> 
>>> I was part of the team who originally designed and worked on Sentry and I 
>>> wanted to share a few thoughts to see how they resonate with people.  My first and 
>>> probably biggest point would be to treat authorization and 
>>> authentication as two separate systems. I believe that Jao has already 
>>> stressed that in the email thread, but I wanted to reiterate that point. 
>>> In my experience users don’t care that much about how the user has been 
>>> authenticated if they trust that mechanism; what they care more about is 
>>> that the authorization model is consistent and behaves the same way. E.g. 
>>> if I configured that user jarcec can write into topic “logs”, he should be 
>>> able to do that no matter where the connection came from - whether he has 
>>> been authenticated via Kerberos because he is directly exploring the data from his 
>>> computer, via a delegation token because he is running 
>>> map reduce jobs calculating statistics, or via SSL 
>>> certificates because … (well, I’m missing a good example here, but you’re 
>>> probably following my point).
>>> 
>>> I’ve also noticed that we are planning to have no hierarchy in the authz 
>>> object model per the wiki [1], with the reasoning that Kafka does not support 
>>> topic hierarchies. I see that point, but at the same time it got me thinking 
>>> - are we sure that Kafka will never have hierarchical topics? It seems like a nice 
>>> feature that might be usable for some use cases and something that we might 
>>> want to add in the future. But regardless of that I would suggest 
>>> introducing a hierarchy anyway, even if it would be just two levels. 
>>> In Sentry (for Hive) we’ve introduced the concept of a “Service” where all the 
>>> databases are children of the service. In Kafka I would imagine that we 
>>> would have a “service” with “topics” as its children. Having this makes it much 
>>> easier to model general privileges where you need to grant access to all 
>>> topics - you just grant access to the entire service and all topics 
>>> will inherit it.
>>> 
>>> I’m wondering what other people think?
>>> 
>>> Jarcec
>>> 
>>> Links:
>>> 1: https://cwiki.apache.org/confluence/display/KAFKA/Security
>>> 
>>> On Oct 1, 2014, at 9:44 AM, Joe Stein <joe.st...@stealth.ly> wrote:
>>> 
>>>> Hi Jonathan,
>>>> 
>>>> "Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks
>>>> running in the Hadoop environment to access Kafka"
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list,
>>>> yup!
>>>> 
>>>> /*******************************************
>>>> Joe Stein
>>>> Founder, Principal Consultant
>>>> Big Data Open Source Security LLC
>>>> http://www.stealth.ly
>>>> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>>>> ********************************************/
>>>> 
>>>> On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy <jonathan.cre...@turn.com>
>>>> wrote:
>>>> 
>>>>> This is not nearly as deep as the discussion so far, but I did want to
>>>>> throw this idea out there to make sure we’ve thought about it.
>>>>> 
>>>>> The Kafka project should make sure that when deployed alongside a Hadoop
>>>>> cluster from any major distribution, it can tie seamlessly into the
>>>>> authentication and authorization used within that cluster. For example,
>>>>> Apache Sentry.
>>>>> 
>>>>> This may present additional difficulties that mean a decision is made not
>>>>> to do that; alternatively, the Kerberos authentication and the
>>>>> authorization schemes we are already working on may be sufficient.
>>>>> 
>>>>> I’m not sure that anything I’ve read so far in this discussion actually
>>>>> poses a problem, but I’m an Ops guy and being able to more easily
>>>>> integrate more things makes my life better. :)
>>>>> 
>>>>> -Jonathan
>>>>> 
>>>>> On 9/30/14, 11:26 PM, "Joe Stein" <joe.st...@stealth.ly> wrote:
>>>>> 
>>>>>> inline
>>>>>> 
>>>>>> On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hey Joe,
>>>>>>> 
>>>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>>> 
>>>>>>> The way I see it, the first question we have to answer is whether it
>>>>>>> is possible to make authentication and authorization independent. What
>>>>>>> I mean by that is whether I can write an authorization library that
>>>>>>> will work the same whether you authenticate with ssl or kerberos.
>>>>>> 
>>>>>> 
>>>>>> To me that is a requirement. We can't tie them together.  We have to
>>>>>> provide the ability for authorization to work regardless of the
>>>>>> authentication.  One *VERY* important use case is the level of trust in
>>>>>> authentication from the authorization perspective.  e.g. I authorize an
>>>>>> "identity" based on how it authenticated.... Alice is able to view
>>>>>> topic X if Alice authenticated over kerberos.  Bob isn't allowed to view
>>>>>> topic X no matter what. Alice can authenticate over something other than
>>>>>> kerberos (there are use cases for that) and in that case Alice wouldn't
>>>>>> see topic X.  A concrete use case for this with Kafka would be a third
>>>>>> party bank consuming data from a broker.  The service provider would have
>>>>>> some kerberos local auth for that bank to do backups that would also have
>>>>>> access to other topics related to that bank's data.... the bank itself
>>>>>> over SSL wants a stream of events (some specific topic) and that bank's
>>>>>> identity only sees that topic.  It is important to not confuse identity,
>>>>>> authentication and authorization.
>>>>>> 
>>>>>> 
>>>>>>> If
>>>>>>> so then we need to pick some subset of identity information that we
>>>>>>> can extract from both and have this constitute the identity we pass
>>>>>>> into the authorization interface. The original proposal had just the
>>>>>>> username/subject. But maybe we should add the ip address as well as
>>>>>>> that is useful. What I would prefer not to do is add everything in the
>>>>>>> certificate. I think the assumption is that you are generating these
>>>>>>> certificates for Kafka so you can put whatever identity info you want
>>>>>>> in the Subject Alternative Name. If that is true then just using that
>>>>>>> should be okay, right?
>>>>>>> 
>>>>>> 
>>>>>> I think we should just push the byte[] and let the plugin deal with it.
>>>>>> So, if we have a certificate object then pass that along with whatever
>>>>>> other meta data (e.g. IP address of client) we can.  I don't think we
>>>>>> should do any parsing whatsoever and let the plugin deal with that.  Any
>>>>>> parsing we do on the identity information for the "security object" forces
>>>>>> us into specific implementations and I don't see any reason to do that...
>>>>>> If plug-ins want an "easier" time dealing with certs and parsing and blah
>>>>>> blah blah then we can implement some way they can do this without much
>>>>>> fuss.... we also need to make sure that the crypto library is pluggable
>>>>>> too (so we can expose an API for them to call) so that an HSM can be
>>>>>> easily dropped in without Kafka caring... so in the plugin we could
>>>>>> provide an identity.getAlternativeAttribute() and then that use case is
>>>>>> solved (and we can use bouncy castle or whatever to parse it for them to
>>>>>> make it easier).... and always give them raw bytes so they could do it
>>>>>> themselves.
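
(As a side note - even with only raw bytes exposed, a plugin can get at the 
Subject Alternative Names with the stock JDK; a rough sketch, error handling 
omitted and the class/method names purely hypothetical:)

    import java.io.ByteArrayInputStream;
    import java.security.cert.CertificateFactory;
    import java.security.cert.X509Certificate;
    import java.util.Collection;
    import java.util.List;

    final class CertIdentity {
        // Sketch: recover the SANs from the raw certificate bytes handed to the plugin.
        static Collection<List<?>> subjectAltNames(byte[] rawIdentityBytes) throws Exception {
            X509Certificate cert = (X509Certificate) CertificateFactory.getInstance("X.509")
                    .generateCertificate(new ByteArrayInputStream(rawIdentityBytes));
            return cert.getSubjectAlternativeNames();   // entries are [Integer type, value] pairs
        }
    }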
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> -Jay
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein <joe.st...@stealth.ly>
>>>>> wrote:
>>>>>>>> 1) We need to support the most flexibility we can and make this
>>>>>>> transparent
>>>>>>>> to kafka (to use Gwen's term).  Any specific implementation is going
>>>>>>> to
>>>>>>>> make it not work with some solution stopping people from using Kafka.
>>>>>>> That
>>>>>>>> is a reality because everyone just does it slightly differently
>>>>>>> enough.
>>>>>>> If
>>>>>>>> we have an "identity" byte structure (let's not use string because some
>>>>>>>> security objects are bytes) this should just fall through to the
>>>>>>>> implementor.  For certs this is the entire x509 object (not just the
>>>>>>>> certificate part as it could contain an ASN.1 timestamp) and inside
>>>>>>> you
>>>>>>>> parse and do what you want with it.
>>>>>>>> 
>>>>>>>> 2) While I think there are many benefits to just the handshake
>>>>>>> approach I
>>>>>>>> don't think it outweighs the cons Jay expressed. a) We can't lead the
>>>>>>>> client libraries down a new path of interacting with Kafka.  By
>>>>>>>> incrementally adding to the wire protocol we are directing a very
>>>>>>> clear
>>>>>>> and
>>>>>>>> expected approach.  We already have issues with implementation even
>>>>>>> with
>>>>>>>> the wire protocol in place and are trying to improve that aspect of
>>>>>>> the
>>>>>>>> community as a whole.  Let's not take a step backwards with this
>>>>>>> there...
>>>>>>>> also we need to not add more/different hoops to
>>>>>>>> debugging/administering/monitoring kafka so taking advantage (as Jay
>>>>>>> says)
>>>>>>>> of built in logging (etc) is important... also for the client library
>>>>>>>> developers too :)
>>>>>>>> 
>>>>>>>> On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira <gshap...@cloudera.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Re #1:
>>>>>>>>> 
>>>>>>>>> Since auth_to_local is a kerberos config, it's up to the admin to
>>>>>>>>> decide how he likes the user names and set it up properly (or leave it
>>>>>>>>> empty) and make sure the ACLs match. Simplified names may be needed
>>>>>>> if
>>>>>>>>> the authorization system integrates with LDAP to get groups or
>>>>>>>>> something fancy like that.
>>>>>>>>> 
>>>>>>>>> Note that it's completely transparent to Kafka - if the admin sets up
>>>>>>>>> auth_to_local rules, we simply see a different principal name. No
>>>>>>> need
>>>>>>>>> to do anything different.
>>>>>>>>> 
>>>>>>>>> Gwen
>>>>>>>>> 
>>>>>>>>> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps <jay.kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>> Current proposal is here:
>>>>>>>>>> 
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>>>>>> 
>>>>>>>>>> Here are the two open questions I am aware of:
>>>>>>>>>> 
>>>>>>>>>> 1. We want to separate authentication and authorization. This means
>>>>>>>>>> permissions will be assigned to some user-like subject/entity/person
>>>>>>>>>> string that is independent of the authentication mechanism. It sounds
>>>>>>>>>> like we agreed this could be done and we had in mind some
>>>>>>> krb-specific
>>>>>>>>>> mangling that Gwen knew about and I think the plan was to use
>>>>>>> whatever
>>>>>>>>>> the user chose to put in the Subject Alternative Name of the cert
>>>>>>> for
>>>>>>>>>> ssl. So in both cases these would translate to a string denoting
>>>>>>> the
>>>>>>>>>> entity whom we are granting permissions to in the authorization
>>>>>>> layer.
>>>>>>>>>> We should document these in the wiki to get feedback on them.
>>>>>>>>>> 
>>>>>>>>>> The Hadoop approach to extraction was something like this:
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_man
>>>>>>> ually_book/content/rpm-chap14-2-3-1.html
>>>>>>>>>> 
>>>>>>>>>> But actually I'm not sure if just using the full kerberos
>>>>>>> principal is
>>>>>>>>>> so bad? I.e. having the user be jenni...@athena.mit.edu versus
>>>>> just
>>>>>>>>>> jennifer. Where this would make a difference would be in a case
>>>>>>> where
>>>>>>>>>> you wanted the same user/entity to be able to authenticate via
>>>>>>>>>> different mechanisms (Hadoop auth, kerberos, ssl) and have a single
>>>>>>>>>> set of permissions.
>>>>>>>>>> 
>>>>>>>>>> 2. For SASL/Kerberos we need to figure out how the communication
>>>>>>>>>> between client and server will be handled to pass the
>>>>>>>>>> challenge/response byte[]. I.e.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.h
>>>>>>> tml#evaluateChallenge(byte[])
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.h
>>>>>>> tml#evaluateResponse(byte[])
>>>>>>>>>> 
>>>>>>>>>> I am not super expert in this area but I will try to give my
>>>>>>>>>> understanding and I'm sure someone can correct me if I am confused.
>>>>>>>>>> 
>>>>>>>>>> Unlike SSL the transmission of this is actually outside the scope
>>>>>>> of
>>>>>>>>>> SASL so we have to specify this. Two proposals
>>>>>>>>>> 
>>>>>>>>>> Original Proposal: Add a new "authenticate" request/response
>>>>>>>>>> 
>>>>>>>>>> The proposal in the original wiki was to add a new "authenticate"
>>>>>>>>>> request/response to pass this information. This matches what was
>>>>>>> done
>>>>>>>>>> in the kerberos implementation for zookeeper. The intention is that
>>>>>>>>>> the client would send this request immediately after establishing a
>>>>>>>>>> connection, in which case it acts much like a "handshake", however
>>>>>>>>>> there is no requirement that they do so.
>>>>>>>>>> 
>>>>>>>>>> Whether the authentication happens via SSL or via Kerberos, the
>>>>>>> effect
>>>>>>>>>> will just be to set the username in their session. This will
>>>>>>> default
>>>>>>>>>> to the "anybody" user. So in the default non-secure case we will
>>>>>>> just
>>>>>>>>>> be defaulting "anybody" to have full permission. So to answer the
>>>>>>>>>> question about whether changing user is required or not, I don't
>>>>>>> think
>>>>>>>>>> it is but I think we kind of get it for free in this approach.
>>>>>>>>>> 
>>>>>>>>>> In this approach there is no particular need or advantage to
>>>>>>> having a
>>>>>>>>>> separate port for kerberos I don't think.
>>>>>>>>>> 
>>>>>>>>>> Alternate Proposal: Create a Handshake
>>>>>>>>>> 
>>>>>>>>>> The alternative I think Michael was proposing was to create a
>>>>>>>>>> handshake that would happen at connection time on connections
>>>>>>> coming
>>>>>>>>>> in on the SASL port. This would require a separate port for SASL
>>>>>>> since
>>>>>>>>>> otherwise you wouldn't be able to tell if the bytes you were
>>>>>>> getting
>>>>>>>>>> were for SASL or were the first request of an unauthenticated
>>>>>>>>>> connection.
>>>>>>>>>> 
>>>>>>>>>> Michael it would be good to work out the details of how this works.
>>>>>>>>>> Are we just sending size-delimited byte arrays back and forth until
>>>>>>>>>> the challenge response terminates?
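
(If it helps the discussion, a rough client-side sketch of that size-delimited 
exchange, using the JDK SaslClient linked above - the 4-byte length framing here 
is only an assumption, not an agreed format:)

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import javax.security.sasl.SaslClient;

    final class SaslHandshakeSketch {
        // Sketch: length-prefixed challenge/response frames until the SASL exchange completes.
        static void handshake(SaslClient client, DataInputStream in, DataOutputStream out)
                throws Exception {
            byte[] token = client.hasInitialResponse()
                    ? client.evaluateChallenge(new byte[0]) : new byte[0];
            while (!client.isComplete()) {
                out.writeInt(token.length);          // 4-byte size prefix, then the raw token
                out.write(token);
                out.flush();
                byte[] challenge = new byte[in.readInt()];
                in.readFully(challenge);
                token = client.evaluateChallenge(challenge);
            }
            // (a final non-empty token produced above may still need to be sent once complete)
        }
    }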
>>>>>>>>>> 
>>>>>>>>>> My Take
>>>>>>>>>> 
>>>>>>>>>> The pro I see for Michael's proposal is that it keeps the
>>>>>>>>>> authentication logic more localized in the socket server.
>>>>>>>>>> 
>>>>>>>>>> I see two cons:
>>>>>>>>>> 1. Since the handshake won't go through the normal api layer it
>>>>>>> won't
>>>>>>>>>> go through the normal logging (e.g. request log), jmx monitoring,
>>>>>>>>>> client trace token, correlation id, etc that we get for other
>>>>>>>>>> requests. This could make operations a little confusing and make
>>>>>>>>>> debugging a little harder since the client will be blocking on
>>>>>>> network
>>>>>>>>>> requests without the normal logging.
>>>>>>>>>> 2. This part of the protocol will be inconsistent with the rest of
>>>>>>> the
>>>>>>>>>> Kafka protocol so it will be a little odd for client implementors
>>>>>>> as
>>>>>>>>>> this will effectively be a request/response that they will have to
>>>>>>>>>> implement that will be different from all the other
>>>>>>> request/responses
>>>>>>>>>> they implement.
>>>>>>>>>> 
>>>>>>>>>> In practice these two alternatives are not very different except
>>>>>>> that
>>>>>>>>>> in the original proposal the bytes you send are prefixed by the
>>>>>>> normal
>>>>>>>>>> request header fields such as the client id, correlation id, etc.
>>>>>>>>>> Overall I would prefer this as I think it is a bit more consistent
>>>>>>>>>> from the client's point of view.
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> 
>>>>>>>>>> -Jay
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
> 
