Re: Two open issues on Kafka security

Jay Kreps Wed, 01 Oct 2014 21:53:34 -0700

Here is the client side in ZK:
https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java


Note how they have a special Zookeeper request API that is used to
send the SASL bytes (e.g. see ZooKeeperSaslClient.sendSaslPacket).

This API follows the same protocol and rpc mechanism all their other
request/response types follow but it just has a simple byte[] entry
for the SASL token in both the request and response.
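
To make the byte[] idea concrete, here is a minimal sketch (Python, purely illustrative) of what an "authenticate" request could look like if it reused the normal size-delimited framing, with the usual header fields ahead of the SASL token. The API key value, the function names and the exact field layout are assumptions made for this example; nothing here is taken from ZooKeeper or from the Kafka wiki.

```python
import struct

# Hypothetical API key: no key has been assigned to an "authenticate" request.
AUTHENTICATE_API_KEY = 9999


def encode_authenticate_request(correlation_id, client_id, sasl_token, api_version=0):
    """Frame an opaque SASL token behind the usual request header fields."""
    client_bytes = client_id.encode("utf-8")
    # Header: api key (int16), api version (int16), correlation id (int32),
    # then the client id as a length-prefixed string (int16 length + bytes).
    body = struct.pack(">hhih", AUTHENTICATE_API_KEY, api_version,
                       correlation_id, len(client_bytes))
    body += client_bytes
    # Payload: the SASL token as a length-prefixed byte array (int32 length).
    body += struct.pack(">i", len(sasl_token)) + sasl_token
    # The whole request is size-delimited with an int32 length.
    return struct.pack(">i", len(body)) + body


def decode_authenticate_request(frame):
    """Inverse of encode_authenticate_request; returns the fields as a dict."""
    (size,) = struct.unpack_from(">i", frame, 0)
    if size != len(frame) - 4:
        raise ValueError("frame length does not match size prefix")
    api_key, api_version, correlation_id, client_len = struct.unpack_from(">hhih", frame, 4)
    offset = 4 + struct.calcsize(">hhih")
    client_id = frame[offset:offset + client_len].decode("utf-8")
    offset += client_len
    (token_len,) = struct.unpack_from(">i", frame, offset)
    offset += 4
    return {
        "api_key": api_key,
        "api_version": api_version,
        "correlation_id": correlation_id,
        "client_id": client_id,
        "sasl_token": frame[offset:offset + token_len],
    }
```

The token itself stays opaque to the framing code: whatever the SASL library produces is copied in and out unchanged, and the response could mirror the same shape.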

-Jay

On Wed, Oct 1, 2014 at 9:46 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> Hey Michael,
>
> WRT question 2, I think for SASL you do need the mechanism information
> but what I was talking about was the challenge/response byte[] that is
> sent back and forth from the client to the server. My understanding is
> that SASL gives you an api for the client and server to use to produce
> these byte[]'s but doesn't actually specify any way of exchanging them
> (that is protocol specific). I could be wrong here since my knowledge
> of this stuff is pretty weak. But according to my understanding you
> must be imagining some protocol for exchanging challenge/response
> information. This protocol would have to be clearly documented for
> client implementors. What is that protocol?
>
> -Jay
>
> On Wed, Oct 1, 2014 at 2:36 PM, Michael Herstine
> <mherst...@linkedin.com.invalid> wrote:
>> Regarding question #1, I’m not sure I follow you, Joe: you’re proposing (I
>> think) that the API take a byte[], but what will be in that array? A
>> serialized certificate if the client authenticated via SSL and the
>> principal name (perhaps normalized) if the client authenticated via
>> Kerberos?
>>
>> Regarding question #2, I think I was unclear in the meeting yesterday: I
>> was proposing a separate port for each authentication method (including
>> none). That is, if a client wants no authentication, then they would
>> connect to port N on the broker. If they wanted to talk over SSL, then
>> they connect to port N+1 (say). Kerberos: N+2. This would remove the need
>> for a new request, since the authentication type would be implicit in the
>> port on which the client connected (and it was my understanding that it
>> was desirable to not introduce any new messages).
>>
>> Perhaps the confusion comes from the fact, correctly pointed out by Jay,
>> that when you want to use SASL on a single port, there does of course need
>> to be a way for the incoming client to signal which mechanism it wants to
>> use, and that’s out of scope of the SASL spec. I didn’t see there being a
>> desire to add new SASL mechanisms going forward, but perhaps I was
>> incorrect?
>>
>> In any event, I’d like to suggest we keep the “open” or “no auth” port
>> separate, both to make it easy for admins to force the use of security (by
>> shutting down that port) and to avoid downgrade attacks (where an attacker
>> intercepts the opening packet from a client requesting security & alters
>> it to request none).
>>
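A minimal sketch of the port-per-method idea above, assuming a base port N with fixed offsets (none: N, SSL: N+1, Kerberos: N+2). The enum, the function names and the `enabled` set are hypothetical. The only point is that the authentication method is implied by the listener, and that switching off the "no auth" listener leaves nothing for an attacker to downgrade to.

```python
from enum import Enum


class AuthMethod(Enum):
    # Offsets from the base port, following the N, N+1, N+2 example.
    NONE = 0
    SSL = 1
    KERBEROS = 2


def port_for(base_port, method):
    """Port on which a client asks for the given authentication method."""
    return base_port + method.value


def method_for(base_port, port, enabled=frozenset(AuthMethod)):
    """Infer the authentication method from the port a client connected to.

    Returns None when the port maps to no method or to a disabled one,
    which models an admin shutting down the "no auth" listener.
    """
    offset = port - base_port
    for method in enabled:
        if method.value == offset:
            return method
    return None
```

For example, with `enabled={AuthMethod.SSL, AuthMethod.KERBEROS}` a connection to the base port resolves to no method at all, so there is no unauthenticated path to fall back to.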
>> I’ll update the Wiki with my notes from yesterday’s meeting this afternoon.
>>
>> Thanks,
>>
>> On 10/1/14, 9:35 AM, "Jonathan Creasy" <jonathan.cre...@turn.com> wrote:
>>
>>>This is not nearly as deep as the discussion so far, but I did want to
>>>throw this idea out there to make sure we've thought about it.
>>>
>>>The Kafka project should make sure that when deployed alongside a Hadoop
>>>cluster from any major distribution it can tie seamlessly into the
>>>authentication and authorization used within that cluster. For example,
>>>Apache Sentry.
>>>
>>>This may present additional difficulties that mean a decision is made
>>>not to do that, or alternatively the Kerberos authentication and the
>>>authorization schemes we are already working on may be sufficient.
>>>
>>>I'm not sure that anything I've read so far in this discussion actually
>>>poses a problem, but I'm an Ops guy and being able to more easily
>>>integrate more things makes my life better. :)
>>>
>>>-Jonathan
>>>
>>>On 9/30/14, 11:26 PM, "Joe Stein" <joe.st...@stealth.ly> wrote:
>>>
>>>>inline
>>>>
>>>>On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>
>>>>> Hey Joe,
>>>>>
>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>
>>>>> The way I see it, the first question we have to answer is whether it
>>>>> is possible to make authentication and authorization independent. What
>>>>> I mean by that is whether I can write an authorization library that
>>>>> will work the same whether you authenticate with ssl or kerberos.
>>>>
>>>>
>>>>To me that is a requirement. We can't tie them together. We have to
>>>>provide the ability for authorization to work regardless of the
>>>>authentication. One *VERY* important use case is the level of trust in
>>>>authentication from the authorization perspective, e.g. I authorize an
>>>>"identity" based on how you authenticated. Alice is able to view
>>>>topic X if Alice authenticated over Kerberos. Bob isn't allowed to view
>>>>topic X no matter what. Alice can also authenticate over something other
>>>>than Kerberos (there are use cases for that), and in that case Alice
>>>>wouldn't see topic X. A concrete use case for this with Kafka would be a
>>>>third-party bank consuming data from a broker. The service provider would
>>>>have some local Kerberos auth for that bank to do backups, which would
>>>>also have access to other topics related to that bank's data, while the
>>>>bank itself, over SSL, wants a stream of events (some specific topic) and
>>>>that bank's identity only sees that topic. It is important not to
>>>>confuse identity, authentication and authorization.
>>>>
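The Alice/Bob example above can be sketched roughly as follows, assuming the session carries both the identity and the way it was authenticated. The `Session` type, the ACL table and the topic name are hypothetical and only illustrate that the authorization decision can depend on both values while staying a separate layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    identity: str      # who the client claims to be, e.g. "alice"
    auth_method: str   # how that claim was verified, e.g. "kerberos" or "ssl"


# Hypothetical ACL table keyed by (identity, authentication method).
# Anything not listed is denied.
READ_ACLS = {
    ("alice", "kerberos"): {"topic-x"},
    ("alice", "ssl"): set(),
}


def can_read(session, topic):
    """Authorize a read using the identity and the level of trust in it."""
    allowed = READ_ACLS.get((session.identity, session.auth_method), set())
    return topic in allowed
```

Alice sees topic-x only over Kerberos, and Bob never does because he has no entry at all.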
>>>>
>>>>> If
>>>>> so then we need to pick some subset of identity information that we
>>>>> can extract from both and have this constitute the identity we pass
>>>>> into the authorization interface. The original proposal had just the
>>>>> username/subject. But maybe we should add the ip address as well as
>>>>> that is useful. What I would prefer not to do is add everything in the
>>>>> certificate. I think the assumption is that you are generating these
>>>>> certificates for Kafka so you can put whatever identity info you want
>>>>> in the Subject Alternative Name. If that is true then just using that
>>>>> should be okay, right?
>>>>>
>>>>
>>>>I think we should just push the byte[] and let the plugin deal with it.
>>>>So, if we have a certificate object then we pass that along with whatever
>>>>other metadata (e.g. IP address of the client) we can. I don't think we
>>>>should do any parsing whatsoever; let the plugin deal with that. Any
>>>>parsing we do on the identity information for the "security object"
>>>>forces us into specific implementations, and I don't see any reason to do
>>>>that. If plug-ins want an "easier" time dealing with certs and parsing
>>>>and so on, then we can implement some way for them to do this without
>>>>much fuss. We also need to make sure the crypto library is pluggable too
>>>>(so we can expose an API for them to call) so that an HSM can be easily
>>>>dropped in without Kafka caring. So in the plugin we could provide an
>>>>identity.getAlternativeAttribute() and then that use case is solved (and
>>>>we can use Bouncy Castle or whatever to parse it for them to make it
>>>>easier), and always give them the raw bytes so they could do it
>>>>themselves.
>>>>
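One possible shape for the "raw bytes plus optional helpers" idea, as a minimal sketch. The `Identity` class, its field names and the example plugin are assumptions made for illustration. The broker side is assumed to fill `parsed` only as a convenience, while a plugin remains free to ignore it and work from `raw`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Identity:
    raw: bytes                      # opaque credential bytes, passed through unparsed
    client_ip: str                  # extra metadata that is always available
    parsed: Dict[str, str] = field(default_factory=dict)  # optional pre-parsed attributes

    def get_alternative_attribute(self, name: str) -> Optional[str]:
        """Convenience lookup for plugins that do not want to parse raw bytes."""
        return self.parsed.get(name)


class PrefixPermissionManager:
    """Hypothetical plugin that does its own parsing of the raw bytes.

    It treats the bytes as a UTF-8 "name@REALM" string and permits access
    only to topics prefixed with "name.".
    """

    def is_permitted(self, identity: Identity, operation: str, topic: str) -> bool:
        name = identity.raw.decode("utf-8").split("@")[0]
        return topic.startswith(name + ".")
```

A different plugin could ignore `raw` entirely and decide from `get_alternative_attribute(...)` and `client_ip`, without the caller changing.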
>>>>
>>>>>
>>>>> -Jay
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>>> > 1) We need to support the most flexibility we can and make this
>>>>> > transparent to Kafka (to use Gwen's term). Any specific implementation
>>>>> > is going to make it not work with some solution, stopping people from
>>>>> > using Kafka. That is a reality because everyone does it just
>>>>> > differently enough. If we have an "identity" byte structure (let's not
>>>>> > use a string because some security objects are bytes) this should just
>>>>> > fall through to the implementor. For certs this is the entire X.509
>>>>> > object (not just the certificate part, as it could contain an ASN.1
>>>>> > timestamp) and inside you parse and do what you want with it.
>>>>> >
>>>>> > 2) While I think there are many benefits to just the handshake
>>>>> > approach, I don't think they outweigh the cons Jay expressed. a) We
>>>>> > can't lead the client libraries down a new path of interacting with
>>>>> > Kafka. By incrementally adding to the wire protocol we are directing a
>>>>> > very clear and expected approach. We already have issues with
>>>>> > implementation even with the wire protocol in place and are trying to
>>>>> > improve that aspect of the community as a whole. Let's not take a step
>>>>> > backwards with this. Also, we need to not add more/different hoops to
>>>>> > debugging/administering/monitoring Kafka, so taking advantage (as Jay
>>>>> > says) of built-in logging (etc.) is important, for the client library
>>>>> > developers too :)
>>>>> >
>>>>> > On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>> >
>>>>> >> Re #1:
>>>>> >>
>>>>> >> Since auth_to_local is a Kerberos config, it's up to the admin to
>>>>> >> decide how he likes the user names and set it up properly (or leave
>>>>> >> it empty) and make sure the ACLs match. Simplified names may be needed
>>>>> >> if the authorization system integrates with LDAP to get groups or
>>>>> >> something fancy like that.
>>>>> >>
>>>>> >> Note that it's completely transparent to Kafka - if the admin sets up
>>>>> >> auth_to_local rules, we simply see a different principal name. No
>>>>> >> need to do anything different.
>>>>> >>
>>>>> >> Gwen
>>>>> >>
>>>>> >> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>> >> > Current proposal is here:
>>>>> >> >
>>>>> >> > https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>> >> >
>>>>> >> > Here are the two open questions I am aware of:
>>>>> >> >
>>>>> >> > 1. We want to separate authentication and authorization. This means
>>>>> >> > permissions will be assigned to some user-like subject/entity/person
>>>>> >> > string that is independent of the authentication mechanism. It sounds
>>>>> >> > like we agreed this could be done and we had in mind some
>>>>> >> > krb-specific mangling that Gwen knew about, and I think the plan was
>>>>> >> > to use whatever the user chose to put in the Subject Alternative Name
>>>>> >> > of the cert for SSL. So in both cases these would translate to a
>>>>> >> > string denoting the entity to whom we are granting permissions in the
>>>>> >> > authorization layer. We should document these in the wiki to get
>>>>> >> > feedback on them.
>>>>> >> >
>>>>> >> > The Hadoop approach to extraction was something like this:
>>>>> >> >
>>>>> >> > http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>>>>> >> >
>>>>> >> > But actually I'm not sure if just using the full Kerberos principal
>>>>> >> > is so bad? I.e. having the user be jenni...@athena.mit.edu versus
>>>>> >> > just jennifer. Where this would make a difference would be in a case
>>>>> >> > where you wanted the same user/entity to be able to authenticate via
>>>>> >> > different mechanisms (Hadoop auth, Kerberos, SSL) and have a single
>>>>> >> > set of permissions.
>>>>> >> >
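The difference between the full principal and a short name can be illustrated with a small sketch. This is a deliberate simplification and not the Hadoop auth_to_local rule engine: it only assumes the common primary/instance@REALM layout and strips the realm when it equals a configured default. The function names and example principals are hypothetical.

```python
def split_principal(principal):
    """Split "primary/instance@REALM" into its three parts (missing parts are "")."""
    name, _, realm = principal.partition("@")
    primary, _, instance = name.partition("/")
    return primary, instance, realm


def short_name(principal, default_realm):
    """Return a short user name for principals in the default realm.

    Principals from any other realm are returned unchanged, so two users with
    the same primary in different realms do not collapse into one identity.
    """
    primary, _instance, realm = split_principal(principal)
    if realm == default_realm:
        return primary
    return principal
```

With a single set of permissions keyed by the short name, the same entity could then be matched whether it arrived via Kerberos or via a certificate carrying that name.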
>>>>> >> > 2. For SASL/Kerberos we need to figure out how the communication
>>>>> >> > between client and server will be handled to pass the
>>>>> >> > challenge/response byte[]. I.e.
>>>>> >> >
>>>>> >> >
>>>>> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>>>>> >> >
>>>>> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>>>>> >> >
>>>>> >> > I am not super expert in this area but I will try to give my
>>>>> >> > understanding and I'm sure someone can correct me if I am confused.
>>>>> >> >
>>>>> >> > Unlike SSL, the transmission of this is actually outside the scope of
>>>>> >> > SASL, so we have to specify this. Two proposals:
>>>>> >> >
>>>>> >> > Original Proposal: Add a new "authenticate" request/response
>>>>> >> >
>>>>> >> > The proposal in the original wiki was to add a new "authenticate"
>>>>> >> > request/response to pass this information. This matches what was done
>>>>> >> > in the Kerberos implementation for ZooKeeper. The intention is that
>>>>> >> > the client would send this request immediately after establishing a
>>>>> >> > connection, in which case it acts much like a "handshake"; however,
>>>>> >> > there is no requirement that they do so.
>>>>> >> >
>>>>> >> > Whether the authentication happens via SSL or via Kerberos, the effect
>>>>> >> > will just be to set the username in their session. This will default
>>>>> >> > to the "anybody" user. So in the default non-secure case we will just
>>>>> >> > be defaulting "anybody" to have full permission. So to answer the
>>>>> >> > question about whether changing user is required or not, I don't think
>>>>> >> > it is but I think we kind of get it for free in this approach.
>>>>> >> >
>>>>> >> > In this approach I don't think there is any particular need or
>>>>> >> > advantage to having a separate port for Kerberos.
>>>>> >> >
>>>>> >> > Alternate Proposal: Create a Handshake
>>>>> >> >
>>>>> >> > The alternative I think Michael was proposing was to create a
>>>>> >> > handshake that would happen at connection time on connections coming
>>>>> >> > in on the SASL port. This would require a separate port for SASL since
>>>>> >> > otherwise you wouldn't be able to tell if the bytes you were getting
>>>>> >> > were for SASL or were the first request of an unauthenticated
>>>>> >> > connection.
>>>>> >> >
>>>>> >> > Michael, it would be good to work out the details of how this works.
>>>>> >> > Are we just sending size-delimited byte arrays back and forth until
>>>>> >> > the challenge/response terminates?
>>>>> >> >
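One reading of "size-delimited byte arrays back and forth" is sketched below, assuming a 4-byte big-endian length prefix per token. Both ends run in one process here, so completion is signalled through a local flag returned by the server step; how a real handshake would tell the client that the exchange has terminated is exactly the detail that would need documenting. All names are hypothetical, and the step callables stand in for whatever the SASL library's challenge/response evaluation would return.

```python
import socket
import struct


def send_token(sock, token):
    """Write one token as a 4-byte big-endian length followed by the bytes."""
    sock.sendall(struct.pack(">i", len(token)) + token)


def recv_exact(sock, n):
    """Read exactly n bytes or fail if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed during handshake")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_token(sock):
    """Read one size-delimited token."""
    (size,) = struct.unpack(">i", recv_exact(sock, 4))
    return recv_exact(sock, size)


def run_handshake(client_sock, server_sock, client_step, server_step):
    """Alternate tokens until the server step reports completion.

    client_step(challenge) -> response bytes
    server_step(response)  -> (challenge bytes, done flag)
    Returns the number of round trips performed.
    """
    challenge = b""
    rounds = 0
    while True:
        send_token(client_sock, client_step(challenge))
        reply, done = server_step(recv_token(server_sock))
        send_token(server_sock, reply)
        challenge = recv_token(client_sock)
        rounds += 1
        if done:
            return rounds


class ToyServer:
    """Stand-in for a real mechanism: declares success after two responses."""

    def __init__(self):
        self.seen = 0

    def __call__(self, response):
        self.seen += 1
        return ("c%d" % self.seen).encode("ascii"), self.seen == 2


def toy_client(challenge):
    """Stand-in for a real mechanism: echoes the challenge back with a prefix."""
    return b"resp:" + challenge
```

Running `run_handshake` over a `socket.socketpair()` with `toy_client` and `ToyServer()` completes in two round trips. Note that none of these frames carries a correlation id or client id, which is the inconsistency with ordinary requests raised as a con below.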
>>>>> >> > My Take
>>>>> >> >
>>>>> >> > The pro I see for Michael's proposal is that it keeps the
>>>>> >> > authentication logic more localized in the socket server.
>>>>> >> >
>>>>> >> > I see two cons:
>>>>> >> > 1. Since the handshake won't go through the normal api layer it won't
>>>>> >> > go through the normal logging (e.g. request log), jmx monitoring,
>>>>> >> > client trace token, correlation id, etc. that we get for other
>>>>> >> > requests. This could make operations a little confusing and make
>>>>> >> > debugging a little harder since the client will be blocking on network
>>>>> >> > requests without the normal logging.
>>>>> >> > 2. This part of the protocol will be inconsistent with the rest of the
>>>>> >> > Kafka protocol so it will be a little odd for client implementors, as
>>>>> >> > this will effectively be a request/response that they will have to
>>>>> >> > implement that will be different from all the other request/responses
>>>>> >> > they implement.
>>>>> >> >
>>>>> >> > In practice these two alternatives are not very different except that
>>>>> >> > in the original proposal the bytes you send are prefixed by the normal
>>>>> >> > request header fields such as the client id, correlation id, etc.
>>>>> >> > Overall I would prefer the original proposal as I think it is a bit
>>>>> >> > more consistent from the client's point of view.
>>>>> >> >
>>>>> >> > Cheers,
>>>>> >> >
>>>>> >> > -Jay
>>>>> >>
>>>>>
>>>
>>
