Thanks, Brian! Look at that - the power of collaboration - the numbering is correct already! ;-)
I am inclined to agree that we should start with the Hadoop SSO Tokens, and I am leaning toward a new Jira that leaves behind the cruft, but I don't feel very strongly about it being new. I do feel, especially given Kai's new document, that we should have only one.

On Jul 3, 2013, at 2:32 PM, Brian Swan <[email protected]> wrote:

> Thanks, Larry, for starting this conversation (and thanks for the great
> Summit meeting summary you sent out a couple of days ago). To weigh in on
> your specific discussion points (and renumber them :-))...
>
> 1. Are there additional components that would be required for a Hadoop SSO
> service?
> Not that I can see.
>
> 2. Should any of the above described components be considered not actually
> necessary or poorly described?
> I think this will be determined as we get into the details of each component.
> What you've described here is certainly an excellent starting point.
>
> 3. Should we create a new umbrella Jira to identify each of these as a
> subtask?
> 4. Should we just continue to use 9533 for the SSO server and add additional
> subtasks?
> What is described here seems to fit with 9533, though 9533 may contain some
> details that need further discussion. IMHO, it may be better to file a new
> umbrella Jira, though I'm not 100% convinced of that. Would be very
> interested in input from others.
>
> 5. What are the natural seams of separation between these components and any
> dependencies between one and another that affect priority?
> Is 4 the right place to start? (4. Hadoop SSO Tokens: the exact shape and
> form of the sso tokens...) It seemed in some 1:1 conversations after the
> Summit meeting that others may agree with this. Would like to hear if that is
> the case more broadly.
>
> -Brian
>
> -----Original Message-----
> From: Larry McCay [mailto:[email protected]]
> Sent: Tuesday, July 2, 2013 1:04 PM
> To: [email protected]
> Subject: [DISCUSS] Hadoop SSO/Token Server Components
>
> All -
>
> As a follow up to the discussions that were had during Hadoop Summit, I would
> like to introduce the discussion topic around the moving parts of a Hadoop
> SSO/Token Service.
> There are a couple of related Jiras that can be referenced and may or may
> not be updated as a result of this discuss thread.
>
> https://issues.apache.org/jira/browse/HADOOP-9533
> https://issues.apache.org/jira/browse/HADOOP-9392
>
> As the first aspect of the discussion, we should probably state the overall
> goals and scoping for this effort:
> * An alternative authentication mechanism to Kerberos for user authentication
> * A broader capability for integration into enterprise identity and SSO
>   solutions
> * Possibly the advertisement/negotiation of available authentication
>   mechanisms
> * Backward compatibility for the existing use of Kerberos
> * No (or minimal) changes to existing Hadoop tokens (delegation, job, block
>   access, etc)
> * Pluggable authentication mechanisms across: RPC, REST and webui enforcement
>   points
> * Continued support for existing authorization policy/ACLs, etc
> * Keeping more fine grained authorization policies in mind - like attribute
>   based access control
>   - fine grained access control is a separate but related effort that we
>     must not preclude with this effort
> * Cross cluster SSO
>
> In order to tease out the moving parts here are a couple high level and
> simplified descriptions of SSO interaction flow:
>
>                        +------+
> +------+ credentials 1 | SSO  |
> |CLIENT|-------------->|SERVER|
> +------+       :tokens +------+
>   2 |
>     | access token
>     V :requested resource
> +-------+
> |HADOOP |
> |SERVICE|
> +-------+
>
> The above diagram represents the simplest interaction model for an SSO
> service in Hadoop.
> 1. client authenticates to SSO service and acquires an access token
>    a. client presents credentials to an authentication service endpoint
>       exposed by the SSO server (AS) and receives a token representing the
>       authentication event and verified identity
>    b. client then presents the identity token from 1.a. to the token endpoint
>       exposed by the SSO server (TGS) to request an access token to a
>       particular Hadoop service and receives an access token
> 2. client presents the Hadoop access token to the Hadoop service for which
>    the access token has been granted and requests the desired resource or
>    services
>    a. access token is presented as appropriate for the service endpoint
>       protocol being used
>    b. Hadoop service token validation handler validates the token and
>       verifies its integrity and the identity of the issuer
>
> +------+
> | IdP  |
> +------+
>   1 ^ credentials
>     | :idp_token
>     |                  +------+
> +------+ idp_token  2  | SSO  |
> |CLIENT|-------------->|SERVER|
> +------+       :tokens +------+
>   3 |
>     | access token
>     V :requested resource
> +-------+
> |HADOOP |
> |SERVICE|
> +-------+
>
> The above diagram represents a slightly more complicated interaction model
> for an SSO service in Hadoop that removes Hadoop from the credential
> collection business.
> 1. client authenticates to a trusted identity provider within the enterprise
>    and acquires an IdP specific token
>    a. client presents credentials to an enterprise IdP and receives a token
>       representing the authentication identity
> 2. client authenticates to SSO service and acquires an access token
>    a. client presents idp_token to an authentication service endpoint exposed
>       by the SSO server (AS) and receives a token representing the
>       authentication event and verified identity
>    b. client then presents the identity token from 2.a. to the token endpoint
>       exposed by the SSO server (TGS) to request an access token to a
>       particular Hadoop service and receives an access token
> 3. client presents the Hadoop access token to the Hadoop service for which
>    the access token has been granted and requests the desired resource or
>    services
>    a. access token is presented as appropriate for the service endpoint
>       protocol being used
>    b. Hadoop service token validation handler validates the token and
>       verifies its integrity and the identity of the issuer
>
> Considering the above set of goals and high level interaction flow
> description, we can start to discuss the component inventory required to
> accomplish this vision:
>
> 1. SSO Server Instance: this component must be able to expose endpoints for
> both authentication of users by collecting and validating credentials and
> federation of identities represented by tokens from trusted IdPs within the
> enterprise. The endpoints should be composable so as to allow for multifactor
> authentication mechanisms. They will also need to return tokens that
> represent the authentication event and verified identity as well as access
> tokens for specific Hadoop services.
>
> 2. Authentication Providers: pluggable authentication mechanisms must be
> easily created and configured for use within the SSO server instance. They
> will ideally allow the enterprise to plug in their preferred components from
> off the shelf as well as provide custom providers. Supporting existing
> standards for such authentication providers should be a top priority concern.
> There are a number of standard approaches in use in the Java world: JAAS
> loginmodules, servlet filters, JASPIC authmodules, etc. A pluggable provider
> architecture that allows the enterprise to leverage existing investments in
> these technologies and existing skill sets would be ideal.
>
> 3. Token Authority: a token authority component would need to have the
> ability to issue, verify and revoke tokens. This authority will need to be
> trusted by all enforcement points that need to verify incoming tokens. Using
> something like PKI for establishing trust will be required.
>
> 4. Hadoop SSO Tokens: the exact shape and form of the sso tokens will need to
> be considered in order to determine the means by which trust and integrity
> are ensured while using them. There may be some abstraction of the underlying
> format provided through interface based design but all token implementations
> will need to have the same attributes and capabilities in terms of validation
> and cryptographic verification.
>
> 5. SSO Protocol: the lowest common denominator protocol for SSO server
> interactions across client types would likely be REST. Depending on the REST
> client in use it may require explicitly coding to the token flow described in
> the earlier interaction descriptions or a plugin may be provided for things
> like HTTPClient, curl, etc. RPC clients will have this taken care of for them
> within the SASL layer and will leverage the REST endpoints as well. This
> likely implies trust requirements for the RPC client to be able to trust the
> SSO server's identity cert that is presented over SSL.
>
> 6. REST Client Agent Plugins: required for encapsulating the interaction with
> the SSO server for the client programming models. We may need these for many
> client types: e.g. Java, JavaScript, .Net, Python, cURL etc.
>
> 7. Server Side Authentication Handlers: the server side of the REST, RPC or
> webui connection will need to be able to validate and verify the incoming
> Hadoop tokens in order to grant or deny access to requested resources.
>
> 8. Credential/Trust Management: throughout the system - on client and server
> sides - we will need to manage and provide access to PKI and potentially
> shared secret artifacts in order to establish the required trust
> relationships to replace the mutual authentication that would be otherwise
> provided by using Kerberos everywhere.
>
> So, discussion points:
>
> 1. Are there additional components that would be required for a Hadoop SSO
> service?
> 2. Should any of the above described components be considered not actually
> necessary or poorly described?
> 2. Should we create a new umbrella Jira to identify each of these as a
> subtask?
> 3. Should we just continue to use 9533 for the SSO server and add additional
> subtasks?
> 4. What are the natural seams of separation between these components and any
> dependencies between one and another that affect priority?
>
> Obviously, each component that we identify will have a jira of its own - more
> than likely - so we are only trying to identify the high level descriptions
> for now.
>
> Can we try and drive this discussion to a close by the end of the week? This
> will allow us to start breaking out into component implementation plans.
>
> thanks,
>
> --larry
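
For anyone who wants something concrete to poke at, the simplest interaction flow quoted above (credentials to the AS endpoint, identity token to the TGS endpoint, access token to the Hadoop service) could be sketched roughly as follows. This is only an illustration under loud assumptions: every class, method and claim name here (TokenAuthority, SsoServer, handle_request, "iss"/"sub"/"aud"/"exp") is hypothetical, the token format is made up, and an HMAC shared secret stands in for the PKI-based trust that the thread says will be required. With PKI, the enforcement point would hold only a public key rather than the signing secret.

```python
import base64
import hashlib
import hmac
import json
import time


class TokenAuthority:
    """Hypothetical token authority: issues, verifies and revokes signed tokens."""

    def __init__(self, secret, issuer="hadoop-sso"):
        self._secret = secret      # shared secret; a stand-in for PKI key material
        self.issuer = issuer
        self._revoked = set()

    def _sign(self, payload):
        return hmac.new(self._secret, payload, hashlib.sha256).hexdigest()

    def issue(self, subject, audience, ttl_seconds=300):
        claims = {"iss": self.issuer, "sub": subject, "aud": audience,
                  "exp": time.time() + ttl_seconds}
        payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
        return payload.decode() + "." + self._sign(payload)

    def revoke(self, token):
        self._revoked.add(token)

    def verify(self, token, audience):
        """Return the claims if the token is intact, unrevoked, unexpired and
        issued for `audience`; otherwise return None."""
        if token in self._revoked:
            return None
        payload, sep, signature = token.rpartition(".")
        expected = self._sign(payload.encode())
        if not sep or not hmac.compare_digest(expected.encode(), signature.encode()):
            return None
        claims = json.loads(base64.urlsafe_b64decode(payload))
        if claims["iss"] != self.issuer or claims["aud"] != audience:
            return None
        if claims["exp"] < time.time():
            return None
        return claims


class SsoServer:
    """Hypothetical SSO server exposing the AS and TGS style endpoints."""

    def __init__(self, authority, users):
        self.authority = authority
        self._users = users        # username -> password; stand-in for a pluggable provider

    def authenticate(self, username, password):
        # Step 1.a: validate credentials and return an identity token.
        expected = self._users.get(username)
        if expected is None or not hmac.compare_digest(expected.encode(), password.encode()):
            raise PermissionError("authentication failed")
        return self.authority.issue(username, audience="sso")

    def request_access_token(self, identity_token, service):
        # Step 1.b: exchange the identity token for a service-specific access token.
        claims = self.authority.verify(identity_token, audience="sso")
        if claims is None:
            raise PermissionError("invalid identity token")
        return self.authority.issue(claims["sub"], audience=service)


def handle_request(authority, service, access_token, resource):
    # Step 2.b: the service-side handler validates the token before granting access.
    claims = authority.verify(access_token, audience=service)
    if claims is None:
        return "403 denied"
    return "200 " + claims["sub"] + " -> " + resource


if __name__ == "__main__":
    authority = TokenAuthority(b"demo-secret")
    sso = SsoServer(authority, {"alice": "s3cret"})
    identity = sso.authenticate("alice", "s3cret")
    access = sso.request_access_token(identity, "hdfs")
    print(handle_request(authority, "hdfs", access, "/user/alice"))
```

The one design point worth noticing is that the access token is bound to a single service through the audience claim, so a token granted for one Hadoop service is rejected by another. That matches "an access token to a particular Hadoop service" in the flow description, but it says nothing about what the real token shape should be.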
