Thanks, Brian!
Look at that - the power of collaboration - the numbering is correct already! 
;-)

I am inclined to agree that we should start with the Hadoop SSO Tokens and am 
leaning toward a new jira that leaves behind the cruft but I don't feel very 
strongly about it being new.
I do feel, especially given Kai's new document, that we should have only one.

On Jul 3, 2013, at 2:32 PM, Brian Swan <brian.s...@microsoft.com> wrote:

> Thanks, Larry, for starting this conversation (and thanks for the great 
> Summit meeting summary you sent out a couple of days ago). To weigh in on 
> your specific discussion points (and renumber them :-))...
> 
> 1. Are there additional components that would be required for a Hadoop SSO 
> service?
> Not that I can see.
> 
> 2. Should any of the above described components be considered not actually 
> necessary or poorly described?
> I think this will be determined as we get into the details of each component. 
> What you've described here is certainly an excellent starting point.
> 
> 3. Should we create a new umbrella Jira to identify each of these as a 
> subtask?
> 4. Should we just continue to use 9533 for the SSO server and add additional 
> subtasks?
> What is described here seems to fit with 9533, though 9533 may contain some 
> details that need further discussion. IMHO, it may be better to file a new 
> umbrella Jira, though I'm not 100% convinced of that. Would be very 
> interested in input from others.
> 
> 5. What are the natural seams of separation between these components and any 
> dependencies between one and another that affect priority?
> Is 4 the right place to start? (4. Hadoop SSO Tokens: the exact shape and 
> form of the sso tokens...) It seemed in some 1:1 conversations after the 
> Summit meeting that others may agree with this. Would like to hear if that is 
> the case more broadly.
> 
> -Brian
> 
> -----Original Message-----
> From: Larry McCay [mailto:lmc...@hortonworks.com] 
> Sent: Tuesday, July 2, 2013 1:04 PM
> To: common-dev@hadoop.apache.org
> Subject: [DISCUSS] Hadoop SSO/Token Server Components
> 
> All -
> 
> As a follow up to the discussions that were had during Hadoop Summit, I would 
> like to introduce the discussion topic around the moving parts of a Hadoop 
> SSO/Token Service.
> There are a couple of related Jiras that can be referenced and may or may 
> not be updated as a result of this discuss thread.
> 
> https://issues.apache.org/jira/browse/HADOOP-9533
> https://issues.apache.org/jira/browse/HADOOP-9392
> 
> As the first aspect of the discussion, we should probably state the overall 
> goals and scoping for this effort:
> * An alternative authentication mechanism to Kerberos for user authentication
> * A broader capability for integration into enterprise identity and SSO 
> solutions
> * Possibly the advertisement/negotiation of available authentication 
> mechanisms
> * Backward compatibility for the existing use of Kerberos
> * No (or minimal) changes to existing Hadoop tokens (delegation, job, block 
> access, etc)
> * Pluggable authentication mechanisms across: RPC, REST and webui enforcement 
> points
> * Continued support for existing authorization policy/ACLs, etc
> * Keeping more fine grained authorization policies in mind - like attribute 
> based access control
>       - fine grained access control is a separate but related effort that we 
> must not preclude with this effort
> * Cross cluster SSO
> 
> In order to tease out the moving parts, here are a couple of high-level, 
> simplified descriptions of the SSO interaction flow:
>                               +------+
>       +------+ credentials 1 | SSO  |
>       |CLIENT|-------------->|SERVER|
>       +------+  :tokens      +------+
>         2 |                    
>           | access token
>           V :requested resource
>       +-------+
>       |HADOOP |
>       |SERVICE|
>       +-------+
>       
> The above diagram represents the simplest interaction model for an SSO 
> service in Hadoop.
> 1. client authenticates to SSO service and acquires an access token
>  a. client presents credentials to an authentication service endpoint exposed 
> by the SSO server (AS) and receives a token representing the authentication 
> event and verified identity
>  b. client then presents the identity token from 1.a. to the token endpoint 
> exposed by the SSO server (TGS) to request an access token to a particular 
> Hadoop service and receives an access token
> 2. client presents the Hadoop access token to the Hadoop service for which 
> the access token has been granted and requests the desired resource or 
> services
>  a. access token is presented as appropriate for the service endpoint 
> protocol being used
>  b. Hadoop service token validation handler validates the token and verifies 
> its integrity and the identity of the issuer
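
As a rough illustration of the two-step flow above, here is a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the proposal: the names `SsoServer`, `HadoopService`, `sign` and `verify`, the claim names (`iss`, `sub`, `typ`, `aud`), and the use of in-process calls where the real design would use REST or RPC. An HMAC over a shared demo key stands in for the PKI-based trust that the component list calls for.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical demo key. A real deployment would rely on PKI key pairs.
SSO_KEY = b"demo-sso-signing-key"


def sign(claims: dict, key: bytes) -> str:
    """Serialize the claims and append an HMAC-SHA256 tag: '<payload>.<tag>'."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + tag


def verify(token: str, key: bytes) -> dict:
    """Return the claims if the tag matches, otherwise raise ValueError."""
    payload, _, tag = token.rpartition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("token integrity check failed")
    return json.loads(base64.urlsafe_b64decode(payload))


class SsoServer:
    """Stand-in for the SSO server and its two endpoints (AS and TGS)."""

    def __init__(self, users: dict):
        self._users = users  # username -> password, for the demo only

    def authenticate(self, user: str, password: str) -> str:
        # Step 1.a: validate credentials and return an identity token.
        if user not in self._users or self._users[user] != password:
            raise PermissionError("bad credentials")
        return sign({"iss": "sso", "sub": user, "typ": "identity"}, SSO_KEY)

    def request_access_token(self, identity_token: str, service: str) -> str:
        # Step 1.b: exchange the identity token for a service-scoped access token.
        identity = verify(identity_token, SSO_KEY)
        if identity.get("typ") != "identity":
            raise PermissionError("identity token required")
        claims = {"iss": "sso", "sub": identity["sub"], "typ": "access", "aud": service}
        return sign(claims, SSO_KEY)


class HadoopService:
    """Stand-in for a Hadoop service with a token validation handler."""

    def __init__(self, name: str):
        self.name = name

    def handle(self, access_token: str, resource: str) -> str:
        # Step 2.b: check integrity, issuer, token type and audience.
        claims = verify(access_token, SSO_KEY)
        if (claims.get("iss") != "sso" or claims.get("typ") != "access"
                or claims.get("aud") != self.name):
            raise PermissionError("token not valid for this service")
        return f"{claims['sub']} may read {resource}"


sso = SsoServer({"alice": "s3cret"})
identity_token = sso.authenticate("alice", "s3cret")                  # step 1.a
access_token = sso.request_access_token(identity_token, "namenode")  # step 1.b
print(HadoopService("namenode").handle(access_token, "/data/report"))  # step 2
```

The access token carries the target service as its audience, so a token granted for one service is rejected by any other, which matches "the Hadoop service for which the access token has been granted".
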
> 
>    +------+
>    |  IdP |
>    +------+
>    1   ^ credentials
>        | :idp_token
>        |                      +------+
>       +------+  idp_token  2 | SSO  |
>       |CLIENT|-------------->|SERVER|
>       +------+  :tokens      +------+
>         3 |                    
>           | access token
>           V :requested resource
>       +-------+
>       |HADOOP |
>       |SERVICE|
>       +-------+
>       
> 
> The above diagram represents a slightly more complicated interaction model 
> for an SSO service in Hadoop that removes Hadoop from the credential 
> collection business.
> 1. client authenticates to a trusted identity provider within the enterprise 
> and acquires an IdP specific token
>  a. client presents credentials to an enterprise IdP and receives a token 
> representing the authentication identity
> 2. client authenticates to SSO service and acquires an access token
>  a. client presents idp_token to an authentication service endpoint exposed 
> by the SSO server (AS) and receives a token representing the authentication 
> event and verified identity
>  b. client then presents the identity token from 2.a. to the token endpoint 
> exposed by the SSO server (TGS) to request an access token to a particular 
> Hadoop service and receives an access token
> 3. client presents the Hadoop access token to the Hadoop service for which 
> the access token has been granted and requests the desired resource or 
> services
>  a. access token is presented as appropriate for the service endpoint 
> protocol being used
>  b. Hadoop service token validation handler validates the token and verifies 
> its integrity and the identity of the issuer
>       
> Considering the above set of goals and high level interaction flow 
> description, we can start to discuss the component inventory required to 
> accomplish this vision:
> 
> 1. SSO Server Instance: this component must be able to expose endpoints for 
> both authentication of users by collecting and validating credentials and 
> federation of identities represented by tokens from trusted IdPs within the 
> enterprise. The endpoints should be composable so as to allow for multifactor 
> authentication mechanisms. They will also need to return tokens that 
> represent the authentication event and verified identity as well as access 
> tokens for specific Hadoop services.
> 
> 2. Authentication Providers: pluggable authentication mechanisms must be 
> easily created and configured for use within the SSO server instance. They 
> will ideally allow the enterprise to plug in their preferred components from 
> off the shelf as well as provide custom providers. Supporting existing 
> standards for such authentication providers should be a top priority concern. 
> There are a number of standard approaches in use in the Java world: JAAS 
> loginmodules, servlet filters, JASPIC authmodules, etc. A pluggable provider 
> architecture that allows the enterprise to leverage existing investments in 
> these technologies and existing skill sets would be ideal.
> 
> 3. Token Authority: a token authority component would need to have the 
> ability to issue, verify and revoke tokens. This authority will need to be 
> trusted by all enforcement points that need to verify incoming tokens. Using 
> something like PKI for establishing trust will be required.
> 
> 4. Hadoop SSO Tokens: the exact shape and form of the sso tokens will need to 
> be considered in order to determine the means by which trust and integrity 
> are ensured while using them. There may be some abstraction of the underlying 
> format provided through interface based design but all token implementations 
> will need to have the same attributes and capabilities in terms of validation 
> and cryptographic verification.
> 
> 5. SSO Protocol: the lowest common denominator protocol for SSO server 
> interactions across client types would likely be REST. Depending on the REST 
> client in use it may require explicitly coding to the token flow described in 
> the earlier interaction descriptions or a plugin may be provided for things 
> like HTTPClient, curl, etc. RPC clients will have this taken care of for them 
> within the SASL layer and will leverage the REST endpoints as well. This 
> likely implies trust requirements for the RPC client to be able to trust the 
> SSO server's identity cert that is presented over SSL. 
> 
> 6. REST Client Agent Plugins: required for encapsulating the interaction with 
> the SSO server for the client programming models. We may need these for many 
> client types: e.g. Java, JavaScript, .Net, Python, cURL etc.
> 
> 7. Server Side Authentication Handlers: the server side of the REST, RPC or 
> webui connection will need to be able to validate and verify the incoming 
> Hadoop tokens in order to grant or deny access to requested resources.
> 
> 8. Credential/Trust Management: throughout the system - on client and server 
> sides - we will need to manage and provide access to PKI and potentially 
> shared secret artifacts in order to establish the required trust 
> relationships to replace the mutual authentication that would be otherwise 
> provided by using kerberos everywhere.
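
To make component 3 (Token Authority) above a little more concrete, here is a minimal Python sketch of the issue/verify/revoke responsibilities. The class name `TokenAuthority`, the `id|subject|expiry|tag` layout and the in-memory revocation set are all hypothetical choices for illustration. An HMAC over a shared secret stands in for the PKI signatures the description asks for, so every verifier here would need the same key, which a PKI-based design avoids.

```python
import hashlib
import hmac
import secrets
import time


class TokenAuthority:
    """Issues, verifies and revokes tokens (HMAC stands in for PKI signatures)."""

    def __init__(self, key: bytes):
        self._key = key
        self._revoked = set()  # ids of tokens revoked before they expire

    def _tag(self, body: str) -> str:
        return hmac.new(self._key, body.encode(), hashlib.sha256).hexdigest()

    def issue(self, subject: str, ttl_seconds: int = 300) -> str:
        # Assumes the subject contains no '|' character (demo layout only).
        token_id = secrets.token_hex(8)
        expires = int(time.time()) + ttl_seconds
        body = f"{token_id}|{subject}|{expires}"
        return f"{body}|{self._tag(body)}"

    def verify(self, token: str) -> str:
        """Return the subject if the token is authentic, unexpired and not revoked."""
        body, _, tag = token.rpartition("|")
        if not hmac.compare_digest(tag, self._tag(body)):
            raise ValueError("bad signature")
        token_id, subject, expires = body.split("|")
        if token_id in self._revoked:
            raise ValueError("token revoked")
        if int(expires) < time.time():
            raise ValueError("token expired")
        return subject

    def revoke(self, token: str) -> None:
        # Remember the token id so later verify() calls reject it.
        self._revoked.add(token.split("|", 1)[0])


authority = TokenAuthority(secrets.token_bytes(32))
token = authority.issue("alice")
print(authority.verify(token))  # subject of a freshly issued token
authority.revoke(token)         # further verify() calls now raise ValueError
```

Revocation here only works because the verifier shares state with the issuer. Enforcement points that validate tokens on their own would need some way to learn about revoked ids, or would have to rely on short token lifetimes.
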
> 
> So, discussion points:
> 
> 1. Are there additional components that would be required for a Hadoop SSO 
> service?
> 2. Should any of the above described components be considered not actually 
> necessary or poorly described?
> 2. Should we create a new umbrella Jira to identify each of these as a 
> subtask?
> 3. Should we just continue to use 9533 for the SSO server and add additional 
> subtasks?
> 4. What are the natural seams of separation between these components and any 
> dependencies between one and another that affect priority?
> 
> Obviously, each component that we identify will have a jira of its own - more 
> than likely - so we are only trying to identify the high level descriptions 
> for now.
> 
> Can we try and drive this discussion to a close by the end of the week? This 
> will allow us to start breaking out into component implementation plans.
> 
> thanks,
> 
> --larry
> 
