Very interesting and some of this definitely needs to be done. I need
to spend some more time with it from a DSL syntax POV. One immediate
issue is with the
Hdfs("sample")
construct. Hdfs and other command suites are basically static factory
classes for commands. They can't take a parameters like this without say
new Hdfs("sample")
but I think that will cause problems and require syntax like this.
(new Hdfs("sample")).ls(session).now()
not very pretty.
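One option might be a static factory method on Hdfs itself so the chain stays fluent without the extra parens. A toy Java sketch of the pattern - the names here are made up for illustration and are not the actual knoxshell classes (the real ls command would also take the session):

```java
// Toy sketch: a static factory method keeps the DSL fluent without
// requiring "(new Hdfs(...)).ls(session).now()".
// Hypothetical names only - not the knoxshell API.
public class Hdfs {
    private final String cluster;

    private Hdfs(String cluster) {
        this.cluster = cluster;
    }

    // Static factory: reads almost as tersely as Hdfs("sample")
    public static Hdfs cluster(String name) {
        return new Hdfs(name);
    }

    public String ls() {
        // Stand-in for the real command; knoxshell's ls takes the session.
        return "ls on cluster " + cluster;
    }

    public static void main(String[] args) {
        // Fluent call without a visible "new":
        System.out.println(Hdfs.cluster("sample").ls());
    }
}
```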
I don't understand why a single session can't be for one and only one
cluster.
On 3/22/13 12:05 PM, larry mccay wrote:
I spent some time with knoxshell in working through the SSO workflow
and the current shell syntax.
The following is just a rework to accommodate some of what we have
been discussing wrt OAuth and SSO and some minor changes that felt a
little better to me.
I have no idea whether the fluent style of the creds and scopes that I
added here is even valid, but I think you'll get the idea.
What's important to me is the concept of providing the login machinery
the following info:
- What credentials the client has available
- The intended use of the session
The login should negotiate the most appropriate IdP, present the
required credentials and ask the Knox token endpoint for an
authorization token.
The token endpoint will need to then interrogate gateway policy for
the scopes requested and issue or deny an authorization token to the
session.
The resulting authenticated identity should then be assured that they
have access to the services they requested unless authorization policy
has changed at the gateway or finer grained policy enforcement at the
service perimeters deny it.
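Roughly, the scope check at the token endpoint could amount to something like the following. This is a toy Java sketch, not real gateway code - the policy table and the token string are stand-ins:

```java
import java.util.Map;
import java.util.Set;

// Toy sketch of the token endpoint's decision described above:
// issue an authorization token only if gateway policy allows every
// requested cluster/service scope; otherwise deny.
public class TokenEndpoint {
    // Stand-in for gateway policy: which services each cluster exposes.
    private static final Map<String, Set<String>> POLICY =
        Map.of("sample", Set.of("hdfs"), "sample2", Set.of("hdfs"));

    // Returns an opaque token on success, or null when any scope is denied.
    public static String issueToken(Map<String, Set<String>> requested) {
        for (Map.Entry<String, Set<String>> e : requested.entrySet()) {
            Set<String> allowed = POLICY.getOrDefault(e.getKey(), Set.of());
            if (!allowed.containsAll(e.getValue())) {
                return null; // deny the authorization token
            }
        }
        return "authz-token"; // stand-in for the real token
    }

    public static void main(String[] args) {
        System.out.println(issueToken(Map.of("sample", Set.of("hdfs"))));
    }
}
```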
-------------------------------
import org.apache.hadoop.gateway.shell.Knox
import org.apache.hadoop.gateway.shell.Creds
import org.apache.hadoop.gateway.shell.Scopes
import org.apache.hadoop.gateway.shell.hdfs.Hdfs
import groovy.json.JsonSlurper
knoxurl = "https://localhost:8443/gateway"
creds = Creds.basic().username().alias("u").password().alias("p")
scopes = Scopes.cluster.add("sample").services.add("hdfs").cluster.add("sample2").services.add("hdfs")
session = Knox.login( knoxurl, creds, scopes)
Hdfs("sample").rm( session ).file( "/tmp/example" ).recursive().now()
readmeFuture = Hdfs("sample").put(session).file("README").to("tmp/example/README").later() {
  println it.statusCode
  text = Hdfs("sample").ls( session ).dir( "/tmp/example" ).now().string
  results = (new JsonSlurper()).parseText( text )
  println results.FileStatuses.FileStatus.pathSuffix
}
session.waitFor( readmeFuture )
session.shutdown()
-------------------------------
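For what it's worth, the fluent creds chain in the script could hang together with a builder whose username()/password() calls select which credential the next alias() names. A toy Java sketch - the method names mirror the proposed DSL but the class is a stand-in, and Scopes could follow the same pattern:

```java
// Toy sketch of Creds.basic().username().alias("u").password().alias("p").
// Illustrative only - not the proposed knoxshell classes.
public class Creds {
    private String usernameAlias;
    private String passwordAlias;
    private boolean selectingPassword;

    private Creds() {}

    // Static factory for the BASIC credential style.
    public static Creds basic() {
        return new Creds();
    }

    // username()/password() select which field the next alias() names.
    public Creds username() { selectingPassword = false; return this; }
    public Creds password() { selectingPassword = true;  return this; }

    public Creds alias(String alias) {
        if (selectingPassword) {
            passwordAlias = alias;
        } else {
            usernameAlias = alias;
        }
        return this;
    }

    public String usernameAlias() { return usernameAlias; }
    public String passwordAlias() { return passwordAlias; }

    public static void main(String[] args) {
        Creds c = Creds.basic().username().alias("u").password().alias("p");
        System.out.println(c.usernameAlias() + " " + c.passwordAlias());
    }
}
```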
So…
* Introduce the notion of a Knox session rather than the Hadoop thing
(Hadoop met the unified product goal for Hadoop better but divorced
the shell from Knox - not sure which is better)
* Do not log in to a particular cluster through the gateway but to Knox
itself - the login command will need to be aware of the token endpoint
on the gateway instance
* We can provide the SSO login mechanisms with the types of
credentials available to the client and the means to get them
* We also provide the set of hadoop cluster names and services that we
intend to access
* Login to Knox and get a session
* Qualify service calls (when there is more than one cluster) with the cluster name
* Changed the json variable to a results variable - seems to read better to me
* Have the session wait for the future
* Shut down the session - hadoop.shutdown looked very scary to me