[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294623#comment-14294623
 ] 

Josh Elser commented on ACCUMULO-3513:
--------------------------------------

bq. At some point, I think that's the best we can get. We cannot get direct 
access to the client's credentials, so we must trust another party (in this 
case, the MapReduce servers).

Right, I agree with you here, of course, but we still need some way to control 
when non-strongly-authenticated users (w/o kerberos credentials) try to connect 
to Accumulo. That's the crux of what we need to solve to make MapReduce 
actually work.

bq. We could require that the clients authenticate to Accumulo to generate a 
shared secret (really, though, they just need to authenticate to the 
Authenticator implementation backing Accumulo). This is analogous to the HDFS 
delegation token. The client can then give this shared secret to the MapReduce 
layer to use when talking to Accumulo, to ensure that the client did actually 
hand that secret to the MapReduce layer, requesting it to do work on its behalf

This is, like you say, ultimately what the delegation token boils down to and 
what I plan to do. Yes, we need to trust the ResourceManager to disallow users 
who have no credentials, but we still should have some shared secret support (a 
special token or data inside of a token) to prevent the need for additional 
configuration to just run MapReduce with Hadoop security on.
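The flow described above — the client authenticates first, receives a shared secret, and hands that secret to the MapReduce layer to act on its behalf — can be sketched roughly as follows. This is a minimal illustration of the HDFS-style delegation-token idea using an HMAC over a token identifier; all names here are hypothetical and this is not Accumulo's actual API.

```python
import hmac
import hashlib
import os
import time

# Hypothetical sketch: the servers hold a master secret and issue tokens
# that bind the client's principal to an expiry time. The token "password"
# is an HMAC over the identifier, so servers can verify tokens statelessly.
MASTER_SECRET = os.urandom(32)  # held only by the servers

def issue_token(principal: str, lifetime_secs: int = 3600) -> dict:
    """Issued only after the client strongly authenticates (e.g. Kerberos)."""
    expiry = int(time.time()) + lifetime_secs
    identifier = f"{principal}:{expiry}".encode()
    password = hmac.new(MASTER_SECRET, identifier, hashlib.sha256).digest()
    # The client hands (identifier, password) to the MapReduce layer.
    return {"identifier": identifier, "password": password}

def verify_token(token: dict) -> str:
    """Servers recompute the HMAC; a forged or expired token is rejected."""
    expected = hmac.new(MASTER_SECRET, token["identifier"],
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, token["password"]):
        raise PermissionError("bad token signature")
    principal, expiry = token["identifier"].decode().rsplit(":", 1)
    if time.time() > int(expiry):
        raise PermissionError("token expired")
    return principal
```

Because verification only needs the master secret, any tablet server can validate a token without a round trip to the component that issued it — which is the property that makes this workable for distributed tasks.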

bq. However, we still need to designate the MapReduce layer as trustable in 
some way... because this layer could reuse one client's credentials to perform 
an unauthorized task and give the results to a different user

Yes, we still need to trust that MapReduce keeps the shared secret safe, which 
we know it already does. The ability to expire a shared secret gives us _some 
more_ confidence that the shared secret won't be reused by some unwanted party. 
The YARN tasks themselves are run as the submitting user, so all we are relying 
on YARN to do is set up a proper environment running as the client (to be 
clear, as the actual unix user).
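The expiry point — that a time-limited secret narrows the window in which it can be reused — can also be handled with server-side state, which additionally allows explicit revocation before a token's natural lifetime ends. A minimal sketch, with purely illustrative names (not Accumulo's implementation):

```python
import time

class TokenRegistry:
    """Hypothetical server-side registry of issued shared secrets.

    Tracks each token's expiry and lets an administrator revoke
    (expire) a secret early if it is suspected of being misused.
    """

    def __init__(self):
        self._tokens = {}  # token id -> expiry timestamp

    def register(self, token_id: str, lifetime_secs: float, now=None) -> None:
        now = time.time() if now is None else now
        self._tokens[token_id] = now + lifetime_secs

    def revoke(self, token_id: str) -> None:
        # Explicit early expiration of a shared secret.
        self._tokens.pop(token_id, None)

    def is_valid(self, token_id: str, now=None) -> bool:
        now = time.time() if now is None else now
        expiry = self._tokens.get(token_id)
        return expiry is not None and now <= expiry
```

The trade-off versus the stateless HMAC approach is that revocation requires every verifier to consult (or replicate) this registry, which is why real systems often combine both: stateless verification plus a shared store for early cancellation.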

bq. The whitelist mechanism gives us some assurance that we've vetted that 
layer to not do those sorts of things.

We don't need a whitelist mechanism unless you don't trust YARN itself, which 
doesn't make any sense to me (and which I think you already agree on).

> Ensure MapReduce functionality with Kerberos enabled
> ----------------------------------------------------
>
>                 Key: ACCUMULO-3513
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3513
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 1.7.0
>
>
> I talked to [~devaraj] today about MapReduce support running on secure Hadoop 
> to help get a picture about what extra might be needed to make this work.
> Generally, in Hadoop and HBase, the client must have valid credentials to 
> submit a job, then the notion of delegation tokens is used for further 
> communication since the servers do not have access to the client's sensitive 
> information. A centralized service manages creation of a delegation token 
> which is a record which contains certain information (such as the submitting 
> user name) necessary to securely identify the holder of the delegation token.
> The general idea is that we would need to build support into the master to 
> manage delegation tokens which node managers can acquire and use to run jobs. 
> Hadoop and HBase both contain code which implements this general idea, but we 
> will need to apply it to Accumulo and verify that M/R jobs still work in a 
> kerberized environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
