[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-02-04 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306160#comment-14306160
 ] 

Josh Elser commented on ACCUMULO-3513:
--

bq. Ugh, I missed that. Sorry. Now you need to grant YARN setuid privileges. 
That's... unfortunate. I suppose you also have to make assumptions about which 
UID you need to use, based on the content of the delegation token, too, and I 
guess there's no guarantee that this will even be the same on every node, or 
match the submitter's UID. (Though, presumably, they will all be the same if 
using some common login service, like AD on all the nodes.)

Yes, it is a pain to get YARN set up in secure mode (notably the setuid stuff), 
but what you need to do is well documented. It's also a stated YARN assumption 
that the user must exist on every node.

> Ensure MapReduce functionality with Kerberos enabled
> 
>
> Key: ACCUMULO-3513
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3513
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 1.7.0
>
> Attachments: ACCUMULO-3513-design.pdf
>
>
> I talked to [~devaraj] today about MapReduce support running on secure Hadoop 
> to help get a picture about what extra might be needed to make this work.
> Generally, in Hadoop and HBase, the client must have valid credentials to 
> submit a job, then the notion of delegation tokens is used for further 
> communication, since the servers do not have access to the client's sensitive 
> information. A centralized service manages creation of a delegation token, 
> which is a record containing certain information (such as the submitting 
> user name) necessary to securely identify the holder of the delegation token.
> The general idea is that we would need to build support into the master to 
> manage delegation tokens for node managers to acquire and use to run jobs. 
> Hadoop and HBase both contain code which implements this general idea, but we 
> will need to apply it to Accumulo and verify that M/R jobs still work in a 
> kerberized environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-02-04 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306151#comment-14306151
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

bq. the YARN tasks run as the user who submitted the job

Ugh, I missed that. Sorry. Now you need to grant YARN setuid privileges. 
That's... unfortunate. I suppose you also have to make assumptions about which 
UID you need to use, based on the content of the delegation token, too, and I 
guess there's no guarantee that this will even be the same on every node, or 
match the submitter's UID. (Though, presumably, they will all be the same if 
using some common login service, like AD on all the nodes.)

bq. Why does the resource manager need to authenticate with Accumulo?

It doesn't *need* to. It'd just be a good idea if it did. We have no way to 
trust (vet/accredit/account for/log) the YARN layer. We don't know that it's 
actually YARN; it could be some rogue process that hasn't been vetted. We
lose the ability to mutually authenticate the service we are handing data to. 
It'd be really great if we didn't have to give that up. Granted, with regular 
passwords, we cannot do this either, but at least that security model and its 
risks are well-understood. We can try to think of something which would make 
this more secure than that.

bq. I'm not sure I understand what you mean here: No user code is being run 
with YARN's credentials.

Yes, I know this is how it works. I'm simply describing the competing goal. 
YARN is implemented this way to make it impossible for tasks to use the node's 
own credentials, but that's precisely what would be useful for Accumulo so it 
knew that the requester was the trusted YARN layer.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-02-04 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306073#comment-14306073
 ] 

Josh Elser commented on ACCUMULO-3513:
--

bq. What user does the task run as? If the effective UID is the same as its 
parent, the filesystem won't protect it.

Pretty sure I covered this already: the YARN tasks run as the user who 
submitted the job. This requires that your user exists across your YARN node 
managers. Thus, it is not the same effective UID; it's an entirely different 
one.

bq. If only the ResourceManager and the client could authenticate with Accumulo 
first

Why does the resource manager need to authenticate with Accumulo? The user 
needs to trust that the YARN cluster they're talking to is "real" (and not some 
third party that is somehow masquerading as a YARN cluster). If a user is just 
submitting their credentials to anyone who listens, the problem is with that 
user and not something we can solve with Accumulo.

bq. MapReduce needs to avoid granting access to its credentials from an 
untrusted client (which Accumulo does trust)

I'm not sure I understand what you mean here: No user code is being run with 
YARN's credentials. YARN tasks could be run by users who don't have Accumulo 
"accounts", but just being able to run a YARN job doesn't mean they can 
authenticate with Accumulo (except with a delegation token that was obtained 
with real credentials).



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-02-04 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306061#comment-14306061
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

{quote}Keytabs on disk should be protected by the filesystem. ...  A little C 
program ... drops permissions ...{quote}

What user does the task run as? If the effective UID is the same as its parent, 
the filesystem won't protect it.

{quote}... it's expected that the delegation token is protected from prying 
eyes ...{quote}

There seems to be a trade-off here, with competing goals. On the one hand, we 
need to make sure Accumulo doesn't give up data to an untrusted middle-man. 
And, on the other hand, MapReduce needs to avoid granting access to its 
credentials from an untrusted client (which Accumulo *does* trust).

If only the ResourceManager *and* the client could authenticate with Accumulo 
first, then we could carry information about both of them in the token used to 
authenticate to Accumulo in the actual task, and we could trust both the 
middle-man (YARN task) *and* the client to receive the data from Accumulo.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-02-04 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306004#comment-14306004
 ] 

Josh Elser commented on ACCUMULO-3513:
--

bq. Oh. Interesting. So, the YARN process can securely authenticate itself with 
the job controller (NodeManager? I'm not sure terminology here) before a job is 
submitted, but the task doesn't have access to that.

ResourceManager, but yes, I think you have the point. 

bq. How do they prevent the tasks from getting access to the parent process' 
Kerberos keytab?

It's an entirely new process, so there's no shared memory. Keytabs on disk 
should be protected by the filesystem.

bq. How are these tasks sandboxed?

A little C program is executed by the NodeManager which does your normal 
fork(), drops permissions on the child process, and runs the actual YARN task.

bq. Could our Input/OutputFormat be configured to access this keytab?

No, for the above reason -- we cannot read it. If it were generally readable, 
anyone could impersonate the YARN processes.

bq. I guess you might not want to do that if you don't trust the job which was 
submitted, but I'm not sure how we (Accumulo services) can trust that the 
request is coming from a trusted YARN service, and not some other party which 
maliciously gained access to a client's delegation token.

Like any password, it's expected that the delegation token is protected from 
prying eyes. The time-limit on the validity of the delegation token helps 
mitigate some concern, but that's a very small mitigation. We ultimately need 
to rely on YARN (which it is doing) to keep the delegation token safe from 
prying eyes from the time it leaves the client's possession until it makes its 
way to the actual YARN task.
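To illustrate the time-limit mitigation mentioned above, here is a minimal 
sketch (in Python, with hypothetical names; Accumulo's actual implementation 
is Java) of a delegation token whose "password" is an HMAC over an identifier 
that embeds an absolute expiry time:

```python
import hashlib
import hmac
import struct
import time


def make_token(secret: bytes, user: str, lifetime_s: float, now: float = None):
    """Build an (identifier, password) pair; the password is an HMAC over the
    identifier, which embeds the user name and an absolute expiry time."""
    now = time.time() if now is None else now
    identifier = user.encode() + struct.pack(">d", now + lifetime_s)
    password = hmac.new(secret, identifier, hashlib.sha256).digest()
    return identifier, password


def verify_token(secret: bytes, identifier: bytes, password: bytes,
                 now: float = None) -> bool:
    """Reject expired tokens outright, then check the MAC in constant time."""
    now = time.time() if now is None else now
    (expiry,) = struct.unpack(">d", identifier[-8:])
    if now > expiry:
        return False  # a stolen token stops working once the lifetime elapses
    expected = hmac.new(secret, identifier, hashlib.sha256).digest()
    return hmac.compare_digest(expected, password)
```

Anyone holding the shared secret can recompute the password, which is why the 
secret itself must never leave the servers; the limited validity window is the 
only protection once the token leaves the client's possession.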



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-02-04 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305989#comment-14305989
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

{quote}YARN processes have kerberos principals and credentials, but the tasks 
they spawn do not.{quote}

Oh. Interesting. So, the YARN process can securely authenticate itself with the 
job controller (NodeManager? I'm not sure terminology here) before a job is 
submitted, but the task doesn't have access to that. How do they prevent the 
tasks from getting access to the parent process' Kerberos keytab? How are these 
tasks sandboxed? Could our Input/OutputFormat be configured to access this 
keytab? I guess you might not want to do that if you don't trust the job which 
was submitted, but I'm not sure how we (Accumulo services) can trust that the 
request is coming from a trusted YARN service, and not some other party which 
maliciously gained access to a client's delegation token.

{quote}This would require us have clients hold onto N delegation tokens 
though.{quote}

No, there'd still only be one delegation token in play, but whoever generated 
it might change. I'm suggesting instead of a global, fixed "leader" involving 
coordination, a random "leader" is selected for each delegation token.

{quote}You need the coordination to roll new secret keys. Using the same secret 
key for months (assuming average uptime of a cluster) is just asking for 
attacks.{quote}

That's not what I was suggesting. I was suggesting eliminating the need to 
coordinate between servers by making one server responsible for each token 
(corresponding to a temporary key stored within that tserver).

{quote}Code will speak better than I can:...{quote}

Cool. Will take a look.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-02-04 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305797#comment-14305797
 ] 

Josh Elser commented on ACCUMULO-3513:
--

I haven't really read up about DIGEST-MD5. I'll have to look into that and see 
if there's anything better we can use with SASL.

bq. The individual MapReduce nodes do not have Kerberos principals at all? How 
do they authenticate to the job controller?

Delegation tokens.

bq. you have to talk to the TServer which issued it

This would require us to have clients hold onto N delegation tokens, though. 
That'd make the client implementation much more difficult than a singular 
delegation token that any node in the instance can verify.

bq. If you use a single shared key, you really don't need leader election 
(because they all have the secret and perform the same function)

You need the coordination to roll new secret keys. Using the same secret key 
for months (assuming average uptime of a cluster) is just asking for attacks.

bq. I'm very curious precisely how you are generating these delegation tokens, 
though. I could be on a completely separate page regarding that and your 
suggestion for leader elections.

Code will speak better than I can: 
https://github.com/joshelser/accumulo/tree/delegation-tokens/server/base/src/main/java/org/apache/accumulo/server/security/delegation.
 I just finished this up, I think. Each Master and Tserver has a SecretManager 
implementation. The Master (or, more generally, whoever is creating the secret 
keys) also runs the KeyManager, which generates a new secret key every 
$timelength. That process also uses the KeyDistributor to add secret keys to ZK 
(for all of the "followers"). The "followers" (tservers) use the KeyWatcher to 
see changes made by the KeyDistributor and update their SecretManager.

In general, the SecretManager is a local cache off of ZooKeeper which can 
generate/verify the passwords in delegation tokens. No mechanisms yet exist to 
ensure that all followers/tservers have seen a new secret key. 
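That propagation caveat can be sketched like this (Python, hypothetical names; 
the real classes are the Java SecretManager/KeyDistributor/KeyWatcher in the 
branch linked above): each rolled secret key has an id, a token references the 
key id that signed it, and a follower that has not yet seen that key simply 
cannot verify the token:

```python
import hashlib
import hmac


class SecretCache:
    """Local cache of rolled secret keys, keyed by key id, as if synced from
    ZooKeeper by a watcher."""

    def __init__(self):
        self._keys = {}  # key id -> secret bytes

    def add_key(self, key_id: int, secret: bytes) -> None:
        # In the real system a ZK watcher fires when the key master publishes
        # a new key; here we just install it directly.
        self._keys[key_id] = secret

    def password_for(self, key_id: int, identifier: bytes) -> bytes:
        return hmac.new(self._keys[key_id], identifier, hashlib.sha256).digest()

    def verify(self, key_id: int, identifier: bytes, password: bytes) -> bool:
        if key_id not in self._keys:
            return False  # this node has not yet seen the key that signed it
        expected = self.password_for(key_id, identifier)
        return hmac.compare_digest(expected, password)
```

A token minted against a freshly rolled key is rejected by any follower that 
only knows the older keys, which is exactly the gap noted above: nothing yet 
guarantees every tserver has seen the newest key before clients start using it.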




[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-02-04 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305779#comment-14305779
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

NOTE: DIGEST-MD5 is ill-advised, due to the problems documented in RFC 6331: 
http://tools.ietf.org/html/rfc6331

That's not to say that it couldn't be useful, if deployed properly. I'm just 
reluctant to rely on deprecated security modes, because it could give a false 
sense of confidence in the security being implemented.

{quote}MapReduce does not have access to Kerberos tokens. This is a 
non-starter.{quote}

The individual MapReduce nodes do not have Kerberos principals at all? How do 
they authenticate to the job controller?

{quote}... We can easily add leader election...{quote}

My point was that we don't need to do leader election. Rather, each TServer is 
just as good as any other to authenticate users, so rather than elect a single 
leader, you can simply allow any of them to issue tokens (concurrently). The 
only restriction is that to validate that token, you have to talk to the 
TServer which issued it... but that's better than always talking to a single 
leader or the master.

{quote}... This authentication model relies on the same secrets being shared 
across all nodes in the cluster. ...{quote}

If you use a single shared key, you *really* don't need leader election 
(because they all have the secret and perform the same function). However, I 
was actually thinking that each TServer would have a temporary key with which 
to generate delegation tokens. So long as that TServer hadn't crashed, it could 
validate any delegation tokens created from it.
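A sketch of that alternative (Python, hypothetical names): each server mints 
tokens with its own ephemeral key and stamps them with its id, so validation 
must go back to the issuer and dies with it:

```python
import hashlib
import hmac
import os


class TServerIssuer:
    """Each server holds its own ephemeral key; tokens record the issuer id,
    so only the issuing server (while it is alive) can validate them."""

    def __init__(self, server_id: str):
        self.server_id = server_id
        self._key = os.urandom(32)  # lost forever if this server crashes

    def issue(self, user: str):
        identifier = f"{self.server_id}|{user}".encode()
        password = hmac.new(self._key, identifier, hashlib.sha256).digest()
        return identifier, password

    def validate(self, identifier: bytes, password: bytes) -> bool:
        issuer, _, _ = identifier.partition(b"|")
        if issuer != self.server_id.encode():
            return False  # token was issued by some other server
        expected = hmac.new(self._key, identifier, hashlib.sha256).digest()
        return hmac.compare_digest(expected, password)
```

The trade-off being debated is visible here: no key distribution or leader 
election is needed, but a token from one server is useless at every other 
server, and unrecoverable if the issuer goes down.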

I'm very curious precisely how you are generating these delegation tokens, 
though. I could be on a completely separate page regarding that and your 
suggestion for leader elections.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-30 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299349#comment-14299349
 ] 

Josh Elser commented on ACCUMULO-3513:
--

Thanks for taking the time to read and give feedback.

bq. Regarding DIGEST-MD5, which transport features would it support, and how do 
these relate to the auth, auth-int, and auth-conf options currently available 
with GSSAPI?

These (the quality-of-protection options) are at the SASL level, so I believe 
they work regardless of the mechanism chosen.

bq. Wouldn't it be better to keep the existing GSSAPI transport, and pass the 
delegation tokens on top of that layer

MapReduce *does not* have access to Kerberos tokens. This is a non-starter.

bq. Regarding the use of ZK to propagate the rolling shared secret, we'd need 
to be careful about propagation delays using the watchers to update the cache. 
Rather than use the watchers.

That's a fair point. I'm not sure how this will look in practice (if we'll need 
to do something differently). Backing these with a table is a possibility.

bq. Regarding the rolling secret: this seems like it would make client tokens 
vary in their duration, and the expiration outside the control of the client 
user.

Yes, the maximum lifetime would be controlled by an Accumulo configuration 
value. This isn't too bad to expand upon once everything else is present (e.g. 
clients request shorter lifetimes).

bq. Instead of relying on the master, you could make it possible for any 
TServer to grant a delegation token. The resulting token could only be checked 
by that same TServer, but you wouldn't have to rely on a SPOF or worry about 
propagation. Clients would randomly choose a TServer to authenticate to every 
time they need a delegation token, and the delegation token remembers who 
issued it.

Also true; that's why I called it out. HBase just has any node in the cluster 
act as the leader, but I'm not convinced that we need that level of robustness: 
the number of calls to get a delegation token is small compared to the number 
of authentications (1 client to N mappers). We can easily add leader election 
and re-use the same service I plan to make for the master on any node in the 
instance. This authentication model relies on the same secrets being shared 
across all nodes in the cluster. If I'm understanding your suggestion, each 
server would have distinct secret keys, which would result in clients only 
being able to communicate with a single TabletServer (which is a non-starter).



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-30 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299329#comment-14299329
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

A few questions/comments about this plan:

# Regarding DIGEST-MD5, which transport features would it support, and how do 
these relate to the auth, auth-int, and auth-conf options currently available 
with GSSAPI?
# Wouldn't it be better to keep the existing GSSAPI transport, and pass the 
delegation tokens on top of that layer? That way, we authenticate the 
middle-man, too, and not just the end user. With the DIGEST-MD5 implementation, 
and skipping authentication for the middle-man, we cannot trust, from only the 
RPC connection, that the middle-man (the NodeManager?) is managing clients' 
delegation tokens properly.
# Regarding the use of ZK to propagate the rolling shared secret, we'd need to 
be careful about propagation delays using the watchers to update the cache. 
Rather than use the watchers.
# Regarding the rolling secret: this seems like it would make client tokens 
vary in their duration, and put the expiration outside the control of the 
client user.
# Instead of relying on the master, you could make it possible for any TServer 
to grant a delegation token. The resulting token could only be checked by that 
same TServer, but you wouldn't have to rely on a SPOF or worry about 
propagation. Clients would randomly choose a TServer to authenticate to every 
time they need a delegation token, and the delegation token remembers who 
issued it.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294730#comment-14294730
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

bq. I'm not sure how we can make any reliable security model if we operate 
under the assumption that YARN is insecure. We have to trust that the YARN task 
was correctly authenticated.

Right, we have to authenticate both YARN *and* the end user. Even if YARN 
doesn't work this way, and it uses some delegation token instead of any 
identifying information about itself, Accumulo's implementation requires a 
Kerberos token at the transport layer. You can't just omit a Kerberos token and 
replace it with a delegation token in Accumulo's implementation (nor do I think 
it'd be a good idea to try, because I do think we need to authenticate the 
middle-man, in this case YARN).

bq. Again. We have to assume YARN is doing the right thing.

No, we absolutely do not have to make any such assumption. We can validate that 
by only whitelisting approved, trusted intermediaries. This is no different 
than X.509 extensions that designate permitted uses on certificates. The fact 
that a certificate was signed by the same CA does not automatically make it 
appropriate for signing executable code or for encrypting email. The only catch 
is that Kerberos does not have any such mechanism built in, like X.509 
certificate extensions, so a whitelist is the only option.

bq. The code running inside a YARN task is untrusted (unless you restrict job 
submission and vet the users externally – hit the users with a stick and tell 
them to behave). We should not be trusting this code to act as the user that it 
should.

That's just my point... you don't know what is going on inside the YARN system. 
For all you know, there is a job accessing the local disk or system memory, 
searching for other clients' credentials, and using them to connect to 
Accumulo. Just because YARN tries to connect using some client's credentials, 
it doesn't mean it's a valid use (granted, that takes effort). You've got to 
actually lock down your YARN instance and vet the infrastructure and the code 
it runs before you can be sure that the credentials a job in YARN uses to 
connect to Accumulo are for a legitimate purpose. But, once this is done, the 
precise degree of additional security offered by the delegation token (due to 
expirable attributes, for instance) is debatable... but I concede that it is at 
least marginally better than without, so we can move past that point if you 
like. If it has the ability to expire, I'm in favor.

bq. The shared secret is acting in place of the kerberos credentials because 
there is no credentials available for use. ...

I'm not so sure that's true. There are no credentials available that represent 
the end user, but the YARN process itself should have some Kerberos identity, 
shouldn't it? I've read that paper, including the quoted portion, but I had 
assumed (perhaps incorrectly) that the YARN process would use its own Kerberos 
credentials to set up the transport layer, over which it sends the delegation 
token for additional validation and authorization. I assumed the wording about 
it using a delegation token in place of a Kerberos token was just shorthand for 
something a bit more complicated. Otherwise, what network protocol is it using 
that supports both Kerberos and a delegation token? Even if HDFS/YARN is using 
some custom protocol which supports both (or two RPC endpoints), Accumulo's 
SASL implementation certainly is not... it needs *some* Kerberos credentials to 
set up the transport layer, before we can send any delegation token or whatever 
across.
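
One answer to the protocol question raised above: SASL itself supports both cases through mechanism negotiation. Hadoop RPC uses the GSSAPI mechanism when Kerberos credentials are available and DIGEST-MD5 when authenticating with a delegation token, with the token's identifier and derived password standing in for a username/password pair. A minimal JDK-only sketch (the hostnames and token strings are made up):

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;

// Sketch: one SASL transport, two mechanisms; the token path uses DIGEST-MD5.
public class SaslMechanisms {
    public static void main(String[] args) throws Exception {
        // Callback handler supplying the delegation token as name + password,
        // the way a token-based DIGEST-MD5 handshake would.
        CallbackHandler tokenHandler = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName("token-identifier");
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword("token-password".toCharArray());
                } else if (cb instanceof RealmCallback) {
                    RealmCallback rc = (RealmCallback) cb;
                    rc.setText(rc.getDefaultText() != null ? rc.getDefaultText() : "default");
                }
            }
        };
        // The client asks for DIGEST-MD5; a Kerberos client would ask for GSSAPI.
        SaslClient client = Sasl.createSaslClient(
            new String[] {"DIGEST-MD5"}, null, "accumulo", "tserver.example.com",
            null, tokenHandler);
        System.out.println(client.getMechanismName());  // prints DIGEST-MD5
    }
}
```

So there is no need for two RPC endpoints: the same transport negotiates whichever mechanism the credentials at hand support.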


[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294680#comment-14294680
 ] 

Josh Elser commented on ACCUMULO-3513:
--

bq. Well, no... we don't know that it does this already. We have no idea how it 
may have been compromised internally

I'm not sure how we can make any reliable security model if we operate under 
the assumption that YARN is insecure. We have to trust that the YARN task was 
correctly authenticated.

bq. Accumulo and the real client is trustworthy and is handling the client's 
credentials properly

Again. We have to assume YARN is doing the right thing.

bq.  it's not much of a stretch to just trust that it is acting on behalf of 
user X, simply because it says so

That's the point I'm trying to make. That trust is a *huge* stretch. The code 
running inside a YARN task is untrusted (unless you restrict job submission and 
vet the users externally -- hit the users with a stick and tell them to 
behave). We should not be trusting this code to act as the user that it should.

bq. The extra, expirable, shared secret is nice, but it doesn't get us much 
further than what we can do without it, in my opinion

The shared secret is acting in place of the kerberos credentials because there 
are no credentials available for use. It's not optional -- it's what acts as 
the authentication (password over SASL instead of the kerberos identity). This 
is the best snippet I've read that describes things:

{quote}
Kerberos is a 3-party protocol that solves the hard problem of setting up an 
authenticated connection between a client and a server that have never 
communicated with each other before (but they both registered with Kerberos 
KDC). Our delegation token is also used to set up an authenticated connection 
between a client and a server (NameNode in this case). The difference is that 
we assume the client and the server had previously shared a secure connection 
(via Kerberos), over which a delegation token can be exchanged. Hence, 
delegation token is essentially a 2-party protocol and much simpler than 
Kerberos. However, we use Kerberos to bootstrap the initial trust between a 
client and NameNode in order to exchange the delegation token for later use to 
set up another secure connection between the client (actually job tasks 
launched on behalf of the client) and the same NameNode
{quote}

Please take some time to read [this overview on Hadoop 
security|http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf].
 It covers these points.
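
The 2-party scheme the quoted passage describes boils down to the server deriving each token's password from a master key, so it can verify tokens statelessly. A rough sketch of that derivation (illustrative names, not Accumulo's or Hadoop's actual classes):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Sketch: tokenPassword = HMAC(masterKey, tokenIdentifier), so the server
// needs no per-token storage to verify a presented token.
public class DelegationTokenSketch {
    final byte[] masterKey = new byte[32];

    DelegationTokenSketch() {
        new SecureRandom().nextBytes(masterKey);
    }

    byte[] derivePassword(String tokenIdentifier) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
        return mac.doFinal(tokenIdentifier.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        DelegationTokenSketch server = new DelegationTokenSketch();

        // Over the initial Kerberos-authenticated connection, the client is
        // handed the identifier and its derived password.
        String id = "owner=alice,issued=1422316800,maxDate=1422921600,seq=7";
        byte[] clientPassword = server.derivePassword(id);

        // Later, a task presents (id, password); the server just re-derives.
        System.out.println(Arrays.equals(clientPassword, server.derivePassword(id)));  // true
        System.out.println(Arrays.equals(clientPassword,
            server.derivePassword(id.replace("alice", "mallory"))));                   // false
    }
}
```

Tampering with any identifier field (owner, expiry) changes the derived password, which is why the expiration inside the identifier is enforceable.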



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294675#comment-14294675
 ] 

Josh Elser commented on ACCUMULO-3513:
--

Uhh, you may want to look at how YARN works, because that is not it :). YARN 
tasks do *not* run as the "yarn" user. Therefore, they do not have access to 
the nodemanager's kerberos credentials.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294673#comment-14294673
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

No, the yarn user still has to use its own Kerberos credentials to set up the 
transport layer with Accumulo. It may be acting on behalf of a user, but it 
still needs to authenticate to Accumulo as itself first. See below.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294670#comment-14294670
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

bq. Yes, we still need to trust that MapReduce is keeping the shared secret 
safe which we know it does already.

Well, no... we don't know that it does this already. We have no idea how it may 
have been compromised internally. All we know is that somehow, it gained access 
to a client's pre-negotiated shared secret. We hope it did this by strongly 
authenticating with that client and that client voluntarily giving it the 
shared secret, and that it was kept safe internally the entire time, but we 
don't know that it did. We trust that it does this because we check (or should 
check) its Kerberos credentials at the transport layer.

bq. The ability to expire a shared secret gives us some more confidence that 
the shared secret won't be reused by some unwanted party.

I agree, but we still need to ensure that the layer in between Accumulo and the 
real client is trustworthy and is handling the client's credentials properly. 
My only point was that if we already trust that layer to do that (which we 
definitely need to do... and not just any Kerberos principal can be trusted), 
it's not much of a stretch to just trust that it is *acting on behalf of user 
X, simply because it says so*. The extra, expirable, shared secret is nice, but 
it doesn't get us *much* further than what we can do without it, in my opinion. 
An expirable characteristic is a benefit (if it wasn't expirable, it wouldn't 
have any value at all). Other characteristics, like having attributes which 
include the specific authorizations the shared secret is allowed to be used 
for, are even better (e.g. instead of "you're allowed to act as me", you get 
"you're allowed to act as me to query this table").
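
That last characteristic could be sketched as a token carrying both an expiration and the set of operations it is valid for (all names here are hypothetical, not a proposed Accumulo API):

```java
import java.time.Instant;
import java.util.Set;

// Sketch: "you're allowed to act as me to query this table" -- and only
// until the token expires.
public class ScopedToken {
    final String user;
    final Instant expiration;
    final Set<String> allowedTables;

    ScopedToken(String user, Instant expiration, Set<String> allowedTables) {
        this.user = user;
        this.expiration = expiration;
        this.allowedTables = allowedTables;
    }

    /** Both conditions must hold: not expired, and the table is in scope. */
    boolean permits(String table, Instant now) {
        return now.isBefore(expiration) && allowedTables.contains(table);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        ScopedToken token = new ScopedToken("alice", now.plusSeconds(3600), Set.of("trades"));
        System.out.println(token.permits("trades", now));                    // true
        System.out.println(token.permits("accounts", now));                  // false: out of scope
        System.out.println(token.permits("trades", now.plusSeconds(7200)));  // false: expired
    }
}
```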

bq. We don't need a whitelist mechanism unless you're not trusting YARN itself 
which doesn't make any sense to me (which I think you already agree on)

No, that's precisely the layer I don't trust without a whitelist. It still 
needs to authenticate with Kerberos... the transport layer requires it... and 
not *all* Kerberos principals should be allowed to freely use some other user's 
delegation token, just because they *somehow* got ahold of one.
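
The whitelist check being argued for here is simple in principle: before honoring a delegation token at all, test the Kerberos principal that presented it against an approved set (the principal names below are made up):

```java
import java.util.Set;

// Sketch: only vetted intermediary principals may present another user's
// delegation token -- analogous to X.509 extended-key-usage restrictions.
public class DelegatorWhitelist {
    private final Set<String> approved;

    DelegatorWhitelist(Set<String> approved) {
        this.approved = approved;
    }

    /** The transport-layer Kerberos principal must be on the approved list. */
    boolean mayDelegate(String kerberosPrincipal) {
        return approved.contains(kerberosPrincipal);
    }

    public static void main(String[] args) {
        DelegatorWhitelist wl = new DelegatorWhitelist(Set.of(
            "yarn/nm1.example.com@EXAMPLE.COM",
            "yarn/nm2.example.com@EXAMPLE.COM"));
        System.out.println(wl.mayDelegate("yarn/nm1.example.com@EXAMPLE.COM")); // true
        System.out.println(wl.mayDelegate("mallory@EXAMPLE.COM"));              // false
    }
}
```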



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294629#comment-14294629
 ] 

Josh Elser commented on ACCUMULO-3513:
--

bq. Without the whitelist, and only the delegation token, all we can do is 
trust that the MapReduce layer authenticated the client at some point, for some 
purpose. With the whitelist, we can trust that we've vetted the MapReduce layer 
to function properly. If we already have that degree of trust, the delegation 
token is kinda moot.

I'm not sure you understand how the delegation token would work. The client 
would need to communicate with an Accumulo process to obtain some shared secret 
between Accumulo and that client. So, in addition to knowing that YARN is 
vetting that the "real" user is running the tasks on YARN, we know that the 
"real" user is going to be communicating with us using the shared secret we 
agreed upon.

When YARN actually runs the tasks for us, as the unix user account tied to the 
client, the yarn task will have the shared secret (which we trust YARN to keep 
safe once it leaves the client's possession and goes into the cluster), and we 
let Accumulo RPCs happen using the shared secret instead of the KRB 
credentials. The YARN task isn't connecting to Accumulo with its own principal 
because, again, it's not running as a {{yarn}} user, but as the "real" user.

So, no. I say again that the delegation token is not moot :)
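
The flow described above can be simulated with plain maps standing in for the real moving parts (the job configuration and Accumulo's RPC layer; all names hypothetical): the client obtains a secret over an authenticated channel, ships it with the job, and the task later authenticates with the secret alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of the delegation-token lifecycle: obtain, ship with the job,
// authenticate without Kerberos credentials.
public class DelegationFlow {
    // Accumulo side: secrets it has agreed upon, keyed by user.
    static final Map<String, String> serverSecrets = new HashMap<>();

    /** 1. The client, already Kerberos-authenticated, asks Accumulo for a secret. */
    static String obtainDelegationSecret(String user) {
        String secret = UUID.randomUUID().toString();
        serverSecrets.put(user, secret);
        return secret;
    }

    /** 3. A YARN task, running as the submitting unix user but with no
     *  Kerberos credentials, presents the secret instead. */
    static boolean authenticate(String user, String presentedSecret) {
        return presentedSecret != null && presentedSecret.equals(serverSecrets.get(user));
    }

    public static void main(String[] args) {
        // 2. The client stores the secret in the job configuration for its tasks.
        Map<String, String> jobCredentials = new HashMap<>();
        jobCredentials.put("accumulo.delegation.secret", obtainDelegationSecret("alice"));

        String fromTask = jobCredentials.get("accumulo.delegation.secret");
        System.out.println(authenticate("alice", fromTask));   // true
        System.out.println(authenticate("alice", "guessed"));  // false
    }
}
```

The point the comment makes is visible in step 3: the task never touches Kerberos credentials, so the agreed-upon secret is the only thing tying the task back to the real user.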



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294623#comment-14294623
 ] 

Josh Elser commented on ACCUMULO-3513:
--

bq. At some point, I think that's the best we can get. We cannot get direct 
access to the client's credentials, so we must trust another party (in this 
case, the MapReduce servers).

Right, I agree with you here, of course, but we still need some way to control 
when non-strongly-authenticated users (w/o kerberos credentials) try to connect 
to Accumulo. That's the crux of what we need to solve to make MapReduce 
actually work.

bq. We could require that the clients authenticate to Accumulo to generate a 
shared secret (really, though, they just need to authenticate to the 
Authenticator implementation backing Accumulo). This is analogous to the HDFS 
delegation token. The client can then give this shared secret to the MapReduce 
layer to use when talking to Accumulo, to ensure that the client did actually 
hand that secret to the MapReduce layer, requesting it to do work on its behalf

This is, like you say, ultimately what the delegation token boils down to and 
what I plan to do. Yes, we need to trust the ResourceManager to disallow users 
who have no credentials, but we still should have some shared secret support (a 
special token or data inside of a token) to prevent the need for additional 
configuration to just run MapReduce with Hadoop security on.

bq. However, we still need to designate the MapReduce layer as trustable in 
some way... because this layer could reuse one client's credentials to perform 
an unauthorized task and give the results to a different user

Yes, we still need to trust that MapReduce is keeping the shared secret safe, 
which we know it does already. The ability to expire a shared secret gives us 
_some more_ confidence that the shared secret won't be reused by some unwanted 
party. The yarn tasks themselves are run as the submitting user, so all we are 
relying on YARN to do is to set up a proper environment running as the client 
(to be clear, the actual unix user).

bq. The whitelist mechanism gives us some assurance that we've vetted that 
layer to not do those sorts of things.

We don't need a whitelist mechanism unless you're not trusting YARN itself 
which doesn't make any sense to me (which I think you already agree on)



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294573#comment-14294573
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

Without the whitelist, and only the delegation token, all we can do is trust 
that the MapReduce layer authenticated the client at some point, for some 
purpose. With the whitelist, we can trust that we've vetted the MapReduce layer 
to function properly. If we already have that degree of trust, the delegation 
token is kinda moot.

That is, unless the delegation token includes information about *specifically* 
which functions are authorized by a client. But, that's a *lot* more complex 
than just authentication... because it encroaches upon authorization.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294563#comment-14294563
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

From Accumulo's perspective, it would just mean that we "trust" the MapReduce 
layer to do the check... whether that means we choose to lock down access to 
the MapReduce layer, or that whatever mechanism the MapReduce layer uses to 
authenticate clients is properly propagated to Accumulo.

It doesn't prevent unwanted impersonation... it simply assigns trust to the 
MapReduce system to do that. At some point, I think that's the best we can get. 
We cannot get direct access to the client's credentials, so we must trust 
another party (in this case, the MapReduce servers).

We could require that the clients authenticate to Accumulo to generate a shared 
secret (really, though, they just need to authenticate to the Authenticator 
implementation backing Accumulo). This is analogous to the HDFS delegation 
token. The client can then give this shared secret to the MapReduce layer to 
use when talking to Accumulo, to ensure that the client did actually hand that 
secret to the MapReduce layer, requesting it to do work on its behalf. However, 
we still need to designate the MapReduce layer as trustable in some way... 
because this layer could reuse one client's credentials to perform an 
unauthorized task and give the results to a different user. The whitelist 
mechanism gives us some assurance that we've vetted that layer to not do those 
sorts of things.

If we already have to designate trust to that intermediate layer, I don't see a 
lot of added value with the complexity of the delegation token mechanism to 
prove that it is, in fact, doing work on behalf of a particular client.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294382#comment-14294382
 ] 

Josh Elser commented on ACCUMULO-3513:
--

I'm still unclear on how you think this prevents unwanted impersonation from 
happening. For mapreduce, the only time that we "know" who a client is happens 
when they submit the job. We need to capture the fact that the client is who 
they say they are (from their kerberos credentials) and construct a way for the 
node managers, which no longer have any idea what the job-submitter's 
credentials are, to still act on the client's behalf (this is the notion of the 
delegation token from HDFS and others).

In your example, we would have to trust that each and every mapreduce job in 
the system is going to "do the right thing" and not impersonate users it 
shouldn't, which isn't sufficient for a solution. We can do much better by 
taking the delegation token approach.



[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294368#comment-14294368
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

(Update: basically, I'm suggesting a whitelist for allowed delegators, which 
would include all task-tracker principals)






[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-27 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294327#comment-14294327
 ] 

Josh Elser commented on ACCUMULO-3513:
--

For later, if needed: the actual failure seen when trying to run a MR job now.

{noformat}
Error: java.io.IOException: java.lang.IllegalArgumentException: Cannot instantiate org.apache.accumulo.core.client.security.tokens.KerberosToken
	at org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat.getRecordWriter(AccumuloOutputFormat.java:559)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalArgumentException: Cannot instantiate org.apache.accumulo.core.client.security.tokens.KerberosToken
	at org.apache.accumulo.core.client.security.tokens.AuthenticationToken$AuthenticationTokenSerializer.deserialize(AuthenticationToken.java:65)
	at org.apache.accumulo.core.client.security.tokens.AuthenticationToken$AuthenticationTokenSerializer.deserialize(AuthenticationToken.java:98)
	at org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.getAuthenticationToken(ConfiguratorBase.java:229)
	at org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat.getAuthenticationToken(AccumuloOutputFormat.java:172)
	at org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat$AccumuloRecordWriter.<init>(AccumuloOutputFormat.java:403)
	at org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat.getRecordWriter(AccumuloOutputFormat.java:557)
	... 8 more
Caused by: java.lang.IllegalArgumentException: Subject is not logged in via Kerberos
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.apache.accumulo.core.client.security.tokens.KerberosToken.<init>(KerberosToken.java:53)
	at org.apache.accumulo.core.client.security.tokens.KerberosToken.<init>(KerberosToken.java:65)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at java.lang.Class.newInstance(Class.java:379)
	at org.apache.accumulo.core.client.security.tokens.AuthenticationToken$AuthenticationTokenSerializer.deserialize(AuthenticationToken.java:63)
	... 13 more
{noformat}






[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-23 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289578#comment-14289578
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
-

A possible solution: extend KerberosToken to have an "isDelegateFor" concept. 
If the token is constructed as a delegate, we can use the transport's principal 
to check whether it is allowed to delegate, and then use the delegated 
principal for any other permission checks.
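The "isDelegateFor" idea above can be sketched as follows. The class, method names, and two-principal layout are hypothetical illustrations of the concept, not a proposed Accumulo API:

```java
// Hypothetical sketch of a token that distinguishes the authenticated
// (transport) principal from the principal it acts on behalf of.
public class DelegatingTokenSketch {
    private final String transportPrincipal;
    private final String delegatedPrincipal; // null when not constructed as a delegate

    public DelegatingTokenSketch(String transportPrincipal) {
        this(transportPrincipal, null);
    }

    public DelegatingTokenSketch(String transportPrincipal, String delegatedPrincipal) {
        this.transportPrincipal = transportPrincipal;
        this.delegatedPrincipal = delegatedPrincipal;
    }

    public boolean isDelegate() {
        return delegatedPrincipal != null;
    }

    // Principal checked against the delegator whitelist: the transport's identity.
    public String authenticationPrincipal() {
        return transportPrincipal;
    }

    // Principal that all subsequent permission checks run as.
    public String effectivePrincipal() {
        return isDelegate() ? delegatedPrincipal : transportPrincipal;
    }
}
```

The server would first authorize `authenticationPrincipal()` to delegate at all, and only then perform every table- and system-permission check against `effectivePrincipal()`.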






[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

2015-01-22 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288302#comment-14288302
 ] 

Josh Elser commented on ACCUMULO-3513:
--

Marked as a blocker because there is currently no good way to run a MapReduce 
job when Accumulo is using SASL servers. We need to address this in some 
fashion for 1.7.0 or, at the absolute minimum, manage user expectations.



