[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306160#comment-14306160 ] Josh Elser commented on ACCUMULO-3513:

bq. Ugh, I missed that. Sorry. Now you need to grant YARN setuid privileges. That's... unfortunate. I suppose you also have to make assumptions about which UID you need to use, based on the content of the delegation token, too, and I guess there's no guarantee that this will even be the same on every node, or match the submitter's UID. (Though, presumably, they will all be the same if using some common login service, like AD on all the nodes.)

Yes, it is a pain to get YARN set up in secure mode (notably the setuid stuff), but what you need to do is well documented. It's also a stated YARN assumption that the user must exist on every node.

> Ensure MapReduce functionality with Kerberos enabled
> ----------------------------------------------------
>
>          Key: ACCUMULO-3513
>          URL: https://issues.apache.org/jira/browse/ACCUMULO-3513
>      Project: Accumulo
>   Issue Type: Bug
>   Components: client
>     Reporter: Josh Elser
>     Assignee: Josh Elser
>     Priority: Blocker
>      Fix For: 1.7.0
>  Attachments: ACCUMULO-3513-design.pdf
>
> I talked to [~devaraj] today about MapReduce support running on secure Hadoop
> to help get a picture about what extra might be needed to make this work.
> Generally, in Hadoop and HBase, the client must have valid credentials to
> submit a job, then the notion of delegation tokens is used for further
> communication since the servers do not have access to the client's sensitive
> information. A centralized service manages creation of a delegation token,
> which is a record containing certain information (such as the submitting
> user name) necessary to securely identify the holder of the delegation token.
> The general idea is that we would need to build support into the master to
> manage delegation tokens for node managers to acquire and use to run jobs.
> Hadoop and HBase both contain code which implements this general idea, but we
> will need to apply it to Accumulo and verify that M/R jobs still work in
> a kerberized environment.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306151#comment-14306151 ] Christopher Tubbs commented on ACCUMULO-3513:

bq. the YARN tasks run as the user who submitted the job

Ugh, I missed that. Sorry. Now you need to grant YARN setuid privileges. That's... unfortunate. I suppose you also have to make assumptions about which UID you need to use, based on the content of the delegation token, too, and I guess there's no guarantee that this will even be the same on every node, or match the submitter's UID. (Though, presumably, they will all be the same if using some common login service, like AD on all the nodes.)

bq. Why does the resource manager need to authenticate with Accumulo?

It doesn't *need* to. It'd just be a good idea if it did. We have no way to trust (vet/accredit/account for/log) the YARN layer. We don't know that it's actually YARN; it could be some rogue process that hasn't been vetted. We lose the ability to mutually authenticate the service we are handing data to. It'd be really great if we didn't have to give that up. Granted, with regular passwords, we cannot do this either, but at least that security model and its risks are well understood. We can try to think of something which would make this more secure than that.

bq. I'm not sure I understand what you mean here: No user code is being run with YARN's credentials.

Yes, I know this is how it works. I'm simply describing the competing goal. YARN is implemented this way to make it impossible for tasks to use the node's own credentials, but that's precisely what would be useful for Accumulo, so it knew that the requester was the trusted YARN layer.
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306073#comment-14306073 ] Josh Elser commented on ACCUMULO-3513:

bq. What user does the task run as? If the effective UID is the same as its parent, the filesystem won't protect it.

Pretty sure I covered this already: the YARN tasks run as the user who submitted the job. This requires that your user exists across your YARN node managers. Thus, it is not the same effective UID; it's an entirely different one.

bq. If only the ResourceManager and the client could authenticate with Accumulo first

Why does the resource manager need to authenticate with Accumulo? The user needs to trust that the YARN cluster they're talking to is "real" (and not some third party that is somehow masquerading as a YARN cluster). If a user is just submitting their credentials to anyone who listens, the problem is with that user and not something we can solve within Accumulo.

bq. MapReduce needs to avoid granting access to its credentials from an untrusted client (which Accumulo does trust)

I'm not sure I understand what you mean here: no user code is being run with YARN's credentials. YARN tasks could be run by users who don't have Accumulo "accounts", but just being able to run a YARN job doesn't mean they can authenticate with Accumulo (that requires a delegation token which was obtained with real credentials).
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306061#comment-14306061 ] Christopher Tubbs commented on ACCUMULO-3513:

{quote}Keytabs on disk should be protected by the filesystem. ... A little C program ... drops permissions ...{quote}

What user does the task run as? If the effective UID is the same as its parent, the filesystem won't protect it.

{quote}... it's expected that the delegation token is protected from prying eyes ...{quote}

There seems to be a trade-off here, with competing goals. On the one hand, we need to make sure Accumulo doesn't give up data to an untrusted middle-man. On the other hand, MapReduce needs to avoid granting access to its credentials from an untrusted client (which Accumulo *does* trust). If only the ResourceManager *and* the client could authenticate with Accumulo first, then we could carry information about both of these things in the token used to authenticate to Accumulo in the actual task, and we could trust the middle-man (YARN task) *and* the client to be able to receive the data from Accumulo.
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306004#comment-14306004 ] Josh Elser commented on ACCUMULO-3513:

bq. Oh. Interesting. So, the YARN process can securely authenticate itself with the job controller (NodeManager? I'm not sure of the terminology here) before a job is submitted, but the task doesn't have access to that.

ResourceManager, but yes, I think you have the right idea.

bq. How do they prevent the tasks from getting access to the parent process' Kerberos keytab?

It's an entirely new process, so there's no shared memory. Keytabs on disk should be protected by the filesystem.

bq. How are these tasks sandboxed?

A little C program is executed by the nodemanager which does your normal fork(), drops permissions on the child process, and runs the actual yarn task.

bq. Could our Input/OutputFormat be configured to access this keytab?

No, for the above reason -- we cannot read it. If it was generally open, anyone could impersonate the yarn processes.

bq. I guess you might not want to do that if you don't trust the job which was submitted, but I'm not sure how we (Accumulo services) can trust that the request is coming from a trusted YARN service, and not some other party which maliciously gained access to a client's delegation token.

Like any password, it's expected that the delegation token is protected from prying eyes. The time limit on the validity of the delegation token helps mitigate some concern, but that's a very small mitigation. We ultimately need to rely on YARN (which it is doing) to keep the delegation token safe from prying eyes from when it leaves the client's possession until it makes its way to the actual yarn task.
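The fork-then-drop-permissions dance described above can be sketched roughly as follows. This is a hedged, minimal Python sketch of the idea only (the real helper is YARN's container-executor, a setuid C binary); the function name and demo command are illustrative, not YARN's actual code.

```python
import os

def launch_as_user(uid, gid, argv):
    """Fork and exec argv; in the child, drop privileges first (sketch)."""
    pid = os.fork()
    if pid == 0:  # child process: entirely new process, no shared memory
        try:
            if os.geteuid() == 0:    # only root may switch users
                os.setgid(gid)       # drop group first, then user
                os.setuid(uid)
            os.execvp(argv[0], argv) # replace the image with the actual task
        finally:
            os._exit(127)            # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

# Unprivileged demo: run a trivial "task" as the current user.
rc = launch_as_user(os.getuid(), os.getgid(), ["true"])
```

Since the child is a fresh process running under the submitter's UID, the filesystem permissions on the parent's keytab are what keep it out of the task's reach.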
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305989#comment-14305989 ] Christopher Tubbs commented on ACCUMULO-3513:

{quote}YARN processes have kerberos principals and credentials, but the tasks they spawn do not.{quote}

Oh. Interesting. So, the YARN process can securely authenticate itself with the job controller (NodeManager? I'm not sure of the terminology here) before a job is submitted, but the task doesn't have access to that. How do they prevent the tasks from getting access to the parent process' Kerberos keytab? How are these tasks sandboxed? Could our Input/OutputFormat be configured to access this keytab? I guess you might not want to do that if you don't trust the job which was submitted, but I'm not sure how we (Accumulo services) can trust that the request is coming from a trusted YARN service, and not some other party which maliciously gained access to a client's delegation token.

{quote}This would require clients to hold onto N delegation tokens though.{quote}

No, there'd still only be one delegation token in play, but whoever generated it might change. I'm suggesting that instead of a global, fixed "leader" involving coordination, a random "leader" is selected for each delegation token.

{quote}You need the coordination to roll new secret keys. Using the same secret key for months (assuming average uptime of a cluster) is just asking for attacks.{quote}

That's not what I was suggesting. I was suggesting eliminating the need to coordinate between servers by making one server responsible for each token (corresponding to a temporary key stored within that tserver).

{quote}Code will speak better than I can: ...{quote}

Cool. Will take a look.
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305797#comment-14305797 ] Josh Elser commented on ACCUMULO-3513:

I haven't really read up about DIGEST-MD5. I'll have to look into that and see if there's anything better we can use with SASL.

bq. The individual MapReduce nodes do not have Kerberos principals at all? How do they authenticate to the job controller?

Delegation tokens.

bq. you have to talk to the TServer which issued it

This would require clients to hold onto N delegation tokens, though. That'd make the client implementation much more difficult than a singular delegation token that any node in the instance can verify.

bq. If you use a single shared key, you really don't need leader election (because they all have the secret and perform the same function)

You need the coordination to roll new secret keys. Using the same secret key for months (assuming average uptime of a cluster) is just asking for attacks.

bq. I'm very curious precisely how you are generating these delegation tokens, though. I could be on a completely separate page regarding that and your suggestion for leader elections.

Code will speak better than I can: https://github.com/joshelser/accumulo/tree/delegation-tokens/server/base/src/main/java/org/apache/accumulo/server/security/delegation. I just finished this up, I think. Each Master and TServer has a SecretManager implementation. The Master (or, more generally, whoever is creating the secret keys) also runs the KeyManager, which generates a new secret key every $timelength. That process also uses the KeyDistributor to add secret keys to ZK (for all of the "followers"). The "followers" (tservers) use the KeyWatcher to see changes made by the KeyDistributor and update their SecretManager. In general, the SecretManager is a local cache off of ZooKeeper which can generate/verify the passwords in delegation tokens.
No mechanisms yet exist to ensure that all followers/tservers have seen a new secret key.
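To make the roll-and-propagate scheme above concrete, here is a hedged Python sketch of the idea. The class and method names are illustrative only, not Accumulo's actual KeyManager/KeyDistributor/KeyWatcher classes, and ZooKeeper propagation is modeled as a direct method call; a token's "password" is assumed (as in Hadoop-style secret managers) to be an HMAC of its identifier under a rolled secret key.

```python
import hmac
import hashlib

class SecretManager:
    """Follower-side cache of secret keys, normally fed by a ZK watcher."""
    def __init__(self):
        self.keys = {}  # keyId -> secret bytes

    def add_key(self, key_id, secret):
        # In the real design, a KeyWatcher callback would do this on a ZK event.
        self.keys[key_id] = secret

    def password_for(self, key_id, token_identifier):
        # The delegation token's password is an HMAC of its serialized
        # identifier under the secret key named by key_id.
        return hmac.new(self.keys[key_id], token_identifier, hashlib.sha1).digest()

    def verify(self, key_id, token_identifier, password):
        if key_id not in self.keys:
            return False  # key expired, or not yet propagated to this server
        expected = self.password_for(key_id, token_identifier)
        return hmac.compare_digest(expected, password)

# The "master" rolls a new key and (via ZK, here simulated) pushes it out.
master, follower = SecretManager(), SecretManager()
for mgr in (master, follower):
    mgr.add_key(7, b"secret-rolled-at-interval-7")

ident = b"owner=alice,keyId=7,maxDate=1423504800"
pw = master.password_for(7, ident)       # minted at token-creation time
ok = follower.verify(7, ident, pw)       # any caught-up tserver can verify
stale = follower.verify(8, ident, pw)    # key 8 was never propagated
```

The `stale` case is exactly the propagation gap noted above: until a follower has seen the new key, tokens minted under it fail verification there.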
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305779#comment-14305779 ] Christopher Tubbs commented on ACCUMULO-3513:

NOTE: DIGEST-MD5 is ill-advised, due to problems: http://tools.ietf.org/html/rfc6331
That's not to say that it couldn't be useful, if deployed properly. I'm just reluctant to rely on deprecated security modes, because it could give a false sense of confidence in the security being implemented.

{quote}MapReduce does not have access to Kerberos tokens. This is a non-starter.{quote}

The individual MapReduce nodes do not have Kerberos principals at all? How do they authenticate to the job controller?

{quote}... We can easily add leader election ...{quote}

My point was that we don't need to do leader election. Rather, each TServer is just as good as any other to authenticate users, so rather than elect a single leader, you can simply allow any of them to issue tokens (concurrently). The only restriction is that to validate that token, you have to talk to the TServer which issued it... but that's better than always talking to a single leader or the master.

{quote}... This authentication model relies on the same secrets being shared across all nodes in the cluster. ...{quote}

If you use a single shared key, you *really* don't need leader election (because they all have the secret and perform the same function). However, I was actually thinking that each TServer would have a temporary key with which to generate delegation tokens. So long as that TServer hadn't crashed, it could validate any delegation tokens created from it. I'm very curious precisely how you are generating these delegation tokens, though. I could be on a completely separate page regarding that and your suggestion for leader elections.
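The per-TServer alternative described in this comment can be illustrated as follows. This is a hypothetical sketch (all names invented, not Accumulo code): each server holds a private ephemeral key, and the token identifier records its issuer, so verification only succeeds at the server that minted it.

```python
import hmac
import hashlib
import os

class TServer:
    """Each server mints tokens under its own ephemeral key; only it can verify them."""
    def __init__(self, address):
        self.address = address
        self._key = os.urandom(32)  # temporary, per-process secret; lost on crash

    def issue_token(self, principal):
        # The token identifier "remembers who issued it" via the address.
        ident = f"{principal}@{self.address}".encode()
        password = hmac.new(self._key, ident, hashlib.sha1).digest()
        return ident, password

    def verify(self, ident, password):
        expected = hmac.new(self._key, ident, hashlib.sha1).digest()
        return hmac.compare_digest(expected, password)

ts1, ts2 = TServer("tserver1:9997"), TServer("tserver2:9997")
ident, pw = ts1.issue_token("alice")
ok_at_issuer = ts1.verify(ident, pw)   # the issuer holds the key
ok_elsewhere = ts2.verify(ident, pw)   # a different server cannot verify
```

This avoids key distribution and leader election entirely, at the cost Josh raises below: clients can then only authenticate to the one server that issued their token.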
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299349#comment-14299349 ] Josh Elser commented on ACCUMULO-3513:

Thanks for taking the time to read and give feedback.

bq. Regarding DIGEST-MD5, which transport features would it support, and how do these relate to the auth, auth-int, and auth-conf options currently available with GSSAPI?

These (the quality-of-protection options) are at the SASL level, so I believe they work seamlessly across whichever mechanism is chosen.

bq. Wouldn't it be better to keep the existing GSSAPI transport, and pass the delegation tokens on top of that layer

MapReduce *does not* have access to Kerberos tokens. This is a non-starter.

bq. Regarding the use of ZK to propagate the rolling shared secret, we'd need to be careful about propagation delays using the watchers to update the cache. Rather than use the watchers.

That's a fair point. I'm not sure how this will look in practice (if we'll need to do something differently). We could back these by a table, which is a possibility.

bq. Regarding the rolling secret: this seems like it would make client tokens vary in their duration, and the expiration outside the control of the client user.

Yes, the maximum lifetime would be controlled by an Accumulo configuration value. This isn't too bad to expand upon once everything else is present (e.g. clients could request shorter lifetimes).

bq. Instead of relying on the master, you could make it possible for any TServer to grant a delegation token. The resulting token could only be checked by that same TServer, but you wouldn't have to rely on a SPOF or worry about propagation. Clients would randomly choose a TServer to authenticate to, every time it needs a delegation token, and the delegation token remembers who issued it.

Also true; that's why I called it out. HBase just has any node in the cluster act as the leader; I'm not convinced that we need that level of robustness. The number of calls to get a delegation token is small compared to the number of authentications (1 client to N mappers). We can easily add leader election and re-use the same service I plan to make for the master by any node in the instance. This authentication model relies on the same secrets being shared across all nodes in the cluster. If I'm understanding your suggestion, each server would have distinct secret keys, which would result in clients only being able to communicate with a single TabletServer (which is a non-starter).
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299329#comment-14299329 ] Christopher Tubbs commented on ACCUMULO-3513:

A few questions/comments about this plan:
# Regarding DIGEST-MD5, which transport features would it support, and how do these relate to the auth, auth-int, and auth-conf options currently available with GSSAPI?
# Wouldn't it be better to keep the existing GSSAPI transport, and pass the delegation tokens on top of that layer? That way, we authenticate the middle-man, too, and not just the end user. With the DIGEST-MD5 implementation, and skipping authentication for the middle-man, we cannot trust from only the RPC connection that the middle-man (the NodeManager?) is managing clients' delegation tokens properly.
# Regarding the use of ZK to propagate the rolling shared secret, we'd need to be careful about propagation delays using the watchers to update the cache. Rather than use the watchers.
# Regarding the rolling secret: this seems like it would make client tokens vary in their duration, and put the expiration outside the control of the client user.
# Instead of relying on the master, you could make it possible for any TServer to grant a delegation token. The resulting token could only be checked by that same TServer, but you wouldn't have to rely on a SPOF or worry about propagation. Clients would randomly choose a TServer to authenticate to, every time they need a delegation token, and the delegation token remembers who issued it.
[jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294730#comment-14294730 ] Christopher Tubbs commented on ACCUMULO-3513: - bq. I'm not sure how we can make any reliable security model if we operate under the assumption that YARN is insecure. We have to trust that the YARN task was correctly authenticated. Right, we have to authenticate both YARN *and* the end user. Even if YARN doesn't work this way, and it uses some delegation token instead of any identifying information about itself, Accumulo's implementation requires a Kerberos token at the transport layer. You can't just omit a Kerberos token and replace it with a delegation token in Accumulo's implementation (nor do I think it'd be a good idea to try, because I do think we need to authenticate the middle-man, in this case YARN). bq. Again. We have to assume YARN is doing the right thing. No, we absolutely do not have to make any such assumption. We can validate that by only whitelisting approved, trusted intermediaries. This is no different than X.509 extensions that designate permitted uses on certificates. The fact that a certificate was signed by the same CA, does not automatically make it appropriate to use to sign executable code, or to encrypt email. The only thing is, Kerberos does not have any such mechanism built-in, like X.509 certificate extensions, so whitelist is the only option. bq. The code running inside a YARN task is untrusted (unless you restrict job submission and vet the users externally – hit the users with a stick and tell them to behave). We should not be trusting this code to act as the user that it should. That's just my point... you don't know what is going on inside the YARN system. For all you know, there is a job accessing the local disk or system memory, searching for other client's credentials, and using them to connect to Accumulo. 
Just because YARN tries to connect using some client's credentials, it doesn't mean it's a valid use. You've got to actually lock down your YARN instance, vet the infrastructure and the code it runs, before you can be sure that the credentials a job in YARN uses to try to connect to Accumulo are for a legitimate purpose (granted, that takes effort). But, once this is done, the precise degree of additional security offered by the delegation token (due to expirable attributes, for instance) is debatable... but I concede that it is at least marginally better than without, so we can move past that point if you like. If it has the ability to expire, I'm in favor.

bq. The shared secret is acting in place of the kerberos credentials because there are no credentials available for use.

I'm not so sure that's true. There are no credentials that represent the end user available to use, but the YARN process itself should have some Kerberos identity, shouldn't it? I've read that paper, and the quoted portion, but I had assumed (perhaps incorrectly) that the YARN process would use its own Kerberos credentials to set up the transport layer, over which it sends the delegation token for additional validation and authorization. I assumed the wording about it using a delegation token in place of a Kerberos token was just shorthand for something a bit more complicated. Otherwise, what network protocol is it using that supports both Kerberos and a delegation token? Even if HDFS/YARN is using some custom protocol which supports both (or two RPC endpoints), Accumulo's SASL implementation certainly is not... it needs *some* Kerberos credentials to set up the transport layer, before we can send any delegation token or whatever across.
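The whitelist idea above can be sketched in a few lines. This is a hypothetical illustration, not Accumulo's actual API: the class name and principals are invented, and a real implementation would read the approved principals from site configuration and apply the check after the SASL/Kerberos transport has already authenticated the caller.

```java
import java.util.Set;

// Hypothetical sketch of a whitelist of trusted intermediaries: before
// honoring a delegation token, check that the Kerberos principal seen on
// the (already-authenticated) transport layer is one we explicitly vetted.
public class DelegatorWhitelist {
    private final Set<String> allowedDelegators;

    public DelegatorWhitelist(Set<String> allowedDelegators) {
        this.allowedDelegators = allowedDelegators;
    }

    /** Only vetted principals may present another user's delegation token. */
    public boolean mayDelegate(String transportKerberosPrincipal) {
        return allowedDelegators.contains(transportKerberosPrincipal);
    }

    public static void main(String[] args) {
        DelegatorWhitelist wl = new DelegatorWhitelist(Set.of(
            "yarn/nm1.example.com@EXAMPLE.COM",
            "yarn/nm2.example.com@EXAMPLE.COM"));
        // A vetted node manager may act on a client's behalf:
        System.out.println(wl.mayDelegate("yarn/nm1.example.com@EXAMPLE.COM")); // true
        // An arbitrary principal that "somehow" got ahold of a token may not:
        System.out.println(wl.mayDelegate("mallory@EXAMPLE.COM")); // false
    }
}
```

This mirrors the X.509 analogy in the comment: being a valid Kerberos principal (a certificate signed by the same CA) is necessary but not sufficient; the whitelist plays the role of the extension that designates permitted uses.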
> Ensure MapReduce functionality with Kerberos enabled > > > Key: ACCUMULO-3513 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3513 > Project: Accumulo > Issue Type: Bug > Components: client >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Blocker > Fix For: 1.7.0 > > > I talked to [~devaraj] today about MapReduce support running on secure Hadoop > to help get a picture about what extra might be needed to make this work. > Generally, in Hadoop and HBase, the client must have valid credentials to > submit a job, then the notion of delegation tokens is used for further > communication since the servers do not have access to the client's sensitive > information. A centralized service manages creation of a delegation token, > which is a record containing certain information (such as the submitting > user name) necessary to securely identify the holder of the delegation token. > The general idea is that we would need to build support into the master to > manage delegation tokens for node managers to acquire and use to run jobs. > Hadoop and HBase both contain code which implements this general idea, but we > will need to apply it to Accumulo and verify that M/R jobs still work in a > kerberized environment.
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294680#comment-14294680 ] Josh Elser commented on ACCUMULO-3513: --

bq. Well, no... we don't know that it does this already. We have no idea how it may have been compromised internally

I'm not sure how we can make any reliable security model if we operate under the assumption that YARN is insecure. We have to trust that the YARN task was correctly authenticated.

bq. Accumulo and the real client is trustworthy and is handling the client's credentials properly

Again. We have to assume YARN is doing the right thing.

bq. it's not much of a stretch to just trust that it is acting on behalf of user X, simply because it says so

That's the point I'm trying to make. That trust is a *huge* stretch. The code running inside a YARN task is untrusted (unless you restrict job submission and vet the users externally -- hit the users with a stick and tell them to behave). We should not be trusting this code to act as the user that it should.

bq. The extra, expirable, shared secret is nice, but it doesn't get us much further than what we can do without it, in my opinion

The shared secret is acting in place of the kerberos credentials because there are no credentials available for use. It's not optional -- it's what acts as the authentication (password over SASL instead of the kerberos identity). This is the best snippet I've read that describes things:

{quote} Kerberos is a 3-party protocol that solves the hard problem of setting up an authenticated connection between a client and a server that have never communicated with each other before (but they both registered with Kerberos KDC). Our delegation token is also used to set up an authenticated connection between a client and a server (NameNode in this case).
The difference is that we assume the client and the server had previously shared a secure connection (via Kerberos), over which a delegation token can be exchanged. Hence, delegation token is essentially a 2-party protocol and much simpler than Kerberos. However, we use Kerberos to bootstrap the initial trust between a client and NameNode in order to exchange the delegation token for later use to set up another secure connection between the client (actually job tasks launched on behalf of the client) and the same NameNode {quote}

Please take some time to read [this overview on Hadoop security|http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf]. It covers these points.
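The 2-party scheme the quote describes can be sketched with an HMAC: the server holds a master secret, derives the token "password" from the public token identifier, hands both to the client over the Kerberos-authenticated channel, and later recomputes the HMAC to verify whatever a task presents. This is an illustrative stand-in, not Hadoop's or Accumulo's actual secret manager (which adds rolling master keys, sequence numbers, renewal, etc.):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical sketch of a delegation-token secret manager. The server
// never stores per-token passwords; it re-derives them from the master
// secret and the token's public identifier.
public class DelegationTokenSketch {
    private final SecretKeySpec masterKey;

    public DelegationTokenSketch(byte[] masterSecret) {
        this.masterKey = new SecretKeySpec(masterSecret, "HmacSHA256");
    }

    /** Server side, at issuance: derive the token "password" from the identifier. */
    public byte[] issuePassword(String tokenIdentifier) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(masterKey);
            return mac.doFinal(tokenIdentifier.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Server side, later: verify what a task presented over SASL. */
    public boolean verify(String tokenIdentifier, byte[] presentedPassword) {
        // Constant-time comparison to avoid leaking the password byte-by-byte.
        return MessageDigest.isEqual(issuePassword(tokenIdentifier), presentedPassword);
    }

    public static void main(String[] args) {
        DelegationTokenSketch server =
            new DelegationTokenSketch("per-cluster master secret".getBytes(StandardCharsets.UTF_8));
        String id = "owner=alice@EXAMPLE.COM,issueDate=1422576000000";
        byte[] password = server.issuePassword(id); // handed to the client over the Kerberos channel
        System.out.println(server.verify(id, password)); // true
        System.out.println(server.verify("owner=mallory@EXAMPLE.COM,issueDate=1422576000000", password)); // false
    }
}
```

Note how Kerberos only appears at bootstrap time, exactly as the quote says: the HMAC exchange itself is strictly between the two parties that already share the secret.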
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294675#comment-14294675 ] Josh Elser commented on ACCUMULO-3513: -- Uhh, you may want to look at how YARN works, because that is not it :). YARN tasks do *not* run as the "yarn" user. Therefore, they do not have access to the nodemanager's kerberos credentials.
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294673#comment-14294673 ] Christopher Tubbs commented on ACCUMULO-3513: - No, the yarn user still has to use its own Kerberos credentials to set up the transport layer with Accumulo. It may be acting on behalf of a user, but it still needs to authenticate to Accumulo as itself first. See below.
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294670#comment-14294670 ] Christopher Tubbs commented on ACCUMULO-3513: -

bq. Yes, we still need to trust that MapReduce is keeping the shared secret safe which we know it does already.

Well, no... we don't know that it does this already. We have no idea how it may have been compromised internally. All we know is that somehow, it gained access to a client's pre-negotiated shared secret. We hope it did this by strongly authenticating with that client and that client voluntarily giving it the shared secret, and that it was kept safe internally the entire time, but we don't know that it did. We trust that it does this because we check (or should check) its Kerberos credentials at the transport layer.

bq. The ability to expire a shared secret gives us some more confidence that the shared secret won't be reused by some unwanted party.

I agree, but we still need to ensure that the layer in between Accumulo and the real client is trustworthy and is handling the client's credentials properly. My only point was that if we already trust that layer to do that (which we definitely need to do... and not just any Kerberos principal can be trusted), it's not much of a stretch to just trust that it is *acting on behalf of user X, simply because it says so*. The extra, expirable, shared secret is nice, but it doesn't get us *much* further than what we can do without it, in my opinion. An expirable characteristic is a benefit (if it wasn't expirable, it wouldn't have any value at all). Other characteristics, like having attributes which include specific authorizations that shared secret is allowed to be used for, are even better (e.g. instead of "you're allowed to act as me", you get "you're allowed to act as me to query this table").

bq.
We don't need a whitelist mechanism unless you're not trusting YARN itself, which doesn't make any sense to me (which I think you already agree on)

No, that's precisely the layer I don't trust without a whitelist. It still needs to authenticate with Kerberos... the transport layer requires it... and not *all* Kerberos principals should be allowed to freely use some other user's delegation token, just because they *somehow* got ahold of one.
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294629#comment-14294629 ] Josh Elser commented on ACCUMULO-3513: --

bq. Without the whitelist, and only the delegation token, all we can do is trust that the MapReduce layer authenticated the client at some point, for some purpose. With the whitelist, we can trust that we've vetted the MapReduce layer to function properly. If we already have that degree of trust, the delegation token is kinda moot.

I'm not sure you understand how the delegation token would work. The client would need to communicate with an Accumulo process to obtain some shared secret between Accumulo and that client. So, in addition to knowing that YARN is vetting that the "real" user is running the tasks on YARN, we know that the "real" user is going to be communicating with us using the shared secret we agreed upon. When YARN actually runs the tasks for us, as the unix user account tied to the client, the yarn task will have the shared secret (which we trust YARN to keep safe once it leaves the client's possession and goes into the cluster), and we let Accumulo RPCs happen using the shared secret instead of the KRB credentials. The YARN task isn't connecting to Accumulo with its own principal because, again, it's not running as the {{yarn}} user, but as the "real" user. So, no. I say again that the delegation token is not moot :)
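The flow Josh describes can be sketched end-to-end, including the expirable attribute discussed in the thread: the Kerberos-authenticated client is issued an identifier and password at job submission; the YARN task (running as the submitting user, with no Kerberos credentials) later presents them; the server recomputes the secret and enforces expiry. Class and method names are illustrative, not Accumulo's actual API:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical end-to-end sketch of the delegation token flow.
public class DelegationFlowSketch {
    private final SecretKeySpec masterKey;

    public DelegationFlowSketch(byte[] masterSecret) {
        this.masterKey = new SecretKeySpec(masterSecret, "HmacSHA256");
    }

    /** Job submission: the Kerberos-authenticated client gets a public identifier... */
    public String issueIdentifier(String owner, long nowMs, long lifetimeMs) {
        return "owner=" + owner + ";expires=" + (nowMs + lifetimeMs);
    }

    /** ...and the matching password, derived from the server's master secret. */
    public byte[] password(String identifier) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(masterKey);
            return mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Task time: the task presents (identifier, password) over SASL. */
    public boolean authenticate(String identifier, byte[] presented, long nowMs) {
        long expires = Long.parseLong(
            identifier.substring(identifier.indexOf(";expires=") + ";expires=".length()));
        if (nowMs >= expires) {
            return false; // the expirable attribute: a leaked secret eventually goes stale
        }
        return MessageDigest.isEqual(password(identifier), presented);
    }

    public static void main(String[] args) {
        DelegationFlowSketch accumulo =
            new DelegationFlowSketch("cluster secret".getBytes(StandardCharsets.UTF_8));
        long submitTime = 1_422_576_000_000L;
        String id = accumulo.issueIdentifier("alice@EXAMPLE.COM", submitTime, 86_400_000L); // 24h
        byte[] pw = accumulo.password(id);
        System.out.println(accumulo.authenticate(id, pw, submitTime + 3_600_000L));       // true
        System.out.println(accumulo.authenticate(id, pw, submitTime + 48 * 3_600_000L));  // false
    }
}
```

The point of the sketch is that the task never needs Kerberos credentials: the only thing it carries is the (identifier, password) pair that the client was given at submission, which is exactly what YARN is trusted to transport safely.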
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294623#comment-14294623 ] Josh Elser commented on ACCUMULO-3513: --

bq. At some point, I think that's the best we can get. We cannot get direct access to the client's credentials, so we must trust another party (in this case, the MapReduce servers).

Right, I agree with you here, of course, but we still need some way to control when non-strongly-authenticated users (w/o kerberos credentials) try to connect to Accumulo. That's the crux of what we need to solve to make MapReduce actually work.

bq. We could require that the clients authenticate to Accumulo to generate a shared secret (really, though, they just need to authenticate to the Authenticator implementation backing Accumulo). This is analogous to the HDFS delegation token. The client can then give this shared secret to the MapReduce layer to use when talking to Accumulo, to ensure that the client did actually hand that secret to the MapReduce layer, requesting it to do work on its behalf

This is, like you say, ultimately what the delegation token boils down to and what I plan to do. Yes, we need to trust the ResourceManager to disallow users who have no credentials, but we should still have some shared secret support (a special token or data inside of a token) to prevent the need for additional configuration to just run MapReduce with Hadoop security on.

bq. However, we still need to designate the MapReduce layer as trustable in some way... because this layer could reuse one client's credentials to perform an unauthorized task and give the results to a different user

Yes, we still need to trust that MapReduce is keeping the shared secret safe, which we know it does already. The ability to expire a shared secret gives us _somewhat more_ confidence that the shared secret won't be reused by some unwanted party.
The yarn tasks themselves are run as the submitting user, so all we are relying on YARN to do is to set up a proper environment running as the client (to be clear, the actual unix user).

bq. The whitelist mechanism gives us some assurance that we've vetted that layer to not do those sorts of things.

We don't need a whitelist mechanism unless you're not trusting YARN itself, which doesn't make any sense to me (which I think you already agree on)
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294573#comment-14294573 ] Christopher Tubbs commented on ACCUMULO-3513: - Without the whitelist, and only the delegation token, all we can do is trust that the MapReduce layer authenticated the client at some point, for some purpose. With the whitelist, we can trust that we've vetted the MapReduce layer to function properly. If we already have that degree of trust, the delegation token is kinda moot. That is, unless the delegation token includes information about *specifically* which functions are authorized by a client. But, that's a *lot* more complex than just authentication... because it encroaches upon authorization.
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294563#comment-14294563 ] Christopher Tubbs commented on ACCUMULO-3513: - From Accumulo's perspective, it would just mean that we "trust" the MapReduce layer to do the check... whether that means we choose to lock down access to the MapReduce layer, or that whatever mechanism the MapReduce layer uses to authenticate clients is properly propagated to Accumulo. It doesn't prevent unwanted impersonation... it simply assigns trust to the MapReduce system to do that.

At some point, I think that's the best we can get. We cannot get direct access to the client's credentials, so we must trust another party (in this case, the MapReduce servers). We could require that the clients authenticate to Accumulo to generate a shared secret (really, though, they just need to authenticate to the Authenticator implementation backing Accumulo). This is analogous to the HDFS delegation token. The client can then give this shared secret to the MapReduce layer to use when talking to Accumulo, to ensure that the client did actually hand that secret to the MapReduce layer, requesting it to do work on its behalf.

However, we still need to designate the MapReduce layer as trustable in some way... because this layer could reuse one client's credentials to perform an unauthorized task and give the results to a different user. The whitelist mechanism gives us some assurance that we've vetted that layer to not do those sorts of things. If we already have to designate trust to that intermediate layer, I don't see a lot of added value with the complexity of the delegation token mechanism to prove that it is, in fact, doing work on behalf of a particular client.
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294382#comment-14294382 ] Josh Elser commented on ACCUMULO-3513: -- I'm still unclear on how you think this prevents unwanted impersonation from happening. For mapreduce, the only time that we "know" who a client is happens when they submit the job. We need to take the fact that the client is who they say they are (from their kerberos credentials) and construct a way to convey that identity to node managers, which no longer have any access to the job-submitter's credentials (this is the notion of the delegation token from HDFS and others). In your example, we would have to trust that each and every mapreduce job in the system is going to "do the right thing" and not impersonate users they shouldn't, which isn't sufficient for a solution. We can do much better by taking the delegation token approach.
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294368#comment-14294368 ] Christopher Tubbs commented on ACCUMULO-3513: - (Update: basically, I'm suggesting a whitelist for allowed delegators, which would basically include all task-tracker principals)
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294327#comment-14294327 ] Josh Elser commented on ACCUMULO-3513: -- For later if needed: the actual failure seen if you try to run a MR job now.

{noformat}
Error: java.io.IOException: java.lang.IllegalArgumentException: Cannot instantiate org.apache.accumulo.core.client.security.tokens.KerberosToken
	at org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat.getRecordWriter(AccumuloOutputFormat.java:559)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalArgumentException: Cannot instantiate org.apache.accumulo.core.client.security.tokens.KerberosToken
	at org.apache.accumulo.core.client.security.tokens.AuthenticationToken$AuthenticationTokenSerializer.deserialize(AuthenticationToken.java:65)
	at org.apache.accumulo.core.client.security.tokens.AuthenticationToken$AuthenticationTokenSerializer.deserialize(AuthenticationToken.java:98)
	at org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.getAuthenticationToken(ConfiguratorBase.java:229)
	at org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat.getAuthenticationToken(AccumuloOutputFormat.java:172)
	at org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat$AccumuloRecordWriter.<init>(AccumuloOutputFormat.java:403)
	at org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat.getRecordWriter(AccumuloOutputFormat.java:557)
	... 8 more
Caused by: java.lang.IllegalArgumentException: Subject is not logged in via Kerberos
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.apache.accumulo.core.client.security.tokens.KerberosToken.<init>(KerberosToken.java:53)
	at org.apache.accumulo.core.client.security.tokens.KerberosToken.<init>(KerberosToken.java:65)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at java.lang.Class.newInstance(Class.java:379)
	at org.apache.accumulo.core.client.security.tokens.AuthenticationToken$AuthenticationTokenSerializer.deserialize(AuthenticationToken.java:63)
	... 13 more
{noformat}
> Hadoop and HBase both contain code which implements this general idea, but we > will need to apply them Accumulo and verify that it is M/R jobs still work on > a kerberized environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
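The delegation-token record described in the issue text can be sketched roughly as below. This is a minimal illustration of the concept only; all class and field names are hypothetical and do not reflect Accumulo's actual API.

```java
import java.io.Serializable;

public class DelegationTokenSketch {

    // A delegation token identifies its holder without carrying Kerberos
    // credentials: the central service (e.g. the master) issues it after
    // the client authenticates, and servers later verify it.
    static final class DelegationToken implements Serializable {
        final String owner;     // the submitting user's principal
        final long expiration;  // millis since epoch; servers reject expired tokens
        final byte[] password;  // secret material issued by the central service

        DelegationToken(String owner, long expiration, byte[] password) {
            this.owner = owner;
            this.expiration = expiration;
            this.password = password;
        }

        // A server-side check performed before honoring the token.
        boolean isExpired(long now) {
            return now >= expiration;
        }
    }

    public static void main(String[] args) {
        DelegationToken t = new DelegationToken("alice@EXAMPLE.COM",
                System.currentTimeMillis() + 86_400_000L, new byte[] {1, 2, 3});
        System.out.println(t.owner + " expired=" + t.isExpired(System.currentTimeMillis()));
    }
}
```

The point of the shape is that the token, unlike a KerberosToken, is plain serializable data, so it can be shipped to YARN tasks that have no Kerberos login of their own.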
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289578#comment-14289578 ] Christopher Tubbs commented on ACCUMULO-3513:
--
A possible solution: extend KerberosToken to have an "isDelegateFor" concept. If the token is constructed as a delegate, we can use the transport's principal to check whether it is allowed to delegate, and then use the delegated principal for any other permission checks.
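The "isDelegateFor" idea above could look roughly like this: a token carrying both the authenticated (transport) principal and the principal it acts for, with the server consulting an allow-list before honoring the delegation. This is a hypothetical sketch, not proposed Accumulo code; the class names, the `effectivePrincipal` helper, and the allow-list parameter are all invented for illustration.

```java
import java.util.Set;

public class DelegateTokenSketch {

    static final class DelegatingToken {
        final String transportPrincipal;  // who actually authenticated via Kerberos
        final String delegatedPrincipal;  // who the operation is on behalf of, or null

        DelegatingToken(String transportPrincipal, String delegatedPrincipal) {
            this.transportPrincipal = transportPrincipal;
            this.delegatedPrincipal = delegatedPrincipal;
        }

        boolean isDelegate() {
            return delegatedPrincipal != null;
        }

        // Server-side check: only an allowed delegator's transport principal may
        // substitute the delegated principal for further permission checks.
        String effectivePrincipal(Set<String> allowedDelegators) {
            if (!isDelegate()) {
                return transportPrincipal;
            }
            if (!allowedDelegators.contains(transportPrincipal)) {
                throw new SecurityException(transportPrincipal + " may not delegate");
            }
            return delegatedPrincipal;
        }
    }

    public static void main(String[] args) {
        DelegatingToken t = new DelegatingToken("yarn@EXAMPLE.COM", "alice@EXAMPLE.COM");
        // The YARN principal authenticates, but alice's identity drives permissions.
        System.out.println(t.effectivePrincipal(Set.of("yarn@EXAMPLE.COM")));
    }
}
```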
[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288302#comment-14288302 ] Josh Elser commented on ACCUMULO-3513:
--
Marked as a blocker because there is currently no good way to run a MapReduce job when Accumulo is using SASL servers. We need to address this in some fashion for 1.7.0, or, at the absolute minimum, manage user expectations.