[ https://issues.apache.org/jira/browse/HADOOP-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464481#comment-13464481 ]
Eli Collins commented on HADOOP-8828:
-------------------------------------

I believe this was 2.0 or a trunk build around that time.

> Support distcp from secure to insecure clusters
> -----------------------------------------------
>
>                 Key: HADOOP-8828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8828
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Eli Collins
>
> Users currently can't distcp from secure to insecure clusters.
> Relevant background from ATM:
>
> There's no plumbing to make the HFTP client use AuthenticatedURL in the case
> security is enabled. This means that even though you have the servlet filter
> correctly configured on the server, the client doesn't know how to properly
> authenticate to that filter.
>
> The crux of the issue is that security is enabled globally instead of
> per-file-system. The trick of using HFTP as the source FS works when the
> source is insecure, but not when the source is secure.
>
> Normal cp with two hdfs:// URLs can be made to work. There is indeed logic in
> o.a.h.ipc.Client to fall back to using simple authentication if your client
> config has security enabled (hadoop.security.authentication set to
> "kerberos") and the server responds with a response for simple
> authentication. The thing is, there are at least 3 bugs with this that I
> bumped into. All three can be worked around.
>
> 1) If your client config has security enabled you *must* have a valid
> Kerberos TGT, even if you're interacting with an insecure cluster. The Hadoop
> client unfortunately tries to read the local ticket cache before it tries to
> connect to the server, and so doesn't know that it won't need Kerberos
> credentials.
>
> 2) Even though the destination NN is insecure, it has to have a Kerberos
> principal created for it. You don't need a keytab, and you don't need to
> change any settings on the destination NN. The principal just needs to exist
> in the principal database. This is again because the Hadoop client will,
> before connecting to the remote NN, try to get a service ticket for the
> hdfs/f.q.d.n principal for the remote NN. If this fails, it won't even get to
> the part where it tries to connect to the insecure NN and falls back to
> simple auth.
>
> 3) Once you get through problems 1 and 2, you will try to connect to the
> remote, insecure NN. This will work, but the reported principal name of your
> user will include a realm that the remote NN doesn't know about. You will
> either need to change the default_realm setting in /etc/krb5.conf on the
> insecure NN to match the secure NN's realm, or you will need to add some
> custom hadoop.security.auth_to_local mappings on the insecure NN so it knows
> how to translate this long principal name into a short name.
>
> Even with all these changes, distcp still won't work, since the first thing
> it tries to do when submitting the job is to get a delegation token for all
> the involved NNs, which won't work since the insecure NN isn't running a DT
> secret manager. I haven't been able to figure out a way around this, except
> to make a custom distcp which doesn't necessarily do this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
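The "HFTP as the source FS" trick mentioned in the issue description, which works only when the *source* is the insecure cluster, looks roughly like this. A sketch only: hostnames are made up, and the ports (50070 for the NameNode HTTP port, 8020 for the RPC port) are the common defaults of that era, not a requirement.

```
# Run from the secure (destination) cluster, reading the insecure source
# over HFTP. hftp:// is read-only, so this only works in this direction.
hadoop distcp \
  hftp://insecure-nn.example.com:50070/data/logs \
  hdfs://secure-nn.example.com:8020/data/logs
```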
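Workarounds 1 and 2 from the list above are operational steps rather than code changes. A hedged sketch, assuming a made-up user, host, and realm; the `kadmin` query syntax is standard MIT Kerberos.

```
# Workaround 1: hold a valid TGT even though the target cluster is insecure,
# because the client reads the local ticket cache before connecting.
kinit eli@SECURE.EXAMPLE.COM

# Workaround 2: create a principal for the insecure NN in the KDC database.
# No keytab is exported and nothing changes on the insecure NN itself; the
# principal only needs to exist so the service-ticket lookup succeeds.
kadmin -q "addprinc -randkey hdfs/insecure-nn.example.com@SECURE.EXAMPLE.COM"
```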
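For workaround 3, the auth_to_local option could be a rule in core-site.xml on the insecure NN that strips the secure cluster's realm from incoming principal names. The realm name is illustrative; the rule follows Hadoop's RULE:[n:format](match)s/pattern/replacement/ syntax, here matching one-component principals like user@SECURE.EXAMPLE.COM and mapping them to the short name.

```
<!-- core-site.xml on the insecure NN (illustrative realm name) -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@SECURE\.EXAMPLE\.COM)s/@.*//
    DEFAULT
  </value>
</property>
```

The alternative the comment mentions, setting default_realm in /etc/krb5.conf on the insecure NN to the secure cluster's realm, avoids the rule entirely but changes that host's Kerberos defaults globally.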