Subroto Sanyal created SPARK-15754:
--------------------------------------

             Summary: org.apache.spark.deploy.yarn.Client changes the 
credential of current user
                 Key: SPARK-15754
                 URL: https://issues.apache.org/jira/browse/SPARK-15754
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.1
         Environment: Spark Client with Secured Hadoop Cluster
            Reporter: Subroto Sanyal
            Priority: Critical


h5. Problem
Spawning a SparkContext in Spark client mode changes the credentials of the 
current user's _UserGroupInformation_. As a result, the client that spawned the 
SparkContext no longer talks to the Name Node using its Kerberos TGT but uses 
delegation tokens instead. It is undesirable for a library to change this piece 
of JVM-wide context (_UserGroupInformation_).

h5. Root Cause
Spark creates HDFS delegation tokens so that the spawned Application Master can 
communicate with the Name Node. However, while creating these tokens, Spark 
also adds them to the current user's credentials:
{code:title=org.apache.spark.deploy.yarn.Client.scala#createContainerLaunchContext|borderStyle=solid}
    setupSecurityToken(amContainer)
    // merges the delegation tokens into the *client's* UGI, not just the AM's
    UserGroupInformation.getCurrentUser().addCredentials(credentials)

    amContainer
{code}
After this operation the client always uses the delegation token for any 
further communication with the Name Node. This becomes dangerous when the 
Resource Manager cancels the delegation token 10 minutes after the Spark 
context is shut down, leading to client-side failures such as:
{noformat}org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 444 for subroto) can't be found in cache
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1403)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2095)
        at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
        at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1409)
        at Sample.main(Sample.java:85){noformat}
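
The failure mode can be simulated without a cluster. The sketch below uses hypothetical {{ToyUser}}/{{ToyCredentials}} stand-ins for {{UserGroupInformation}}/{{Credentials}} (they are not Hadoop API): once a delegation token has been merged into the user's shared credentials, the (toy) Name Node authentication prefers it over the TGT, so the client breaks as soon as the server cancels the token.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for org.apache.hadoop.security.Credentials.
class ToyCredentials {
    final Map<String, String> tokens = new HashMap<>();
}

// Hypothetical stand-in for UserGroupInformation: its credentials are
// shared, mutable state for the whole JVM "user".
class ToyUser {
    private final ToyCredentials creds = new ToyCredentials();

    // Mirrors UserGroupInformation.addCredentials: mutates shared state.
    void addCredentials(ToyCredentials extra) {
        creds.tokens.putAll(extra.tokens);
    }

    // Toy Name Node auth: a delegation token, if present, wins over the TGT.
    String authenticate(Set<String> serverTokenCache) {
        for (String t : creds.tokens.values()) {
            return serverTokenCache.contains(t)
                    ? "OK (token)"
                    : "InvalidToken: can't be found in cache";
        }
        return "OK (tgt)";
    }
}

public class TokenPollutionDemo {
    public static void main(String[] args) {
        Set<String> nnCache = new HashSet<>();
        ToyUser client = new ToyUser();
        System.out.println(client.authenticate(nnCache)); // TGT path works

        // The Spark client merges the delegation token into the client's UGI.
        ToyCredentials sparkTokens = new ToyCredentials();
        sparkTokens.tokens.put("hdfs", "HDFS_DELEGATION_TOKEN 444");
        nnCache.add("HDFS_DELEGATION_TOKEN 444");
        client.addCredentials(sparkTokens);
        System.out.println(client.authenticate(nnCache)); // token path works

        // The RM cancels the token ~10 minutes after the app finishes.
        nnCache.remove("HDFS_DELEGATION_TOKEN 444");
        System.out.println(client.authenticate(nnCache)); // InvalidToken
    }
}
```

The key point the toy model captures is that {{addCredentials}} mutates state the library does not own, so the damage outlives the SparkContext.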

There are other places in the code where a similar operation is performed, 
e.g.:
_org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater.updateCredentialsIfRequired()_
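
Until this is fixed in Spark, one possible direction for a mitigation (a sketch under assumptions, not a tested Spark workaround) is to run the credential-mutating code under its own JAAS {{Subject}} - {{UserGroupInformation}} is built on top of {{javax.security.auth.Subject}} - so that tokens it adds never land in the caller's credentials. {{submitApplication}} below is a hypothetical stand-in for the Spark YARN client, using only JDK classes:

```java
import javax.security.auth.Subject;
import java.security.PrivilegedAction;

public class SubjectIsolationSketch {
    // Hypothetical stand-in for org.apache.spark.deploy.yarn.Client: it adds
    // a delegation token to the Subject it is asked to run as.
    static void submitApplication(Subject runAs) {
        runAs.getPrivateCredentials().add("HDFS_DELEGATION_TOKEN 444 for subroto");
    }

    public static void main(String[] args) {
        Subject caller = new Subject();    // the client's own login context
        Subject sparkOnly = new Subject(); // dedicated Subject for the Spark client

        // Run the submission under the dedicated Subject; any credentials it
        // adds are scoped to sparkOnly, not to the caller.
        Subject.doAs(sparkOnly, (PrivilegedAction<Void>) () -> {
            submitApplication(sparkOnly);
            return null;
        });

        System.out.println("caller tokens: " + caller.getPrivateCredentials().size());
        System.out.println("spark tokens:  " + sparkOnly.getPrivateCredentials().size());
    }
}
```

In real Hadoop terms this would correspond to obtaining a separate UGI (e.g. via {{UserGroupInformation.loginUserFromKeytabAndReturnUGI}}) and wrapping the Spark client in its {{doAs}}, rather than letting it mutate {{getCurrentUser()}}.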



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
