Subroto Sanyal created SPARK-15754:
--------------------------------------

             Summary: org.apache.spark.deploy.yarn.Client changes the credential of current user
                 Key: SPARK-15754
                 URL: https://issues.apache.org/jira/browse/SPARK-15754
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.1
         Environment: Spark Client with Secured Hadoop Cluster
            Reporter: Subroto Sanyal
            Priority: Critical
h5. Problem
Spawning a SparkContext in Spark client mode changes the credentials in the current user's group information. After that, the client that spawned the SparkContext no longer talks to the NameNode with its Kerberos TGT but with delegation tokens. It is undesirable for a library to change the JVM-wide _UserGroupInformation_ context like this.

h5. Root Cause
Spark creates HDFS delegation tokens so that the ApplicationMaster it spawns can communicate with the NameNode, but while creating these tokens Spark also adds them to the current user's credentials:
{code:title=org.apache.spark.deploy.yarn.Client#createContainerLaunchContext|borderStyle=solid}
setupSecurityToken(amContainer)
UserGroupInformation.getCurrentUser().addCredentials(credentials)
amContainer
{code}
After this operation the client always uses the delegation token for any further communication with the NameNode. The scenario becomes dangerous when the ResourceManager cancels the delegation token, which it does 10 minutes after the SparkContext is shut down.
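To make the failure mode concrete, here is a self-contained toy model of the root cause. {{CURRENT_USER_CREDS}} and {{AM_CONTAINER_TOKENS}} are illustrative stand-ins for Hadoop's {{UserGroupInformation}} credentials and YARN's {{ContainerLaunchContext}} tokens (not the real APIs); the point is that the buggy path mutates process-global client state, while a fixed path would attach the token only to the container being launched:

```java
import java.util.HashSet;
import java.util.Set;

public class CredentialLeakDemo {
    // Stand-in for the client's UserGroupInformation: process-global credentials.
    static final Set<String> CURRENT_USER_CREDS =
        new HashSet<>(Set.of("kerberos-tgt"));

    // Stand-in for the ApplicationMaster's ContainerLaunchContext tokens.
    static final Set<String> AM_CONTAINER_TOKENS = new HashSet<>();

    // Buggy pattern: the delegation token also leaks into the client's own
    // credentials (models the addCredentials call in the snippet above).
    static void setupSecurityTokenBuggy(String delegationToken) {
        AM_CONTAINER_TOKENS.add(delegationToken);
        CURRENT_USER_CREDS.add(delegationToken);
    }

    // Fixed pattern: the token goes only to the container being launched.
    static void setupSecurityTokenFixed(String delegationToken) {
        AM_CONTAINER_TOKENS.add(delegationToken);
    }

    public static void main(String[] args) {
        setupSecurityTokenBuggy("token-via-buggy-path");
        setupSecurityTokenFixed("token-via-fixed-path");
        // Buggy path: the token leaked into the client's credentials, so the
        // client now authenticates with it instead of its TGT.
        System.out.println(CURRENT_USER_CREDS.contains("token-via-buggy-path"));  // true
        // Fixed path: the client's credentials are untouched ...
        System.out.println(CURRENT_USER_CREDS.contains("token-via-fixed-path"));  // false
        // ... yet the AM container still receives the token it needs.
        System.out.println(AM_CONTAINER_TOKENS.contains("token-via-fixed-path")); // true
    }
}
```

Once the ResourceManager cancels the leaked token, every NameNode call made through the mutated credentials fails, which is exactly the client-side breakage described below.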
This leads to issues on the client side like:
{noformat}
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 444 for subroto) can't be found in cache
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2095)
	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1409)
	at Sample.main(Sample.java:85)
{noformat}
There are other places in the code where a similar operation is performed, e.g. in _org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater.updateCredentialsIfRequired()_.
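Until the mutation itself is fixed, one client-side mitigation is to run the Spark client under a separate, throwaway copy of the user's credentials (in Hadoop terms, a distinct {{UserGroupInformation}} and {{doAs}}), so that whatever Spark adds never touches the caller's own UGI. The sketch below is a toy model of that isolation idea; {{FakeUgi}} is an illustrative stand-in, not Hadoop's real API:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

public class DoAsIsolationDemo {
    // Toy stand-in for UserGroupInformation: a credential set plus a doAs
    // that runs an action against this UGI's credentials, not the caller's.
    static class FakeUgi {
        final Set<String> creds;
        FakeUgi(Set<String> creds) { this.creds = creds; }
        void doAs(Consumer<Set<String>> action) { action.accept(creds); }
    }

    public static void main(String[] args) {
        FakeUgi caller = new FakeUgi(new HashSet<>(Set.of("kerberos-tgt")));
        // Copy the caller's credentials into a throwaway UGI for the Spark client.
        FakeUgi isolated = new FakeUgi(new HashSet<>(caller.creds));
        // The Spark client mutates whatever UGI it runs under ...
        isolated.doAs(creds -> creds.add("hdfs-delegation-token-444"));
        // ... but the caller's own credentials stay TGT-only, so its NameNode
        // calls are unaffected when the token is later cancelled.
        System.out.println(caller.creds.contains("hdfs-delegation-token-444"));   // false
        System.out.println(isolated.creds.contains("hdfs-delegation-token-444")); // true
    }
}
```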