[ https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531523#comment-17531523 ]
Gabor Somogyi edited comment on SPARK-25355 at 5/4/22 8:00 AM:
---------------------------------------------------------------
After the attached logs I now see more. HADOOP_TOKEN_FILE_LOCATION together with a proxy user has never worked, and that is still the case. You have 2 options:
* You provide tokens in HADOOP_TOKEN_FILE_LOCATION: in this case UGI picks up the tokens for the current user and authenticates with them. Nothing prevents you from generating these tokens for the proxy user manually from your custom code. In this case the --proxy-user config is not needed and it will work like a charm.
* You set the --proxy-user config, in which case Spark obtains tokens for the proxy user by authenticating with the real user's Kerberos credentials.

When I take a look at the logs, Spark tries to obtain tokens for the following external service types:
{code:java}
22/05/04 04:13:07 DEBUG HadoopDelegationTokenManager: Using the following builtin delegation token providers: hadoopfs, hbase, hive.
22/05/04 04:13:07 DEBUG UserGroupInformation: PrivilegedAction as:proxyUser (auth:PROXY) via <user>/<t...@domain.com> (auth:KERBEROS) from:org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainDelegationTokens(HadoopDelegationTokenManager.scala:146)
{code}
After a while Spark's built-in Hadoop FS delegation token provider kicks in and tries to obtain a token as expected:
{code:java}
22/05/04 04:13:07 DEBUG HadoopFSDelegationTokenProvider: Delegation token renewer is: proxyUser
22/05/04 04:13:07 INFO HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1812449855_1, ugi=proxyUser (auth:PROXY) via <user>/<t...@domain.com> (auth:KERBEROS)]] with renewer proxyUser
22/05/04 04:13:07 DEBUG Client: IPC Client (1939869193) connection to nn.com/<ip>:8020 from proxyUser sending #6 org.apache.hadoop.hdfs.protocol.ClientProtocol.getDelegationToken
22/05/04 04:13:07 DEBUG Client: IPC Client (1939869193) connection to nn.com/<ip>:8020 from proxyUser got value #6
22/05/04 04:13:07 DEBUG ProtobufRpcEngine: Call: getDelegationToken took 2ms
22/05/04 04:13:07 INFO DFSClient: Created token for proxyUser: HDFS_DELEGATION_TOKEN owner=proxyUser, renewer=proxyUser, realUser=<user>/<t...@domain.com>, issueDate=1651637587347, maxDate=1652242387347, sequenceNumber=183545, masterKeyId=606 on ha-hdfs:<hdfs>
22/05/04 04:13:07 DEBUG Client: IPC Client (1939869193) connection to nn.com/<ip>:8020 from proxyUser sending #7 org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults
22/05/04 04:13:07 DEBUG Client: IPC Client (1939869193) connection to nn.com/<ip>:8020 from proxyUser got value #7
22/05/04 04:13:07 DEBUG ProtobufRpcEngine: Call: getServerDefaults took 0ms
22/05/04 04:13:07 DEBUG KMSClientProvider: KMSClientProvider for KMS url: http://nn.com:9292/kms/v1/ delegation token service: <ip>:9292 created.
22/05/04 04:13:07 DEBUG KMSClientProvider: KMSClientProvider for KMS url: http://<nn2.com>:9292/kms/v1/ delegation token service: 10.207.184.25:9292 created.
22/05/04 04:13:07 DEBUG KMSClientProvider: Current UGI: proxyUser (auth:PROXY) via <user>/<t...@domain.com> (auth:KERBEROS)
22/05/04 04:13:07 DEBUG KMSClientProvider: Real UGI: <user>/<t...@domain.com> (auth:KERBEROS)
22/05/04 04:13:07 DEBUG KMSClientProvider: Login UGI: <user>/<t...@domain.com> (auth:KERBEROS)
22/05/04 04:13:07 DEBUG UserGroupInformation: PrivilegedAction as:<user>/<t...@domain.com> (auth:KERBEROS) from:org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:1037)
22/05/04 04:13:07 DEBUG KMSClientProvider: Getting new token from http://nn.com:9292/kms/v1/, renewer:proxyUser
22/05/04 04:13:07 DEBUG DelegationTokenAuthenticator: No delegation token found for url=http://nn.com:9292/kms/v1/?op=GETDELEGATIONTOKEN&doAs=proxyUser&renewer=proxyUser, token=, authenticating with class org.apache.hadoop.security.token.delegation.web.KerberosDelegationTokenAuthenticator$1
22/05/04 04:13:07 DEBUG KerberosAuthenticator: JDK performed authentication on our behalf.
22/05/04 04:13:07 DEBUG AuthenticatedURL: Cannot parse cookie header: java.lang.IllegalArgumentException: Empty cookie header string
	at java.net.HttpCookie.parseInternal(HttpCookie.java:826)
	at java.net.HttpCookie.parse(HttpCookie.java:202)
	at java.net.HttpCookie.parse(HttpCookie.java:178)
	at org.apache.hadoop.security.authentication.client.AuthenticatedURL$AuthCookieHandler.put(AuthenticatedURL.java:99)
	at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:390)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
	at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:348)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:321)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:193)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:384)
	at org.apache.hadoop.crypto.key.kms.KMSClientProvider$4.run(KMSClientProvider.java:1043)
	at org.apache.hadoop.crypto.key.kms.KMSClientProvider$4.run(KMSClientProvider.java:1037)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:1037)
	at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:193)
	at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:190)
	at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:123)
	at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.addDelegationTokens(LoadBalancingKMSClientProvider.java:190)
	at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:110)
	at org.apache.hadoop.hdfs.HdfsKMSUtil.addDelegationTokensForKeyProvider(HdfsKMSUtil.java:84)
	at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2821)
	at org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.$anonfun$fetchDelegationTokens$1(HadoopFSDelegationTokenProvider.scala:117)
	at scala.collection.immutable.Set$Set1.foreach(Set.scala:141)
	at org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.fetchDelegationTokens(HadoopFSDelegationTokenProvider.scala:110)
	at org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.scala:56)
	...
{code}
As one can see, the token obtain fails; it is retried several times but fails the same way. As a result, the proxy user's UGI will not contain any HDFS token! That is the main reason for the AccessControlException.
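To make the failing flow concrete: Spark obtains the delegation tokens inside a doAs of the proxy user's UGI, roughly like the sketch below. This is a hypothetical illustration using the Hadoop UGI API directly, not Spark's actual code; the class name and user name are made up, and running it requires a Kerberized Hadoop cluster.

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyTokenFlowSketch {
  public static void main(String[] args) throws Exception {
    // Real user authenticated via Kerberos (e.g. a keytab login).
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();
    // The proxy user's UGI starts with EMPTY credentials.
    UserGroupInformation proxyUgi =
        UserGroupInformation.createProxyUser("proxyUser", realUser);

    Credentials creds = new Credentials();
    proxyUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      // This is the call that blows up in the logs above: on an encrypted
      // cluster it also contacts the KMS, where authentication dies with
      // "Empty cookie header string".
      FileSystem.get(new Configuration()).addDelegationTokens("proxyUser", creds);
      return null;
    });

    // If the call above throws, no HDFS token is ever added here, which is
    // why the proxy user later hits AccessControlException.
    proxyUgi.addCredentials(creds);
  }
}
{code}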
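The first option from the beginning of this comment (generating tokens for the proxy user manually and handing them over via HADOOP_TOKEN_FILE_LOCATION) could be sketched as follows. Again a hypothetical illustration: the class name, user name and file path are made up, and it only works against a real Kerberized cluster.

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class WriteProxyUserTokenFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Real user's Kerberos credentials back the token obtain.
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();
    UserGroupInformation proxyUgi =
        UserGroupInformation.createProxyUser("proxyUser", realUser);

    // Obtain HDFS (and KMS) tokens on behalf of the proxy user.
    Credentials creds = new Credentials();
    proxyUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      FileSystem.get(conf).addDelegationTokens("proxyUser", creds);
      return null;
    });

    // Persist the tokens in Hadoop's token storage format. Point
    // HADOOP_TOKEN_FILE_LOCATION at this file before spark-submit and
    // drop the --proxy-user argument.
    creds.writeTokenStorageFile(new Path("file:///tmp/proxyUser.token"), conf);
  }
}
{code}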
Just for completeness: the HBase token obtain was skipped because HBase is not on the classpath:
{code:java}
java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
{code}
The Hive token obtain was successful:
{code:java}
22/05/04 04:13:09 DEBUG HiveDelegationTokenProvider: Get Token from hive metastore: Kind: HIVE_DELEGATION_TOKEN, Service: , Ident: 00 08 73 68 72 70 72 61 73 61 04 68 69 76 65 1e 6c 69 76 79 2f 6c 69 76 79 2d 69 6e 74 40 43 4f 52 50 44 45 56 2e 56 49 53 41 2e 43 4f 4d 8a 01 80 8d 45 94 0c 8a 01 80 d5 5e 9c 0c 8e 27 19 8e 03 70
{code}
All in all, at first glance Spark behaves as expected, and here are my options to solve the issue:
* Solve the issue within the Hadoop library (IllegalArgumentException: Empty cookie header string) while Spark obtains the HDFS token. In this case HADOOP_TOKEN_FILE_LOCATION is useless because it loads tokens into the real user's UGI (tokens in the real user's UGI are never used for proxy user token obtain).
* Remove the --proxy-user arg, create proxy user tokens manually, store them in a file and use HADOOP_TOKEN_FILE_LOCATION.

> Support --proxy-user for Spark on K8s
> -------------------------------------
>
> Key: SPARK-25355
> URL: https://issues.apache.org/jira/browse/SPARK-25355
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes, Spark Core
> Affects Versions: 3.1.0
> Reporter: Stavros Kontopoulos
> Assignee: Pedro Rossi
> Priority: Major
> Fix For: 3.1.0
>
> Attachments: client.log, driver.log
>
> SPARK-23257 adds kerberized hdfs support for Spark on K8s. A major addition needed is the support for proxy user. A proxy user is impersonated by a superuser who executes operations on behalf of the proxy user.
> More on this:
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented for Yarn upstream and for Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue according to our discussion.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)