[ https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531523#comment-17531523 ]

Gabor Somogyi edited comment on SPARK-25355 at 5/4/22 8:00 AM:
---------------------------------------------------------------

After reviewing the attached logs I now see more.

HADOOP_TOKEN_FILE_LOCATION has never worked together with a proxy user, and that remains the case.
You guys have 2 options:
 * You provide tokens in HADOOP_TOKEN_FILE_LOCATION: in this case UGI picks up the tokens for the current user and authenticates with them. Nothing prevents you from generating these tokens for the proxy user manually from your own code (see the sketch at the end of this comment). In this case the --proxy-user config is not needed and everything will work like a charm.
 * You set the --proxy-user config, in which case Spark obtains tokens for the proxy user while authenticating with the real user's Kerberos credentials. Looking at the logs, Spark tries to obtain tokens for the following external service types:
{code:java}
22/05/04 04:13:07 DEBUG HadoopDelegationTokenManager: Using the following 
builtin delegation token providers: hadoopfs, hbase, hive.
22/05/04 04:13:07 DEBUG UserGroupInformation: PrivilegedAction as:proxyUser 
(auth:PROXY) via <user>/<t...@domain.com> (auth:KERBEROS) 
from:org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainDelegationTokens(HadoopDelegationTokenManager.scala:146)
{code}
After a while Spark's built-in Hadoop FS delegation token provider kicks in and tries to obtain a token as expected:
{code:java}
22/05/04 04:13:07 DEBUG HadoopFSDelegationTokenProvider: Delegation token 
renewer is: proxyUser
22/05/04 04:13:07 INFO HadoopFSDelegationTokenProvider: getting token for: 
DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1812449855_1, ugi=proxyUser 
(auth:PROXY) via <user>/<t...@domain.com> (auth:KERBEROS)]] with renewer 
proxyUser
22/05/04 04:13:07 DEBUG Client: IPC Client (1939869193) connection to 
nn.com/<ip>:8020 from proxyUser sending #6 
org.apache.hadoop.hdfs.protocol.ClientProtocol.getDelegationToken
22/05/04 04:13:07 DEBUG Client: IPC Client (1939869193) connection to 
nn.com/<ip>:8020 from proxyUser got value #6
22/05/04 04:13:07 DEBUG ProtobufRpcEngine: Call: getDelegationToken took 2ms
22/05/04 04:13:07 INFO DFSClient: Created token for proxyUser: 
HDFS_DELEGATION_TOKEN owner=proxyUser, renewer=proxyUser, 
realUser=<user>/<t...@domain.com>, issueDate=1651637587347, 
maxDate=1652242387347, sequenceNumber=183545, masterKeyId=606 on ha-hdfs:<hdfs>
22/05/04 04:13:07 DEBUG Client: IPC Client (1939869193) connection to 
nn.com/<ip>:8020 from proxyUser sending #7 
org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults
22/05/04 04:13:07 DEBUG Client: IPC Client (1939869193) connection to 
nn.com/<ip>:8020 from proxyUser got value #7
22/05/04 04:13:07 DEBUG ProtobufRpcEngine: Call: getServerDefaults took 0ms
22/05/04 04:13:07 DEBUG KMSClientProvider: KMSClientProvider for KMS url: 
http://nn.com:9292/kms/v1/ delegation token service: <ip>:9292 created.
22/05/04 04:13:07 DEBUG KMSClientProvider: KMSClientProvider for KMS url: 
http://<nn2.com>:9292/kms/v1/ delegation token service: 10.207.184.25:9292 
created.
22/05/04 04:13:07 DEBUG KMSClientProvider: Current UGI: proxyUser (auth:PROXY) 
via <user>/<t...@domain.com> (auth:KERBEROS)
22/05/04 04:13:07 DEBUG KMSClientProvider: Real UGI: <user>/<t...@domain.com> 
(auth:KERBEROS)
22/05/04 04:13:07 DEBUG KMSClientProvider: Login UGI: <user>/<t...@domain.com> 
(auth:KERBEROS)
22/05/04 04:13:07 DEBUG UserGroupInformation: PrivilegedAction 
as:<user>/<t...@domain.com> (auth:KERBEROS) 
from:org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:1037)
22/05/04 04:13:07 DEBUG KMSClientProvider: Getting new token from 
http://nn.com:9292/kms/v1/, renewer:proxyUser
22/05/04 04:13:07 DEBUG DelegationTokenAuthenticator: No delegation token found 
for 
url=http://nn.com:9292/kms/v1/?op=GETDELEGATIONTOKEN&doAs=proxyUser&renewer=proxyUser,
 token=, authenticating with class 
org.apache.hadoop.security.token.delegation.web.KerberosDelegationTokenAuthenticator$1
22/05/04 04:13:07 DEBUG KerberosAuthenticator: JDK performed authentication on 
our behalf.
22/05/04 04:13:07 DEBUG AuthenticatedURL: Cannot parse cookie header: 
java.lang.IllegalArgumentException: Empty cookie header string
        at java.net.HttpCookie.parseInternal(HttpCookie.java:826)
        at java.net.HttpCookie.parse(HttpCookie.java:202)
        at java.net.HttpCookie.parse(HttpCookie.java:178)
        at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL$AuthCookieHandler.put(AuthenticatedURL.java:99)
        at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:390)
        at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196)
        at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
        at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:348)
        at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:321)
        at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:193)
        at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:384)
        at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider$4.run(KMSClientProvider.java:1043)
        at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider$4.run(KMSClientProvider.java:1037)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:1037)
        at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:193)
        at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:190)
        at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:123)
        at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.addDelegationTokens(LoadBalancingKMSClientProvider.java:190)
        at 
org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:110)
        at 
org.apache.hadoop.hdfs.HdfsKMSUtil.addDelegationTokensForKeyProvider(HdfsKMSUtil.java:84)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2821)
        at 
org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.$anonfun$fetchDelegationTokens$1(HadoopFSDelegationTokenProvider.scala:117)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:141)
        at 
org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.fetchDelegationTokens(HadoopFSDelegationTokenProvider.scala:110)
        at 
org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.scala:56)
...
{code}
As one can see, obtaining the token fails; the call is retried several times but fails the same way. As a result the proxy user's UGI will not contain any HDFS token! That is the main reason for the AccessControlException.

Just for completeness:

Obtaining an HBase token was skipped because HBase is not on the classpath:
{code:java}
java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
{code}
Obtaining a Hive token was successful:
{code:java}
22/05/04 04:13:09 DEBUG HiveDelegationTokenProvider: Get Token from hive 
metastore: Kind: HIVE_DELEGATION_TOKEN, Service: , Ident: 00 08 73 68 72 70 72 
61 73 61 04 68 69 76 65 1e 6c 69 76 79 2f 6c 69 76 79 2d 69 6e 74 40 43 4f 52 
50 44 45 56 2e 56 49 53 41 2e 43 4f 4d 8a 01 80 8d 45 94 0c 8a 01 80 d5 5e 9c 
0c 8e 27 19 8e 03 70
{code}
All in all, at first glance Spark behaves as expected, and here are my suggestions to solve the issue:
 * Solve the issue within the Hadoop library (IllegalArgumentException: Empty cookie header string) which surfaces while Spark obtains the HDFS token. In this case HADOOP_TOKEN_FILE_LOCATION is useless because it loads tokens into the real user's UGI (tokens of the real user are never used when obtaining tokens for the proxy user).
 * Remove the --proxy-user arg, create the proxy user's tokens manually, store them in a file and use HADOOP_TOKEN_FILE_LOCATION (see the sketch below).
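
For the second suggestion, here is a minimal, hypothetical sketch of how such a token file could be created; the file path and the "proxyUser" name are placeholders:
{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserTokenFileSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Obtain HDFS delegation token(s) on behalf of the proxy user.
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        "proxyUser", UserGroupInformation.getLoginUser());
    Credentials creds = new Credentials();
    proxyUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      FileSystem.get(conf).addDelegationTokens("proxyUser", creds);
      return null;
    });
    // Persist the tokens in the standard token storage format; UGI loads this
    // file automatically when HADOOP_TOKEN_FILE_LOCATION points to it.
    creds.writeTokenStorageFile(new Path("file:///tmp/proxyuser.tokens"), conf);
  }
}
{code}
Then drop the --proxy-user flag and export HADOOP_TOKEN_FILE_LOCATION=/tmp/proxyuser.tokens in the environment before spark-submit.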



> Support --proxy-user for Spark on K8s
> -------------------------------------
>
>                 Key: SPARK-25355
>                 URL: https://issues.apache.org/jira/browse/SPARK-25355
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Stavros Kontopoulos
>            Assignee: Pedro Rossi
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: client.log, driver.log
>
>
> SPARK-23257 adds Kerberized HDFS support for Spark on K8s. A major addition needed is support for proxy users. A proxy user is impersonated by a superuser, who executes operations on behalf of the proxy user. More on this:
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented for Yarn upstream and Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue according to our discussion.


