[ 
https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530678#comment-17530678
 ] 

Gabor Somogyi edited comment on SPARK-25355 at 5/2/22 11:32 AM:
----------------------------------------------------------------

Guys, when I look at the logs and read what you describe, I honestly don't 
fully understand what you are doing :)

You say that you run kinit, which creates a TGT in the user's credential 
cache on the local machine. Please be aware that this TGT is NOT transferred 
to the cluster by default.
On the other hand, the driver is reading credentials from a file:
{code:java}
...
22/04/26 08:54:39 DEBUG UserGroupInformation: Loaded 3 tokens
22/04/26 08:54:39 DEBUG UserGroupInformation: UGI loginUser:185 (auth:SIMPLE)
22/04/26 08:54:39 DEBUG UserGroupInformation: PrivilegedAction as:shrprasa 
(auth:PROXY) via 185 (auth:SIMPLE) 
...
22/04/26 08:54:38 DEBUG UserGroupInformation: Reading credentials from location 
set in HADOOP_TOKEN_FILE_LOCATION: 
/mnt/secrets/hadoop-credentials/..2022_04_26_08_54_34.1262645511/hadoop-tokens
...
{code}

One can authenticate with either credential (TGT or HADOOP_TOKEN_FILE_LOCATION), 
so which one is the plan and which one is a side effect?
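
To check which of the two is actually visible to a given process, something 
like the sketch below could be run both on the submitting machine and inside 
the driver pod. It is only an illustration: the function name is made up, and 
it assumes a file-based ticket cache named via KRB5CCNAME (KEYRING:, KCM: and 
default cache locations are not checked).

```python
import os
from typing import Callable, List, Mapping


def visible_credential_sources(
    env: Mapping[str, str],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return the credential sources this process could pick up."""
    sources = []

    # Delegation tokens: this is the file the "Reading credentials from
    # location set in HADOOP_TOKEN_FILE_LOCATION" log line refers to.
    token_file = env.get("HADOOP_TOKEN_FILE_LOCATION")
    if token_file and exists(token_file):
        sources.append("token-file")

    # TGT: kinit writes to a ticket cache. Only file-based caches named via
    # KRB5CCNAME are checked here (assumption of this sketch).
    ccname = env.get("KRB5CCNAME", "")
    cache_path = ccname[len("FILE:"):] if ccname.startswith("FILE:") else ccname
    if cache_path and ":" not in cache_path and exists(cache_path):
        sources.append("ticket-cache")

    return sources


if __name__ == "__main__":
    print(visible_credential_sources(os.environ))
```

If the driver reports only the token file while the local machine reports only 
the ticket cache, that would match the situation described above.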

As a general suggestion, client-mode Kerberos authentication suffers from many 
issues, especially with a TGT, so it is not advised. If you want a peaceful 
life, then I warmly suggest using a keytab :)
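
A keytab-based submission could be assembled roughly as follows. This is a 
minimal sketch: --principal and --keytab are the spark-submit options for 
keytab login, while the helper name and all values are placeholders, and the 
master URL, deploy mode and other settings a real job needs are left out.

```python
from typing import List


def keytab_submit_args(principal: str, keytab: str, app: str) -> List[str]:
    """Build a spark-submit command line that logs in from a keytab."""
    return [
        "spark-submit",
        "--principal", principal,  # Kerberos principal to log in as
        "--keytab", keytab,        # keytab file holding that principal's key
        app,
    ]


if __name__ == "__main__":
    # Placeholder values, not taken from this issue.
    print(" ".join(keytab_submit_args("user@EXAMPLE.COM", "/path/to/user.keytab", "app.jar")))
```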





> Support --proxy-user for Spark on K8s
> -------------------------------------
>
>                 Key: SPARK-25355
>                 URL: https://issues.apache.org/jira/browse/SPARK-25355
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Stavros Kontopoulos
>            Assignee: Pedro Rossi
>            Priority: Major
>             Fix For: 3.1.0
>
>
> SPARK-23257 adds Kerberized HDFS support for Spark on K8s. A major addition 
> still needed is support for proxy users. A proxy user is impersonated by a 
> superuser who executes operations on behalf of the proxy user. More on this: 
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented for YARN upstream and for Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue according to our discussion.


