[ 
https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532135#comment-17532135
 ] 

Gabor Somogyi commented on SPARK-25355:
---------------------------------------

After some playground work, code digging, and your additional log analysis, I 
see what's going on:
 * Spark obtains the 3 tokens mentioned earlier on the submit side
 * Adds them to the driver as HADOOP_TOKEN_FILE_LOCATION
 * The driver starts, and here comes the trick
 * UserGroupInformation [loads 
tokens|https://github.com/apache/hadoop/blob/2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L740-L766]
 when the loginUser is created, well before the proxy user exists (this is 
actually UGI initialization)
 * Later on, the proxy user is created w/ no tokens
 * Finally, authentication fails on the driver side because the proxy user has 
no credentials
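
To make the sequence above concrete, here is a minimal toy model of it in 
Python. It is only a sketch and not the Hadoop API: the Ugi class, the function 
names, the user names and the one-token-per-line file format are all 
assumptions made for illustration (the real token file is a serialized Hadoop 
Credentials object). The only thing it tries to show is the ordering problem: 
tokens are attached to the login user at initialization, and the proxy user 
created afterwards starts without them.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ugi:
    """Toy stand-in for a Hadoop UserGroupInformation; NOT the real API."""
    name: str
    real_user: Optional["Ugi"] = None
    tokens: list = field(default_factory=list)


def create_login_user(name, env):
    """Login user creation: the token file is read once, here."""
    ugi = Ugi(name)
    path = env.get("HADOOP_TOKEN_FILE_LOCATION")
    if path and os.path.exists(path):
        # Assumption: one token per line. The real file is binary.
        with open(path) as f:
            ugi.tokens = [line.strip() for line in f if line.strip()]
    return ugi


def create_proxy_user(name, real_user):
    """Proxy user creation: a fresh identity that starts with no tokens."""
    return Ugi(name, real_user=real_user)


def can_authenticate(ugi):
    """In this model the acting user needs at least one token."""
    return bool(ugi.tokens)


def run_demo():
    # Submit side: write the 3 tokens to a file (hypothetical token names).
    with tempfile.NamedTemporaryFile("w", suffix=".tokens", delete=False) as f:
        f.write("token-1\ntoken-2\ntoken-3\n")
        token_file = f.name
    try:
        env = {"HADOOP_TOKEN_FILE_LOCATION": token_file}
        login = create_login_user("superuser", env)       # UGI initialization
        proxy = create_proxy_user("proxied-user", login)  # created later
        return len(login.tokens), len(proxy.tokens), can_authenticate(proxy)
    finally:
        os.remove(token_file)


if __name__ == "__main__":
    print(run_demo())
```

In this model the login user ends up holding all 3 tokens while the proxy user 
holds none, so the last check fails, which matches what the driver log shows.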

I've taken a look at the design doc found in 
[https://github.com/mesosphere/spark/pull/26] and it states the following:
!screenshot-1.png! 
Bullet point 7 may have been true for Mesos in 2018, but it is definitely not 
working w/ K8S now.
In the current Spark codebase only executors use runAsSparkUser; the driver 
does not (so it runs as the proxy user w/o tokens).

So here is my general opinion based on the facts we have so far, which may 
change.
Adding the --proxy-user param for K8S was a good idea, but:
 * either it was not tested on a cluster at all, or it was tested on a 
different execution path
 * or it was tested and working on a cluster, but something has changed in 
other parts of the code since the merge (Mar 17, 2020)
 * all in all, what I see is that the feature is now completely broken

[~pedro.rossi] any comments? According to the latest facts this is a feature 
blocker.


> Support --proxy-user for Spark on K8s
> -------------------------------------
>
>                 Key: SPARK-25355
>                 URL: https://issues.apache.org/jira/browse/SPARK-25355
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Stavros Kontopoulos
>            Assignee: Pedro Rossi
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: client.log, driver.log, screenshot-1.png, 
> with_proxy_extradebugLogs.log
>
>
> SPARK-23257 adds kerberized hdfs support for Spark on K8s. A major addition 
> needed is the support for proxy user. A proxy user is impersonated by a 
> superuser who executes operations on behalf of the proxy user. More on this: 
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented for Yarn upstream and Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue according to our discussion.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

