[ https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532135#comment-17532135 ]
Gabor Somogyi commented on SPARK-25355:
---------------------------------------

After some playground work, code digging and your additional log analysis, I see what's going on:
* Spark obtains the already mentioned 3 tokens on the submit side
* Spark adds them to the driver via HADOOP_TOKEN_FILE_LOCATION
* The driver starts, and here comes the trick
* UserGroupInformation [loads tokens|https://github.com/apache/hadoop/blob/2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L740-L766] when the loginUser is created, which happens long before the proxy user exists (this is actually UGI initialization)
* Later on, the proxy user is created with no tokens
* Finally, authentication fails on the driver side because there are no credentials

I've taken a look at the design doc found in [https://github.com/mesosphere/spark/pull/26] and it states the following:
!screenshot-1.png!

Bullet point 7 may have been true for Mesos in 2018, but it's certainly not working with K8s now. In the current Spark codebase only the executors use runAsSparkUser; the driver does not, so it runs as the proxy user without tokens.

So here is my general opinion, based on the facts we have (which may change). Adding the --proxy-user param for K8s was a good idea, but:
* either it was not tested on a cluster at all, or it was tested on a different execution path,
* or it was tested and working on a cluster, but something changed in other parts of the code after the merge (Mar 17, 2020).

All in all, what I see is that the feature is now completely broken.

[~pedro.rossi] any comments? According to the latest facts, this is a feature blocker.
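The failure sequence above can be reproduced with a small toy model. This is a minimal sketch, not Hadoop or Spark code: `ToyUser`, `login_user_from_token_file`, `create_proxy_user` and `access_hdfs` are hypothetical stand-ins for UGI login, proxy-user creation and token-based HDFS access, and the three token names are placeholders. It shows the key point: tokens read from the token file at login time live only on the login user, so a proxy user created afterwards has none unless they are copied across. In Hadoop itself the corresponding calls would be `UserGroupInformation.createProxyUser(...)` and, for the copy, `addCredentials(...)`; treating that copy as the remedy is an assumption of this sketch, not something established in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")

# Hypothetical token kind standing in for an HDFS delegation token.
HDFS_TOKEN_KIND = "hdfs-dt"


@dataclass
class ToyUser:
    """Toy stand-in for Hadoop's UserGroupInformation (NOT the real API)."""
    name: str
    real_user: Optional["ToyUser"] = None
    tokens: Dict[str, str] = field(default_factory=dict)

    def do_as(self, action: Callable[["ToyUser"], T]) -> T:
        # Run the action with this user as the effective user.
        return action(self)


def login_user_from_token_file(name: str, token_file: Dict[str, str]) -> ToyUser:
    # Models UGI initialization: tokens shipped in the token file
    # (HADOOP_TOKEN_FILE_LOCATION in the real system) go to the login user only.
    return ToyUser(name=name, tokens=dict(token_file))


def create_proxy_user(name: str, real_user: ToyUser, copy_tokens: bool = False) -> ToyUser:
    # The proxy user starts with an empty credential set; the real user's
    # tokens are not inherited unless copied explicitly (assumed remedy).
    proxy = ToyUser(name=name, real_user=real_user)
    if copy_tokens:
        proxy.tokens.update(real_user.tokens)
    return proxy


def access_hdfs(user: ToyUser) -> str:
    # Token-based auth check: fail when no delegation token is present.
    if HDFS_TOKEN_KIND not in user.tokens:
        raise PermissionError(f"{user.name}: no credentials")
    return f"{user.name}: authenticated"


if __name__ == "__main__":
    # Three placeholder tokens obtained on the submit side.
    login = login_user_from_token_file(
        "spark", {HDFS_TOKEN_KIND: "t1", "token-2": "t2", "token-3": "t3"}
    )
    driver_proxy = create_proxy_user("alice", login)
    try:
        driver_proxy.do_as(access_hdfs)
    except PermissionError as err:
        print(err)  # alice: no credentials
    patched = create_proxy_user("alice", login, copy_tokens=True)
    print(patched.do_as(access_hdfs))  # alice: authenticated
```

The model keeps credentials per user object on purpose, mirroring the point above that the driver's proxy user is a separate identity from the login user that received the token file.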
> Support --proxy-user for Spark on K8s
> -------------------------------------
>
>                 Key: SPARK-25355
>                 URL: https://issues.apache.org/jira/browse/SPARK-25355
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Stavros Kontopoulos
>            Assignee: Pedro Rossi
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: client.log, driver.log, screenshot-1.png, with_proxy_extradebugLogs.log
>
>
> SPARK-23257 adds kerberized HDFS support for Spark on K8s. A major addition
> needed is the support for proxy user. A proxy user is impersonated by a
> superuser who executes operations on behalf of the proxy user. More on this:
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented for Yarn upstream and Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue according to our discussion.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)