[ https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530709#comment-17530709 ]
Shrikant commented on SPARK-25355:
----------------------------------

Thanks for looking further.

Your assumption that the 3 tokens loaded from HADOOP_TOKEN_FILE_LOCATION are not suitable for authentication is wrong. Authentication fails because it is attempted with the tokens of the proxy user, and since the proxy user doesn't have any tokens, auth fails. The 3 tokens that were loaded were added to the loginUser, not the proxy user. That's why I have been trying to highlight that this auth works when we don't use the proxy-user param. Authentication fails only when the proxy-user param is passed.

If you look at the code, in the SparkSubmit.submit() method:

{code:java}
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(args, uninitLog)
          }
        })
      } catch {
{code}

The sequence is:
 * UserGroupInformation.getCurrentUser() calls getLoginUser(), which in turn calls createLoginUser(). Here the login user is created, and the tokens are read from HADOOP_TOKEN_FILE_LOCATION and added to this login user.
 * UserGroupInformation.createProxyUser() then creates a new proxy user on top of that loginUser, but it doesn't add the tokens; it only copies the principals.
 * proxyUser.doAs() then does the authentication as this proxy user, not as the loginUser.

I hope I have been able to explain the issue.
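To make the point above easier to check, here is a minimal diagnostic sketch. It is not code from Spark. It assumes hadoop-common is on the classpath, and the object name ProxyUserTokenCheck and the user name "alice" are made up for illustration. It only prints how many tokens each UserGroupInformation holds:

{code:scala}
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper, not part of Spark or Hadoop.
object ProxyUserTokenCheck {
  def main(args: Array[String]): Unit = {
    // Resolves the login user; per the explanation above, this is where tokens
    // from HADOOP_TOKEN_FILE_LOCATION (if set) get attached.
    val loginUser = UserGroupInformation.getCurrentUser()

    // "alice" is a placeholder for whatever is passed as the proxy user.
    val proxyUser = UserGroupInformation.createProxyUser("alice", loginUser)

    // Compare the credentials held by each UGI.
    println(s"login user ${loginUser.getUserName}: " +
      s"${loginUser.getCredentials.numberOfTokens()} token(s)")
    println(s"proxy user ${proxyUser.getUserName}: " +
      s"${proxyUser.getCredentials.numberOfTokens()} token(s)")
  }
}
{code}

If the explanation above is right, then with HADOOP_TOKEN_FILE_LOCATION pointing at a valid token file the first count should match the number of loaded tokens and the second should be zero.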
> Support --proxy-user for Spark on K8s
> -------------------------------------
>
>                 Key: SPARK-25355
>                 URL: https://issues.apache.org/jira/browse/SPARK-25355
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Stavros Kontopoulos
>            Assignee: Pedro Rossi
>            Priority: Major
>             Fix For: 3.1.0
>
>
> SPARK-23257 adds kerberized hdfs support for Spark on K8s. A major addition
> needed is the support for proxy user. A proxy user is impersonated by a
> superuser who executes operations on behalf of the proxy user. More on this:
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented for Yarn upstream and Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue according to our discussion.