[ 
https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530709#comment-17530709
 ] 

Shrikant commented on SPARK-25355:
----------------------------------

Thanks for looking further. However, your assumption that the 3 tokens loaded 
from HADOOP_TOKEN_FILE_LOCATION are not suitable for authentication is 
incorrect. 

Authentication fails because it is attempted with the tokens of the proxy user, 
and the proxy user does not have any tokens. The 3 tokens that were loaded were 
added to the loginUser, not to the proxy user. That is why I have been pointing 
out that authentication works when the proxy-user param is not used, and fails 
only when the proxy-user param is passed. 

If you look at the code, in the SparkSubmit.submit() method:
{code:java}
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(args, uninitLog)
          }
        })
      } catch {
        // ... (rest of the method omitted)
{code}
UserGroupInformation.getCurrentUser() calls getLoginUser(), which in turn calls 
createLoginUser(). This is where the login user is created, and the tokens read 
from HADOOP_TOKEN_FILE_LOCATION are added to this login user.

After this, UserGroupInformation.createProxyUser() creates a new proxy user 
from the above loginUser, but it does not add the tokens; it only copies the 
principals.

proxyUser.doAs() then runs runMain(), and therefore the authentication, as this 
proxy user, not as the loginUser.
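
One way to check the behaviour described above is a small standalone program 
that prints how many tokens each user holds. This is only a sketch, not code 
from Spark: the object name ProxyUserTokenCheck and the user name "alice" are 
placeholders, and it assumes hadoop-common is on the classpath and that 
HADOOP_TOKEN_FILE_LOCATION points to a valid token file.
{code:java}
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical diagnostic, not part of SparkSubmit.
object ProxyUserTokenCheck {
  def main(args: Array[String]): Unit = {
    // With no other user in the current context this resolves to the login
    // user, which is where the tokens from HADOOP_TOKEN_FILE_LOCATION end up.
    val loginUser = UserGroupInformation.getCurrentUser()

    // "alice" is a placeholder for the value passed via --proxy-user.
    val proxyUser = UserGroupInformation.createProxyUser("alice", loginUser)

    // Compare the number of tokens held by each user.
    println(s"login user ${loginUser.getShortUserName()}: " +
      s"${loginUser.getCredentials().numberOfTokens()} token(s)")
    println(s"proxy user ${proxyUser.getShortUserName()}: " +
      s"${proxyUser.getCredentials().numberOfTokens()} token(s)")

    // The proxy user still refers to the login user as its real user.
    println(s"real user of proxy: ${proxyUser.getRealUser().getShortUserName()}")
  }
}
{code}
If the explanation above is correct, the login user should report the loaded 
tokens while the proxy user reports none.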

I hope this explains the issue.

> Support --proxy-user for Spark on K8s
> -------------------------------------
>
>                 Key: SPARK-25355
>                 URL: https://issues.apache.org/jira/browse/SPARK-25355
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Stavros Kontopoulos
>            Assignee: Pedro Rossi
>            Priority: Major
>             Fix For: 3.1.0
>
>
> SPARK-23257 adds kerberized hdfs support for Spark on K8s. A major addition 
> needed is the support for proxy user. A proxy user is impersonated by a 
> superuser who executes operations on behalf of the proxy user. More on this: 
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented for Yarn upstream and Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue according to our discussion.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
