[jira] [Commented] (SPARK-25355) Support --proxy-user for Spark on K8s

Gabor Somogyi (Jira) Mon, 02 May 2022 09:29:06 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530812#comment-17530812
 ]


Gabor Somogyi commented on SPARK-25355:
---------------------------------------

> Thanks for looking further. Your assumption that 3 tokens loaded from 
> HADOOP_TOKEN_FILE_LOCATION are not compatible to do the authentication is 
> wrong.

Please be aware that I'm one of the authors of this delegation token framework, 
so I'm not guessing: I know exactly what's going on. The only question is what 
you are planning and doing :)

Since you've not yet provided full logs or explained the overall plan and how 
authentication is supposed to work, I'm asking simple questions. If they are 
not answered, I can't help you move forward.
 * I asked for full driver and executor logs, but we received only a 
Hadoop-specific snippet. Can we get the full logs as asked? If they are too 
large, store them externally and link them here.
 * In spark-submit you specify cluster mode deployment:
{code:java}
...
--deploy-mode cluster \
...
{code}
but in the log I see client mode:
{code:java}
...
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=10.4.201.155 --deploy-mode client --proxy-user 
shrprasa --properties-file /opt/spark/conf/spark.properties --class 
org.apache.spark.examples.SparkPi spark-internal
...
{code}
So which one is the source of truth? It has a major influence on how security 
works. Hunting multiple issues at once is not fun (the same goes for the 
ConnectionRefused issue on the dev mailing list). So the ask here is to provide 
full logs and the submit command that belong together.

 * What is the plan for providing a TGT for the current user on the driver 
pod? I'm asking because this is the only way to have Spark obtain a 
delegation token for the proxy user. Since the logs are partial, I also can't 
tell what happened there.
 * What is the main intention behind using HADOOP_TOKEN_FILE_LOCATION? It is 
mainly used to load tokens for the current user, not for the proxy user. 
Handing any token over to the proxy user is never going to happen because that 
would be a security breach.
 * And finally, which token do you expect to authenticate against HDFS: the 
one Spark obtained or the one loaded via HADOOP_TOKEN_FILE_LOCATION?
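
The deploy-mode mismatch raised in the list above can be checked mechanically 
before logs are attached. The following is only a minimal sketch, not part of 
Spark: the helper names deploy_modes and same_deploy_mode are hypothetical, the 
submit fragment contains only the one line quoted in this comment (the rest of 
the command was not provided), and the log line is the one quoted here.

```python
import re


def deploy_modes(text):
    """Return every value given to --deploy-mode in the text, in order."""
    # Join shell line continuations so a value on the next line is still found.
    joined = text.replace("\\\n", " ")
    # Tolerate both "--deploy-mode X" and "--deploy-mode=X" spellings.
    return re.findall(r"--deploy-mode(?:\s+|=)(\S+)", joined)


def same_deploy_mode(submit_command, log_text):
    """True only if both texts mention a deploy mode and the sets agree."""
    submitted = set(deploy_modes(submit_command))
    logged = set(deploy_modes(log_text))
    return bool(submitted) and submitted == logged


# Fragment of the submit command as quoted in this comment (rest unknown).
submit_fragment = "--deploy-mode cluster \\\n"

# Log line as quoted in this comment.
log_line = (
    "+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf "
    "spark.driver.bindAddress=10.4.201.155 --deploy-mode client --proxy-user "
    "shrprasa --properties-file /opt/spark/conf/spark.properties --class "
    "org.apache.spark.examples.SparkPi spark-internal"
)

if not same_deploy_mode(submit_fragment, log_line):
    print("submit:", deploy_modes(submit_fragment), "log:", deploy_modes(log_line))
```

A mismatch only says that the two artifacts need to be reconciled. It does not 
say which of them is wrong.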

[~pedro.rossi] how was this tested on a cluster? The description of the PR 
doesn't say anything about that.

> Support --proxy-user for Spark on K8s
> -------------------------------------
>
>                 Key: SPARK-25355
>                 URL: https://issues.apache.org/jira/browse/SPARK-25355
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Stavros Kontopoulos
>            Assignee: Pedro Rossi
>            Priority: Major
>             Fix For: 3.1.0
>
>
> SPARK-23257 adds kerberized hdfs support for Spark on K8s. A major addition 
> needed is the support for proxy user. A proxy user is impersonated by a 
> superuser who executes operations on behalf of the proxy user. More on this: 
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented for Yarn upstream and Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue according to our discussion.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
