[ https://issues.apache.org/jira/browse/SPARK-30519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167075#comment-17167075 ]

Laurenceau Julien commented on SPARK-30519:
-------------------------------------------

As I understand it, a possible workaround would be to run Spark directly as a 
non-root user inside the container, i.e. to run Spark as the HADOOP_USER_NAME 
user inside the container.

However, unless I am missing something, this is not possible with Spark alone; 
it could be done from the Kubernetes side (see the sketch after the quote below).

The Spark manual says:

 
{panel}
Images built from the project provided Dockerfiles contain a default 
[{{USER}}|https://docs.docker.com/engine/reference/builder/#user] directive 
with a default UID of {{185}}. This means that the resulting images will be 
running the Spark processes as this UID inside the container. Security 
conscious deployments should consider providing custom images with {{USER}} 
directives specifying their desired unprivileged UID and GID. The resulting UID 
should include the root group in its supplementary groups in order to be able 
to run the Spark executables. Users building their own images with the provided 
{{docker-image-tool.sh}} script can use the {{-u <uid>}} option to specify the 
desired UID.

Alternatively the [Pod 
Template|http://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template]
 feature can be used to add a [Security 
Context|https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#volumes-and-file-systems]
 with a {{runAsUser}} to the pods that Spark submits. This can be used to 
override the {{USER}} directives in the images themselves. Please bear in mind 
that this requires cooperation from your users and as such may not be a 
suitable solution for shared environments. Cluster administrators should use 
[Pod Security 
Policies|https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups]
 if they wish to limit the users that pods may run as.
{panel}
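
If it helps, here is a minimal sketch of the two approaches quoted above, as I 
would try them. The registry, tag, UID and API server address are placeholders 
of mine, and the pod-template route assumes a Spark version that supports 
{{spark.kubernetes.*.podTemplateFile}} (3.0+):
{code:bash}
# Approach 1: bake the desired unprivileged UID into the images
# (placeholder registry, tag and UID)
./bin/docker-image-tool.sh -r registry.example.com/spark -t my-tag -u 1000 build

# Approach 2: keep the stock images and override the USER directive with a pod template
cat > /tmp/spark-pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsUser: 1000    # desired unprivileged UID
    runAsGroup: 0      # keep group 0 (root) so the Spark binaries in the image stay usable
EOF

./bin/spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=registry.example.com/spark:my-tag \
  --conf spark.kubernetes.driver.podTemplateFile=/tmp/spark-pod-template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=/tmp/spark-pod-template.yaml \
  ...
{code}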
 

 

This seems to be the approach used by Google in their spark-on-k8s-operator, 
since it advertises the feature:
{panel}
Automatically runs {{spark-submit}} on behalf of users for each 
{{SparkApplication}} eligible for submission.{panel}
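
For instance, something along these lines submitted with kubectl instead of a 
plain spark-submit (sketch only: the image, jar and main class are placeholders, 
and I am assuming the v1beta2 CRD exposes a {{securityContext}} field on the 
driver/executor specs):
{code:bash}
cat <<'EOF' | kubectl apply -f -
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "registry.example.com/spark:my-tag"
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar"
  sparkVersion: "3.0.0"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark
    securityContext:      # assumed field: run the driver as an unprivileged UID
      runAsUser: 1000
  executor:
    instances: 2
    cores: 1
    memory: "512m"
    securityContext:      # assumed field: run the executors as the same UID
      runAsUser: 1000
EOF
{code}
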
Could someone confirm that this is a viable workaround?

> Executor can't use spark.executorEnv.HADOOP_USER_NAME to change the user 
> accessing to hdfs
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30519
>                 URL: https://issues.apache.org/jira/browse/SPARK-30519
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: Xiaoming
>            Priority: Minor
>
> Currently, we can specify the Hadoop user by setting HADOOP_USER_NAME on the driver 
> when submitting a job. However, setting spark.executorEnv.HADOOP_USER_NAME has no 
> effect on the executors.
>  


