[ 
https://issues.apache.org/jira/browse/SPARK-32429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165903#comment-17165903
 ] 

Xiangrui Meng commented on SPARK-32429:
---------------------------------------

A couple of questions:

1. Which GPU resource name do we use? "spark.task.resource.gpu" does not have 
special meaning in the current implementation.
2. I think we can do this for PySpark workers if 1) gets resolved. However, for 
executors running inside the same JVM, is there a way to set 
CUDA_VISIBLE_DEVICES differently per executor thread?
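
To illustrate why 2) is easier for PySpark workers: environment variables are 
per process, so a separately launched worker process can be given its own 
CUDA_VISIBLE_DEVICES, while threads inside one JVM all share a single value. 
A minimal sketch of the per-process case, not Spark's actual worker launch 
code; the helper name and the stand-in worker command are hypothetical:

```python
import os
import subprocess
import sys


def launch_worker_with_gpus(gpu_addresses, command):
    """Run `command` as a child process that only sees `gpu_addresses`.

    Hypothetical helper for illustration; returns the child's stdout.
    """
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env["CUDA_VISIBLE_DEVICES"] = ",".join(gpu_addresses)
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Stand-in for a Python worker (an assumption): it only reports what it sees.
REPORT = [
    sys.executable,
    "-c",
    "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
]
```

Calling launch_worker_with_gpus(["2", "3"], REPORT) starts a child whose 
CUDA_VISIBLE_DEVICES is "2,3" without changing the parent's environment.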

> Standalone Mode allow setting CUDA_VISIBLE_DEVICES on executor launch
> ---------------------------------------------------------------------
>
>                 Key: SPARK-32429
>                 URL: https://issues.apache.org/jira/browse/SPARK-32429
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 3.0.0
>            Reporter: Thomas Graves
>            Priority: Major
>
> It would be nice if standalone mode could allow users to set 
> CUDA_VISIBLE_DEVICES before launching an executor.  This has multiple 
> benefits. 
>  * It provides a form of isolation: the executor can only see the GPUs set 
> there. 
>  * If your GPU application doesn't support explicitly setting the GPU device 
> id, setting this will make any GPU look like the default (id 0) and things 
> generally just work without any explicit setting.
>  * New features are being added on newer GPUs that require explicit setting 
> of CUDA_VISIBLE_DEVICES like MIG 
> ([https://www.nvidia.com/en-us/technologies/multi-instance-gpu/])
> The code change to set this is very small. Once it is set, we may also 
> need to change the GPU addresses, because setting it renumbers the visible 
> devices to start from device id 0 again.
> The easiest implementation would support just this variable, behind a 
> config, and set it when the config is on and GPU resources are allocated. 
> Note we probably want to set the same thing when we launch a Python 
> process as well, so that it gets the same env.
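
A rough sketch of the address renumbering described in the quoted text, 
assuming the behaviour sits behind a switch. The function and its `enabled` 
flag are hypothetical stand-ins for the proposed config, not existing Spark 
code. It relies on CUDA numbering the visible devices from 0 in the order 
they are listed in CUDA_VISIBLE_DEVICES:

```python
def executor_gpu_env(assigned, enabled):
    """Return (extra env vars, GPU addresses as seen inside the executor).

    `assigned` holds the physical GPU addresses allocated to the executor.
    `enabled` stands in for the proposed config switch (hypothetical).
    """
    if not enabled or not assigned:
        # Feature off or no GPUs allocated: leave env and addresses alone.
        return {}, list(assigned)
    env = {"CUDA_VISIBLE_DEVICES": ",".join(assigned)}
    # Inside the process the visible devices are renumbered 0, 1, 2, ...
    renumbered = [str(i) for i in range(len(assigned))]
    return env, renumbered
```

With physical GPUs "2" and "5" assigned and the switch on, this yields 
CUDA_VISIBLE_DEVICES=2,5 and in-process addresses "0" and "1".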



