[ https://issues.apache.org/jira/browse/SPARK-27364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815465#comment-16815465 ]

Thomas Graves commented on SPARK-27364:
---------------------------------------

There is actually one more piece we need for standalone mode: the driver. 
Since there might not be resource isolation, the driver needs to be told which 
GPUs to use. This means we should have an API for the user to get the GPUs on 
the driver, just as they do with TaskContext. We also need a way to specify the 
GPUs when launching the driver.

 

4) The driver discovers, or is told, which GPUs to use:

  a) As with the executors, we add a config that specifies a discovery script, 
*spark.driver.resource.gpu.discoveryScript*, which the driver runs to discover 
its GPUs.

  b) For Standalone mode we also need a way to specify the GPUs at startup. 
Unlike the executors, which take a launch parameter, the driver is better served 
by a config listing the GPU indices it should use: there is only one driver, and 
it can be launched in many different ways across the resource managers and 
cluster/client modes. A config does not work for the executors, because 
executors on different hosts would each have different indices, so one common 
value makes no sense there. So I propose this config: 
*spark.driver.resource.gpu.addresses*
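A minimal sketch of how the driver could resolve its GPUs from the two configs in 4a and 4b. A plain `Map[String, String]` stands in for the Spark configuration, and `parseAddresses`, `runScript`, and `driverGpus` are hypothetical helpers, not Spark APIs. The discovery script's output format (comma-separated indices) and the rule that explicit addresses take precedence over discovery are assumptions; this proposal does not define either.

```scala
import scala.sys.process._

// Config keys proposed in 4a and 4b.
val DiscoveryKey = "spark.driver.resource.gpu.discoveryScript"
val AddressesKey = "spark.driver.resource.gpu.addresses"

// Split a comma-separated list of GPU indices, ignoring blanks and whitespace.
def parseAddresses(value: String): List[String] =
  value.split(",").map(_.trim).filter(_.nonEmpty).toList

// Default runner: execute the discovery script and capture its stdout.
def runScript(path: String): String = Seq(path).!!

// Hypothetical resolution for the driver. ASSUMPTIONS: both values are
// comma-separated indices, and explicit addresses win over discovery.
def driverGpus(conf: Map[String, String],
               run: String => String = runScript): List[String] =
  conf.get(AddressesKey) match {
    case Some(addrs) => parseAddresses(addrs)
    case None =>
      conf.get(DiscoveryKey)
        .map(path => parseAddresses(run(path)))
        .getOrElse(Nil)
  }

// Standalone mode: one explicit value is enough because there is one driver.
println(driverGpus(Map(AddressesKey -> "1,3")))  // prints List(1, 3)
```

The script runner is passed in as a function so the parsing and precedence logic can be exercised without a real script on disk.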

5) For the user-facing API that lets the user see the GPU resources on the 
driver, we will add a function to *SparkContext* similar to the TaskContext 
version:

*def getResources(): Map[String, ResourceInformation]*
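A sketch of how driver code might consume the proposed method. Spark is not on the classpath here, so `ResourceInformation` is a simplified stand-in holding a resource name and its addresses, and `FakeSparkContext` and `driverGpuAddresses` are hypothetical names; none of them is the real Spark type.

```scala
// Simplified stand-in for ResourceInformation: a resource name and its addresses.
case class ResourceInformation(name: String, addresses: Array[String])

// Hypothetical driver-side context exposing the proposed method.
class FakeSparkContext(assigned: Map[String, ResourceInformation]) {
  def getResources(): Map[String, ResourceInformation] = assigned
}

// Driver code: look up the GPUs assigned to the driver, or none if absent.
def driverGpuAddresses(sc: FakeSparkContext): Seq[String] =
  sc.getResources().get("gpu").map(_.addresses.toSeq).getOrElse(Seq.empty)

val sc = new FakeSparkContext(Map("gpu" -> ResourceInformation("gpu", Array("0", "1"))))
val gpus = driverGpuAddresses(sc)
println(s"Driver GPUs: ${gpus.mkString(",")}")  // prints "Driver GPUs: 0,1"
```

Keying the map by resource name, as in the signature above, lets the same lookup work for resource types other than "gpu".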

> User-facing APIs for GPU-aware scheduling
> -----------------------------------------
>
>                 Key: SPARK-27364
>                 URL: https://issues.apache.org/jira/browse/SPARK-27364
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Thomas Graves
>            Priority: Major
>
> Design and implement:
> * General guidelines for cluster managers to understand resource requests at 
> application start. The concrete conf/param will be under the design of each 
> cluster manager.
> * APIs to fetch assigned resources from task context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
