[ https://issues.apache.org/jira/browse/SPARK-27364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812488#comment-16812488 ]

Thomas Graves edited comment on SPARK-27364 at 4/8/19 3:01 PM:
---------------------------------------------------------------

There are 3 main user-facing impacts here: the TaskContext interface to fetch the resources, the user API to specify the gpu count, and how the executor discovers (or is told) the gpus it has. Below is more detail:

 

1) How the user gets the resources from the TaskContext and BarrierTaskContext

  For the TaskContext interface I propose we add an API like:

{code}
def getResources(): Map[String, ResourceInformation]
{code}

Where the Map key is the resource type.  So examples would be "gpu", "fpga", etc.  "gpu" would be the only one we officially support to start with.

ResourceInformation would be a class with a name, units, count, and addresses.  The name would be "gpu"; the units for gpu would be empty "", but for other resource types like memory it could be GiB or similar; the count is the number of them, so for gpus it would be the number allocated; and finally the addresses Array of strings could be whatever we want, in the gpu case it would just be the indexes of the gpus allocated to the task, i.e. ["0", "2", "3"]. I made this a string so it's very flexible as to what the address is for different resource types.  The user has to know how to interpret this, but depending on what you are doing with them even the same tools have multiple ways to specify devices. For instance with TensorFlow you can specify {{CUDA_VISIBLE_DEVICES=2,3}} or you can specify devices like {{for d in ['/device:GPU:2', '/device:GPU:3']:}}

As a class it would look something like:

{code}
class ResourceInformation(
    private val name: String,
    private val units: String,
    private val count: Long,
    private val addresses: Array[String] = Array.empty) {

  def getName(): String = name
  def getUnits(): String = units
  def getCount(): Long = count
  def getAddresses(): Array[String] = addresses
}
{code}
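
To make the intent concrete, here is a minimal sketch of how a task might consume this, assuming the getResources() API and ResourceInformation class proposed above (the names are still open for discussion):

{code}
import org.apache.spark.TaskContext

// Inside a task (e.g. a mapPartitions closure), look up the gpus assigned
// to this task via the proposed getResources() API.
val resources = TaskContext.get().getResources()

resources.get("gpu") match {
  case Some(gpuInfo) =>
    // addresses might be Array("0", "2", "3"); hand them to the ML framework,
    // e.g. by restricting visible devices before launching it.
    val visibleDevices = gpuInfo.getAddresses().mkString(",")   // "0,2,3"
    println(s"task got ${gpuInfo.getCount()} gpus: $visibleDevices")
  case None =>
    println("no gpus allocated to this task")
}
{code}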

2) How the user specifies the gpu resources upon application submission

Here we need multiple configs:

   a) One for the user to specify the gpus per task. To make it extensible for other resources, I propose *spark.task.resource.\{resource type}.count*.  This implementation would only support gpu, but the naming gives us flexibility to add more: it allows for multiple resources as well as multiple configs per resource. For instance, the resource type here would be gpu, but you could add fpga.  It also would allow you to add more configs besides count, e.g. a type config for requesting a certain gpu model.

   b) The user has to specify how many gpus per executor and driver.  This one is a bit more complicated since it has to work with the resource managers to actually acquire those, but I think it makes sense to have common configs like we do for cores and memory. So we can have *spark.executor.resource.\{resource type}.count* and *spark.driver.resource.\{resource type}.count*.   This implementation would only support gpu.  The tricky thing here is that some of the resource managers already have configs for asking for gpus.  YARN has {{spark.yarn.executor.resource.\{resource-type}}}; although it was added in 3.0 and hasn't shipped yet, we can't just remove it since you could ask YARN for other resource types Spark doesn't know about.  On Kubernetes you have to request via the pod template, so I think it would be on the user to make sure those match. Mesos has {{spark.mesos.gpus.max}}.  So we just need to make sure the new configs map into those, and having the duplicate configs might make it a bit weird to the user.  (See the example configuration sketch below.)
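
To illustrate how the proposed configs would fit together, here is a hypothetical SparkConf snippet; the config keys follow the names proposed above (they don't exist yet) and the counts are made up for the example:

{code}
import org.apache.spark.SparkConf

// Hypothetical example: 4 gpus per executor, 1 for the driver, 1 per task,
// so up to 4 gpu tasks could run concurrently on each executor.
val conf = new SparkConf()
  .set("spark.executor.resource.gpu.count", "4")
  .set("spark.driver.resource.gpu.count", "1")
  .set("spark.task.resource.gpu.count", "1")
{code}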

3) How the executor discovers or is told the gpu resources it has.

Here I think we have 2 options for the user/resource manager.  

  a) I propose we add a config *spark.\{executor, driver}.resource.gpu.discoverScript* to allow the user to specify a discovery script. This script gets run when the executor starts and the user has requested gpus, to discover what gpus the executor has.   A simple example would be a script that just runs "nvidia-smi --query-gpu=index --format=csv,noheader" to get the gpu indexes for NVIDIA cards.  You could make this script super simple or complicated depending on your setup (see the parsing sketch after this list).

  b) Also add an option to the executor launch, *--gpuDevices*, that allows the resource manager to specify the indexes of the gpu devices the executor has.   This allows insecure or non-containerized resource managers like standalone mode to allocate gpus per executor without having containers and isolation implemented.  We could try to make this more generic, but that seems like it could get complicated and the resource managers would have to be updated to support it anyway, so I am proposing a gpu-specific option for now.
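
As a rough illustration of option a), here is a minimal sketch of how an executor could run the user-supplied discovery script and turn its output into gpu addresses; the helper name and the one-index-per-line output format are just assumptions for the example:

{code}
import scala.sys.process._

// Hypothetical helper: run the configured discovery script (e.g. a wrapper
// around "nvidia-smi --query-gpu=index --format=csv,noheader") and collect
// one gpu address per line of its stdout.
def discoverGpuAddresses(scriptPath: String): Array[String] = {
  val output = Seq(scriptPath).!!   // run the script and capture stdout
  output.split("\n").map(_.trim).filter(_.nonEmpty)
}

// A 4-gpu box would yield Array("0", "1", "2", "3"), which the executor could
// then use to build the "gpu" ResourceInformation it reports back.
{code}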



> User-facing APIs for GPU-aware scheduling
> -----------------------------------------
>
>                 Key: SPARK-27364
>                 URL: https://issues.apache.org/jira/browse/SPARK-27364
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Thomas Graves
>            Priority: Major
>
> Design and implement:
> * General guidelines for cluster managers to understand resource requests at 
> application start. The concrete conf/param will be under the design of each 
> cluster manager.
> * APIs to fetch assigned resources from task context.


