[ https://issues.apache.org/jira/browse/SPARK-27376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841402#comment-16841402 ]

Thomas Graves commented on SPARK-27376:
---------------------------------------

The design is pretty straightforward; there is really only one open question,
which is consistency between the YARN resource configs and the new Spark
resource configs. See the last paragraph for more details.

Official GPU support requires Hadoop 3.1 or later. Hadoop can be configured
to use Docker with isolation so that the containers YARN hands back have the
requested GPUs and other resources. YARN does not tell you which GPUs it
allocated; you have to discover them. YARN has hardcoded resource types for
FPGA and GPU; anything else is a user-defined type. Spark 3.0 already added
support for requesting any resource from YARN via the
spark.yarn.{executor/driver/am}.resource configs, so the changes required for
this Jira are simply to map the new Spark configs,
spark.{executor/driver}.resource.{fpga/gpu}.count, into the corresponding
YARN configs. Other resource types we can't map, though, because we don't
know what they are called on the YARN side, so for any other resource the
user will have to specify both the spark.yarn.{executor/driver/am}.resource
and spark.{executor/driver}.resource configs. That isn't ideal, but the only
other option would be some sort of mapping the user passes in. We can always
map more YARN resource types if YARN adds them. The main two people are
interested in seem to be GPU and FPGA anyway, so I think this is fine for now.
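
To make the mapping concrete, here is a minimal Scala sketch of the idea
described above. It is illustrative only, not the actual Spark YARN code;
the helper and map names are made up, while yarn.io/gpu and yarn.io/fpga are
the hardcoded YARN resource type names:

    // Illustrative only: translate Spark-level gpu/fpga counts into the
    // hardcoded YARN resource types. Helper and map names are made up.
    val sparkToYarnResource: Map[String, String] = Map(
      "gpu"  -> "yarn.io/gpu",
      "fpga" -> "yarn.io/fpga"
    )

    def yarnResourceRequests(sparkConf: Map[String, String]): Map[String, Long] =
      sparkToYarnResource.flatMap { case (sparkName, yarnName) =>
        sparkConf.get(s"spark.executor.resource.$sparkName.count")
          .map(count => yarnName -> count.toLong)
      }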

For Hadoop versions before 3.1, YARN won't allocate based on GPU. If users
are on Hadoop 2.7, 2.8, etc., they could still land on GPU nodes with YARN
node labels or other hacks, tell Spark the count, and have it auto-discover
the GPUs. Spark will pick up whatever it sees in the container, or really
whatever the discoveryScript returns, so people could write that script to
match whatever hacks they currently use for sharing GPU nodes.
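
As a hypothetical illustration of that fallback, the discovery script can be
any executable that prints the resource addresses Spark should use. The
stand-in below is written in Scala, and its output format is assumed for
illustration only; the real script and expected format are whatever the Spark
side ends up defining:

    // Hypothetical stand-in for a discovery script: any executable that
    // prints the resource addresses for Spark to pick up. The output
    // format here is assumed for illustration, not the official one.
    object FakeGpuDiscovery {
      def main(args: Array[String]): Unit =
        println("""{"name": "gpu", "addresses": ["0", "1"]}""")
    }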

The flow from the user's point of view would be:

For GPU and FPGA: the user specifies
spark.{executor/driver}.resource.{gpu/fpga}.count and
spark.{executor/driver}.resource.{gpu/fpga}.discoveryScript. The Spark YARN
code maps these into the corresponding YARN resource config and asks YARN for
the containers. YARN allocates the containers, and Spark runs the discovery
script to figure out what it was actually allocated.
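
As a rough example of what that looks like from the user side (the values and
script path are made up, and the .count suffix follows this comment but may
change per the proposal below):

    import org.apache.spark.SparkConf

    // Made-up example of the GPU flow: request 2 GPUs per executor and
    // point Spark at a script that reports which ones the container got.
    val conf = new SparkConf()
      .set("spark.executor.resource.gpu.count", "2")
      .set("spark.executor.resource.gpu.discoveryScript", "/opt/spark/scripts/getGpus.sh")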

For other resource types, the user will have to specify
spark.yarn.{executor/driver/am}.resource as well as the count and
discoveryScript configs under spark.{executor/driver}.resource for that
resource.
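
For example, for a user-defined YARN resource type (all of the names below
are made up for illustration), the user would set something like:

    import org.apache.spark.SparkConf

    // Made-up example for a user-defined YARN resource: the user names
    // the resource on the YARN side themselves, plus gives Spark the
    // count and a discovery script for the same resource.
    val conf = new SparkConf()
      .set("spark.yarn.executor.resource.acme/widget", "4")
      .set("spark.executor.resource.widget.count", "4")
      .set("spark.executor.resource.widget.discoveryScript", "/opt/spark/scripts/getWidgets.sh")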

The only other thing that is inconsistent is that the
spark.yarn.{executor/driver/am}.resource configs don't have a .count on the
end. Right now that config takes a string as its value and splits it into an
actual count and a unit. The YARN resource configs were just added in 3.0 and
haven't been released yet, so we could still change them. We could change the
Spark user-facing configs (spark.{executor/driver}.resource.{gpu/fpga}.count)
to be similar, to make it easier for the user to specify both a count and a
unit in one config instead of two, but I like the ability to separate them on
the discovery side as well. We took the .unit support out in the executor
pull request, so it isn't there right now anyway. We could do the opposite
and change the YARN ones to have a .count and .unit as well just to make
things consistent, but that makes the user specify two configs instead of
one. The third option would be to have .count and .unit and then eventually
add a third config that lets the user specify them together, if we ever add
resources that actually use it.
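
For reference, a minimal sketch of the kind of splitting described above,
taking a single value like "4G" and separating it into a count and a unit
(this is just an illustration, not the actual parsing code, and it assumes
the value starts with digits):

    // Minimal sketch: split a combined value such as "4G" into a numeric
    // count and a unit string. Not the real Spark/YARN parsing code.
    def splitAmount(value: String): (Long, String) = {
      val (digits, unit) = value.span(_.isDigit)
      (digits.toLong, unit.trim)
    }

    splitAmount("4G")  // (4, "G")
    splitAmount("2")   // (2, "")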

My thoughts are that for the user-facing configs we change .count to .amount
and let the user specify units on it. This makes it easier for the user and
allows us to extend later if we want. I think we should also change the
spark.yarn configs to have a .amount, because YARN has already added other
things like tags and attributes, so if we want to extend the Spark support
for those it makes more sense to have them as another postfix option, e.g.
spark.yarn...resource.tags=

We can leave everything else that is internal as separate counts and units,
and since GPU/FPGA don't need units we don't need to add a unit to our
ResourceInformation, since we already removed it.
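
Under that proposal, the configs would look roughly like this (a hypothetical
sketch; exact names depend on what we settle on, and acme/widget is made up):

    import org.apache.spark.SparkConf

    // Hypothetical shape of the ".amount" proposal: one user-facing
    // config per resource that can carry a unit where it makes sense;
    // gpu simply has no unit.
    val conf = new SparkConf()
      .set("spark.executor.resource.gpu.amount", "2")
      .set("spark.yarn.executor.resource.acme/widget.amount", "4G")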

 

> Design: YARN supports Spark GPU-aware scheduling
> ------------------------------------------------
>
>                 Key: SPARK-27376
>                 URL: https://issues.apache.org/jira/browse/SPARK-27376
>             Project: Spark
>          Issue Type: Sub-task
>          Components: YARN
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
