[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845219#comment-16845219 ]

Thomas Graves commented on SPARK-27495:
---------------------------------------

I'm working through the design of this and there are definitely a lot of things 
to think about here.  I would like to get other people's input before going much 
further.  

I think a few main points we need to decide on:

1) What resources can a user specify per stage - the more I think about this, 
the more things I can think of people wanting to change.  For instance, normally 
in Spark you don't specify the task requirements, you specify the executor 
requirements and then possibly the cores per task.  So if someone is requesting 
resources per stage, I think we need a way to specify both task and executor 
requirements.  You could derive the executor requirements from the task 
requirements - say I want 4 tasks per executor and then multiply the task 
requirements - but then you have things like overhead memory, and users aren't 
used to specifying things at the task level, so I think it's better to separate 
those out.  Going beyond that, we know people want to limit the total number of 
running tasks per stage; on the memory side there is off-heap, heap, and 
overhead memory; and I can envision people wanting to change retries or shuffle 
parameters per stage.  It's definitely more than just a few resources.
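
To make that separation concrete, here is a rough sketch of what keeping 
task-level and executor-level requirements apart could look like.  All of the 
names below are made up purely for illustration; nothing here is a proposed API.

{code:scala}
// Hypothetical sketch only - none of these types exist in Spark today.
// The point is just that task-level and executor-level requirements stay
// separate notions instead of one being derived from the other.

case class TaskRequirements(
    cpus: Int = 1,                          // cores a single task needs
    gpus: Int = 0,                          // accelerators a single task needs
    maxConcurrentTasks: Option[Int] = None  // optional cap on running tasks in the stage
)

case class ExecutorRequirements(
    cores: Int = 1,
    heapMemoryMb: Long = 1024,
    offHeapMemoryMb: Long = 0,
    overheadMemoryMb: Long = 384,  // overhead is an executor-level concept, which is
    gpus: Int = 0                  // part of why deriving it from tasks is awkward
)

case class StageResourceProfile(
    executor: ExecutorRequirements,
    task: TaskRequirements
)
{code}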

Basically it comes down to a lot of things in SparkConf.  Now whether that is 
the interface we show to users or not is another question.  You could, for 
instance, let them pass an entire SparkConf in, set the configs they are used 
to setting, and deal with that.  But then you have to error out on or ignore 
the configs we don't support changing dynamically, and you have to resolve 
conflicts on an unknown set of things if they have specified different confs 
for multiple operations that get combined into a single stage (i.e. something 
like map.filter.groupBy where they specified conflicting resources for the map 
and the groupBy).  Or you could make an interface that only gives them specific 
options and keep adding to it as people request more things.  The latter I 
think is cleaner in some ways, but it is also less flexible and requires a new 
API versus possibly reusing the configs users are already used to.
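
As a rough illustration of that trade-off (again, every name below is 
hypothetical, and the set of supported configs is only an example):

{code:scala}
import org.apache.spark.SparkConf

// Option A (hypothetical): accept a plain SparkConf and validate it against a
// whitelist of configs that can actually change per stage.  The keys listed
// here are just examples; deciding the real set (and whether unsupported keys
// error out or get ignored) is part of the design.
object StageConfValidator {
  private val supportedKeys = Set(
    "spark.executor.cores",
    "spark.executor.memory",
    "spark.executor.memoryOverhead",
    "spark.task.cpus"
  )

  // Left = keys we can't change per stage, Right = the validated conf.
  def validate(conf: SparkConf): Either[Seq[String], SparkConf] = {
    val unsupported = conf.getAll.map(_._1).filterNot(supportedKeys.contains).toSeq
    if (unsupported.nonEmpty) Left(unsupported) else Right(conf)
  }
}

// Option B (hypothetical): a narrow, typed builder that only exposes the knobs
// we explicitly support and grows as people request more things.
class StageResourceBuilder {
  private var executorCores = 1
  private var executorMemoryMb = 1024L
  private var taskCpus = 1

  def withExecutorCores(n: Int): this.type = { executorCores = n; this }
  def withExecutorMemoryMb(mb: Long): this.type = { executorMemoryMb = mb; this }
  def withTaskCpus(n: Int): this.type = { taskCpus = n; this }

  def build(): StageResourceProfile = StageResourceProfile(
    executor = ExecutorRequirements(cores = executorCores, heapMemoryMb = executorMemoryMb),
    task = TaskRequirements(cpus = taskCpus)
  )
}
{code}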

2) API.  I think ideally each of the operators (RDD.*, Dataset.*, where * is 
map, filter, groupBy, sort, join, etc.) would have an optional parameter to 
specify the resources you want to use.  I think this would make it clear to the 
user that, for at least that operation, these will be applied.  It also helps 
with cases where you don't have an RDD yet, like the initial read of files.  It 
could, however, mean a lot of API changes. 

Another way, which was originally proposed in SPARK-24615, is adding something 
like a .withResources API, but then you have to decide whether you prefix it, 
postfix it, etc.  If you postfix it, what about things like eager execution 
mode?  Prefix seems to make more sense, but then you still don't have an option 
for the readers.  I think this also makes the scoping less clear, although you 
have some of that problem even when adding it to the individual operators.
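
To compare the two shapes concretely (reusing the hypothetical 
StageResourceProfile from the first sketch; neither call form exists today):

{code:scala}
// Hypothetical call shapes only - neither exists in Spark today.
// StageResourceProfile is the made-up type from the first sketch.

// (a) Per-operator parameter: the scope is explicit and the readers are
//     covered, but every operator needs a new optional argument, e.g.
//       rdd.map(parse, resources = Some(gpuProfile))
//       spark.read.parquet(path, resources = Some(gpuProfile))

// (b) Prefix .withResources: one new method instead of many changed
//     signatures.  A toy wrapper over Seq just to show the call shape - a
//     real version would hand the profile to the scheduler rather than
//     carry it alongside the data.
case class Resourced[T](data: Seq[T], profile: StageResourceProfile) {
  def withResources(p: StageResourceProfile): Resourced[T] = copy(profile = p)
  def map[U](f: T => U): Resourced[U] = Resourced(data.map(f), profile)
}

// Usage shape:
//   Resourced(records, defaultProfile).withResources(gpuProfile).map(parse)
{code}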

3) Scoping.  The scoping could be confusing to users.  Ideally I want to cover 
the RDD/Dataset/DataFrame APIs (I realize the Jira was initially more in the 
scope of barrier scheduling, but if we are going to do it, it seems like we 
should make it generic).  With RDDs it is a bit more obvious where the stage 
boundaries might be, but with catalyst any sort of optimization could lead to 
stage boundaries the user doesn't expect.  In either case you also have 
operations that span 2 stages, like groupBy, where it does the partial 
aggregation, the shuffle, then the full aggregation; the withResources would 
have to apply to both stages.  Then there is the question of whether you keep 
using that resource profile until the user changes it, or whether it applies 
only to those stages and then goes back to the default application-level 
configs.  You could also go back to what I mentioned on the Jira, where 
withResources would act like a function scope: withResources() { everything 
that should be done within that resource profile... }, but that syntax isn't 
like anything we have now that I'm aware of.
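
Something along those lines could look roughly like this (again hypothetical, 
reusing the types from the first sketch):

{code:scala}
// Hypothetical block-scoped form - nothing like this exists in Spark today.
object StageResources {
  // Everything built inside the closure would be tagged with the profile, and
  // the application-level defaults would apply again once the block exits.
  // A real implementation would push the profile onto a scheduler-visible
  // scope (e.g. a thread-local the DAG scheduler consults when creating
  // stages) and pop it afterwards; here it just runs the body.
  def withResources[A](profile: StageResourceProfile)(body: => A): A = body
}

// Usage shape - both stages of the groupBy (partial aggregation, shuffle,
// final aggregation) would fall under gpuProfile, and later jobs would not:
//   val counts = StageResources.withResources(gpuProfile) {
//     rdd.map(parse).groupBy(keyOf).mapValues(_.size)
//   }
{code}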

4) How to deal with multiple, possibly conflicting resource requirements.  You 
can go with the max in some cases, but for others that might not make sense.  
For memory, for instance, you might actually want the sum: you know this 
operation needs x memory, then catalyst combines it with another operation that 
needs y memory.  You would either want to sum those, or you would have to make 
the user realize the operations will get combined and have them account for it 
themselves; the latter isn't ideal either.
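
As one possible merge rule, purely to make the options concrete (reusing the 
hypothetical types from the first sketch; whether max or sum is right for each 
resource is exactly the open question):

{code:scala}
// Hypothetical merge of two profiles whose operations end up in one stage.
// Cores/GPUs take the max (the tasks share the same slots), while memory is
// summed on the theory that both working sets are live in the same stage.
def mergeForOneStage(a: StageResourceProfile, b: StageResourceProfile): StageResourceProfile = {
  StageResourceProfile(
    executor = ExecutorRequirements(
      cores = math.max(a.executor.cores, b.executor.cores),
      heapMemoryMb = a.executor.heapMemoryMb + b.executor.heapMemoryMb,
      offHeapMemoryMb = a.executor.offHeapMemoryMb + b.executor.offHeapMemoryMb,
      overheadMemoryMb = math.max(a.executor.overheadMemoryMb, b.executor.overheadMemoryMb),
      gpus = math.max(a.executor.gpus, b.executor.gpus)
    ),
    task = TaskRequirements(
      cpus = math.max(a.task.cpus, b.task.cpus),
      gpus = math.max(a.task.gpus, b.task.gpus),
      maxConcurrentTasks = (a.task.maxConcurrentTasks, b.task.maxConcurrentTasks) match {
        case (Some(x), Some(y)) => Some(math.min(x, y))
        case (x, y)             => x.orElse(y)
      }
    )
  )
}
{code}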

Really, the DataFrame/Dataset API shouldn't need any API to specify resources; 
ideally catalyst would figure it out.  For instance, if a GPU is available and 
catalyst knows how to use it, it would just use it.  Ideally it would also be 
smarter about how much memory it needs, etc., but that functionality isn't 
there yet, and I think much of the time people will want to use this through 
the Dataset/DataFrame API if it's going to support CPU/memory/etc.  

Let me know if anyone has input on the above issues.

> Support Stage level resource configuration and scheduling
> ---------------------------------------------------------
>
>                 Key: SPARK-27495
>                 URL: https://issues.apache.org/jira/browse/SPARK-27495
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Thomas Graves
>            Priority: Major
>
> Currently Spark supports CPU-level scheduling, and we are adding in 
> accelerator-aware scheduling with 
> https://issues.apache.org/jira/browse/SPARK-24615, but both of those schedule 
> via application-level configurations, meaning there is one configuration set 
> for the entire lifetime of the application and the user can't change it 
> between Spark jobs/stages within that application.  Many times users have 
> different requirements for different stages of their application, so they 
> want to be able to configure at the stage level what resources are required 
> for that stage.
> For example, I might start a Spark application that first does some ETL work 
> that needs lots of cores to run many tasks in parallel; then, once that is 
> done, I want to run an ML job, and at that point I want GPUs, fewer CPUs, and 
> more memory.
> With this Jira we want to add the ability for users to specify the resources 
> for different stages.
> Note that https://issues.apache.org/jira/browse/SPARK-24615 had some 
> discussions on this, but that part was removed from it.
> We should come up with a proposal on how to do this.


