[ https://issues.apache.org/jira/browse/SPARK-21122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052362#comment-16052362 ]

Craig Ingram commented on SPARK-21122:
--------------------------------------

Thanks for the feedback, Sean. This ticket simply represents one of the issues 
brought up in [SPARK-21084|https://issues.apache.org/jira/browse/SPARK-21084]. 
Since this topic covers a lot of ground, we wanted to create a separate ticket 
for the discussion and work. We are definitely open to alternate approaches as 
well.

I agree that this is something the resource managers should manage, but I don't 
see a way to control deallocation at a reasonable level from within a resource 
manager. Even without these changes, you still have to configure things in two 
places. I also don't believe you would want to enable something like YARN's 
preemption while using this feature, as the two mechanisms would end up fighting 
each other.

At the very least, the existing policy needs some additional logic to 
preemptively decommission executors when other applications are being starved. 
Couple this with 
[SPARK-21097|https://issues.apache.org/jira/browse/SPARK-21097] and even 
executors with cached data can safely be removed without a huge performance 
impact. I think the improvement here is maximizing cluster utilization while 
also allowing other applications to start and fairly consume resources.
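
To make that concrete, here is a minimal sketch in Scala of the extra check 
such a policy could perform. Every type and field below is invented for 
illustration; none of this is an existing Spark API:

{code:scala}
// Hypothetical types -- invented for illustration, not Spark's API.
case class ClusterState(pendingApplications: Int, pendingCores: Int)
case class ExecutorInfo(isIdle: Boolean, hasCachedBlocks: Boolean)

// Decommission ahead of the normal idle timeout only when other
// applications are starved, preferring idle executors without cached
// data (with SPARK-21097, even cached ones could be removed safely).
def shouldPreemptivelyRemove(cluster: ClusterState, exec: ExecutorInfo): Boolean = {
  val othersStarved = cluster.pendingApplications > 0 || cluster.pendingCores > 0
  othersStarved && exec.isIdle && !exec.hasCachedBlocks
}
{code}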

> Address starvation issues when dynamic allocation is enabled
> ------------------------------------------------------------
>
>                 Key: SPARK-21122
>                 URL: https://issues.apache.org/jira/browse/SPARK-21122
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Craig Ingram
>
> When dynamic resource allocation is enabled on a cluster, it’s currently 
> possible for one application to consume all the cluster’s resources, 
> effectively starving any other application trying to start. This is 
> particularly painful in a notebook environment where notebooks may be idle 
> for tens of minutes while the user is figuring out what to do next (or eating 
> their lunch). Ideally the application should give resources back to the 
> cluster when monitoring indicates other applications are pending.
> Before delving into the specifics of the solution, there are some workarounds 
> to this problem that are worth mentioning:
> * Set spark.dynamicAllocation.maxExecutors to a small value, so that users 
> are unlikely to use the entire cluster even when many of them are doing work. 
> This approach hurts cluster utilization (see the configuration sketch after 
> this list).
> * If using YARN, enable preemption and have each application (or 
> organization) run in a separate queue. The downside is that when YARN 
> preempts, it knows nothing about the executor it's killing; it is just as 
> likely to kill a long-running executor with cached data as one that just 
> spun up. Moreover, given a feature like 
> https://issues.apache.org/jira/browse/SPARK-21097 (preserving cached data on 
> executor decommission), YARN may not wait long enough between attempting a 
> graceful shutdown and forcefully killing the executor, so the blocks that 
> belonged to that executor would be lost and have to be recomputed.
> * Configure YARN to use the capacity scheduler with multiple scheduler 
> queues, and put high-priority notebook users into a high-priority queue. This 
> prevents high-priority users from being starved out by low-priority notebook 
> users, but does not prevent users in the same priority class from starving 
> each other.
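> As an illustration of the first workaround, capping each application in 
> spark-defaults.conf might look like the following (the numbers are 
> placeholders, not recommendations):
> {code}
> # Cap each application's executor footprint. All values below are
> # illustrative; the cap trades cluster utilization for headroom that
> # lets other applications start.
> spark.dynamicAllocation.enabled          true
> spark.shuffle.service.enabled            true
> spark.dynamicAllocation.minExecutors     1
> spark.dynamicAllocation.maxExecutors     10
> {code}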
> Obviously any solution to this problem that depends on YARN would leave other 
> resource managers out in the cold. The solution proposed in this ticket will 
> afford Spark clusters the flexibility to hook in different resource allocation 
> policies to fulfill their users' needs regardless of resource manager choice. 
> Initially the focus will be on users in a notebook environment. When 
> operating in a notebook environment with many users, the goal is fair 
> resource allocation. Given that all users will be using the same memory 
> configuration, this solution will focus primarily on fair sharing of cores.
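> A pluggable policy could be as small as a single hook the allocation manager 
> consults; a sketch follows (this trait is invented here and does not exist 
> in Spark):
> {code:scala}
> // Hypothetical plug-in point -- invented for illustration only.
> trait ExecutorAllocationPolicy {
>   // Return the IDs of executors this application should give back,
>   // given a snapshot of cluster-wide demand.
>   def executorsToRemove(pendingApps: Int, pendingCores: Int): Seq[String]
> }
> {code}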
> The fair resource allocation policy should pick executors to remove based on 
> three factors initially: idleness, presence of cached data, and uptime. The 
> policy will favor removing executors that are idle, short-lived, and have no 
> cached data. The policy will only preemptively remove executors if there are 
> pending applications or cores (otherwise the default dynamic allocation 
> timeout/removal process is followed). The policy will also allow an 
> application's resource consumption to expand based on cluster utilization. 
> For example if there are 3 applications running but 2 of them are idle, the 
> policy will allow a busy application with pending tasks to consume more than 
> 1/3rd of the cluster's resources.
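> To make the policy concrete, a sketch of the ranking and share logic follows 
> (every type and field below is invented for illustration; none of this is an 
> existing Spark API):
> {code:scala}
> // Hypothetical model of the cluster -- invented for illustration.
> case class AppState(id: String, hasPendingTasks: Boolean)
> case class ExecState(id: String, isIdle: Boolean,
>                      hasCachedBlocks: Boolean, uptimeMs: Long)
>
> // Rank removal candidates: idle first, then cache-free, then short-lived
> // (false sorts before true, so idle and uncached executors come first).
> def removalOrder(execs: Seq[ExecState]): Seq[ExecState] =
>   execs.sortBy(e => (!e.isIdle, e.hasCachedBlocks, e.uptimeMs))
>
> // An application's allowed share grows when others are idle: with 3 apps
> // of which 2 are idle, the busy app may use more than 1/3 of the cores.
> def allowedCores(totalCores: Int, apps: Seq[AppState]): Int = {
>   val busyApps = math.max(apps.count(_.hasPendingTasks), 1)
>   totalCores / busyApps
> }
>
> // Preempt only when something is actually starved; otherwise fall back
> // to the default dynamic-allocation idle timeout and removal process.
> def shouldPreempt(pendingApps: Int, pendingCores: Int): Boolean =
>   pendingApps > 0 || pendingCores > 0
> {code}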
> More complexity could be added to take advantage of task/stage metrics, 
> histograms, and heuristics (e.g. favor removing executors whose in-flight 
> tasks are quick and therefore cheap to rerun). The important thing here is to 
> benchmark effectively before adding complexity so we can measure the impact 
> of the changes.


