[ 
https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292048#comment-16292048
 ] 

Xuefu Zhang commented on SPARK-22765:
-------------------------------------

[~tgraves], I think it would help if SPARK-21656 makes a close-to-zero idle 
timeout workable, since executor idle time is one source of inefficiency. Our 
version is too old to backport the fix, but we will try it out when we upgrade.
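
To make the idle-time point concrete, here is a minimal sketch of the settings 
involved. The three keys are standard Spark configuration properties; the `1s` 
value is only an assumed stand-in for "close-to-zero", and the helper 
`to_submit_args` is hypothetical. None of this has been benchmarked.

```python
# Illustrative settings only; the 1s timeout is an assumed "close-to-zero" value.
near_zero_idle_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # Dynamic allocation typically relies on the external shuffle service.
    "spark.shuffle.service.enabled": "true",
    # Release an executor after it has been idle this long (Spark's default is 60s).
    "spark.dynamicAllocation.executorIdleTimeout": "1s",
}


def to_submit_args(conf):
    """Render a dict of settings as spark-submit --conf arguments (hypothetical helper)."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(near_zero_idle_conf)))
```

Whether such a short timeout is practical depends on the SPARK-21656 fix being 
present; without it, executors may be released while tasks are still pending.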

The second source of inefficiency comes from the fact that Spark favors bigger 
containers. A 4-core container might be running one task while wasting the 
other cores and memory, and the executor cannot die as long as one task is 
running. One might argue that a user can configure 1-core containers under 
dynamic allocation, but this is probably not optimal in other respects.
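
The arithmetic behind the 4-core example can be sketched as follows. The 
function names are hypothetical, and `cpus_per_task` is assumed to play the role 
of `spark.task.cpus` (1 by default); memory is ignored for simplicity.

```python
def wasted_cores(executor_cores, running_tasks, cpus_per_task=1):
    """Cores an executor holds but does not use for its running tasks."""
    used = running_tasks * cpus_per_task
    if used > executor_cores:
        raise ValueError("running tasks exceed executor capacity")
    return executor_cores - used


def utilization(executor_cores, running_tasks, cpus_per_task=1):
    """Fraction of the executor's cores that are doing work."""
    used = running_tasks * cpus_per_task
    if used > executor_cores:
        raise ValueError("running tasks exceed executor capacity")
    return used / executor_cores


if __name__ == "__main__":
    # The 4-core container with a single running task described above.
    print(wasted_cores(4, 1))  # 3
    print(utilization(4, 1))   # 0.25
```

In this sketch a 1-core executor never wastes a core while its task runs, which 
is the argument for 1-core containers; the trade-offs in other respects are 
not modeled here.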

The third reason that one might favor MR-styled scheduling is its simplicity 
and efficiency. We frequently found that under heavy workloads the scheduler 
cannot really keep up with the tasks' ups and downs, especially when the tasks 
finish fast.

For cost-conscious users, cluster-level resource efficiency is probably what 
matters. My suspicion is that an enhanced MR-styled scheduling scheme, simple 
and performant, will significantly improve resource efficiency over a typical 
use of dynamic allocation, without sacrificing much performance.

As a starting point, we will first benchmark with SPARK-21656 when possible.

> Create a new executor allocation scheme based on that of MR
> -----------------------------------------------------------
>
>                 Key: SPARK-22765
>                 URL: https://issues.apache.org/jira/browse/SPARK-22765
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 1.6.0
>            Reporter: Xuefu Zhang
>
> Many users migrating their workloads from MR to Spark find a significant 
> resource consumption hike (e.g., SPARK-22683). While this might not be a 
> concern for users who are more performance-centric, for those conscious 
> of cost, such a hike creates a migration obstacle. This situation can get 
> worse as more users move to the cloud.
> Dynamic allocation makes it possible for Spark to be deployed in a 
> multi-tenant environment. With its performance-centric design, however, its 
> inefficiency has unfortunately also shown up, especially when compared with 
> MR. Thus, it is believed that an MR-styled scheduler still has its merits. 
> Based on our research, the inefficiency associated with dynamic allocation 
> comes from many sources, such as executors idling out, bigger executors, and 
> many stages in a Spark job (rather than only 2 stages in MR).
> Rather than fine-tuning dynamic allocation for efficiency, the proposal here 
> is to add a new, efficiency-centric scheduling scheme based on that of MR. 
> Such an MR-based scheme can be further enhanced and better adapted to the 
> Spark execution model. This alternative is expected to offer a good 
> performance improvement (compared to MR) with efficiency similar to or even 
> better than that of MR.
> Input is very welcome!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
