[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295814#comment-16295814 ]

Xuefu Zhang commented on SPARK-22765:
-------------------------------------

As an update, I managed to backport SPARK-21656, among a few others, into our 
codebase and measured the efficiency improvement with a smaller idle time (from 
60s to 5s). Our test shows that the efficiency gain is significant (consistent 
2X) for small jobs, especially those with many stages. For large, long-running 
jobs, the gain is less significant (about 10% better than with the 60s idle time).

I'd like to point out that even with a 2X efficiency improvement for small 
jobs, Spark is still behind. With a 60s idle time, MR uses only about 35% of 
the resources used by Spark. With 5s, MR now uses about 70% of what Spark uses. 
I suspect the additional overhead comes from: 1. exponential ramp-up 
allocation; 2. bigger containers (I have 4 cores per container).
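To illustrate the first overhead, here is a toy sketch (not Spark's actual scheduler code) of how exponential ramp-up delays reaching the full executor count: the outstanding request doubles once per backlog interval, so a job needing N executors waits through roughly log2(N) request rounds, whereas an upfront scheme issues the whole request in one round.

```python
# Toy model of dynamic allocation's exponential ramp-up: the executor
# request grows 1, 2, 4, 8, ... per scheduler-backlog interval.
# This is an illustration, not Spark source code.

def rampup_intervals(target: int) -> int:
    """Backlog intervals before the outstanding request reaches `target`."""
    requested, intervals = 0, 0
    while requested < target:
        requested = max(1, requested * 2)  # double the outstanding request
        intervals += 1
    return intervals

# A 64-executor job sits through 7 request rounds under ramp-up,
# versus a single round with upfront allocation.
print(rampup_intervals(64))
```

For short jobs those extra rounds are a large fraction of total runtime, which matches the observation that small jobs suffer most.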

The desired allocation scheme, optimized for efficiency, now seems clearer 
to me:

1. Upfront allocation instead of exponential ramp-up
2. Zero idle time (reuse containers if there are pending tasks, or kill them 
right away if there are none)
3. Optimizations for smaller containers (like 1 core per container).

In combination with the executor conservation factor from SPARK-22683, the new 
scheme, which diverges widely from dynamic allocation, should offer better 
resource efficiency.
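For reference, the three points above map loosely onto existing dynamic allocation knobs; this is only an approximation (the `<maxExecutors>` value is a placeholder, and zero idle time is not exactly expressible today):

```properties
# 1. Approximate upfront allocation by starting at the cap
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.initialExecutors=<maxExecutors>
spark.dynamicAllocation.maxExecutors=<maxExecutors>
# 2. Approximate zero idle time with a very small idle timeout
spark.dynamicAllocation.executorIdleTimeout=1s
# 3. MR-sized containers, one core each
spark.executor.cores=1
```

Even so, the request ramp-up and executor startup behavior would still differ from a true MR-style scheduler, which is why a new scheme is proposed rather than more tuning.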

> Create a new executor allocation scheme based on that of MR
> -----------------------------------------------------------
>
>                 Key: SPARK-22765
>                 URL: https://issues.apache.org/jira/browse/SPARK-22765
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 1.6.0
>            Reporter: Xuefu Zhang
>
> Many users migrating their workload from MR to Spark find a significant 
> resource consumption hike (e.g., SPARK-22683). While this might not be a 
> concern for users that are more performance centric, for others conscious 
> about cost, such a hike creates a migration obstacle. This situation can get 
> worse as more users move to the cloud.
> Dynamic allocation makes it possible for Spark to be deployed in a 
> multi-tenant environment. With its performance-centric design, its 
> inefficiency has unfortunately also shown up, especially when compared with 
> MR. Thus, it's believed that an MR-style scheduler still has its merit. Based 
> on our research, the inefficiency associated with dynamic allocation comes 
> from many aspects, such as executors idling out, bigger executors, and many 
> stages in a Spark job (rather than only 2 stages in MR).
> Rather than fine-tuning dynamic allocation for efficiency, the proposal here 
> is to add a new, efficiency-centric scheduling scheme based on that of MR. 
> Such an MR-based scheme can be further enhanced and better adapted to the 
> Spark execution model. This alternative is expected to offer a good 
> performance improvement over MR while achieving similar or even better 
> efficiency than MR.
> Inputs are greatly welcome!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
