[ https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073597#comment-14073597 ]
Kay Ousterhout edited comment on SPARK-2387 at 7/24/14 8:23 PM:
----------------------------------------------------------------

Have you done experiments to understand how much this improves performance? With Hadoop MapReduce, I've seen this behavior significantly worsen performance for a few reasons.

Ultimately, the problem is that the "reduce" stage (the one that depends on the shuffle map stage) can't finish until all of the map tasks finish. So, if there is a long map straggler, the reduce tasks can't finish anyway -- and now many more slots are hogged by the early reducers, preventing other jobs from making progress. Even worse, if reduce tasks are launched before all map tasks have been launched, the early reducers keep map tasks from being launched, but can end up stalled waiting for input from mappers that haven't completed yet. (Although it looks like your pull request is done in a way that tries to avoid the latter problem.)

As a result of the above issues, I've heard that many places (I think Facebook, for example) disable this behavior in Hadoop. So, we should make sure this will not hurt performance (and will significantly help!) before adding a lot of complexity to Spark in order to implement it.

> Remove the stage barrier for better resource utilization
> --------------------------------------------------------
>
>                 Key: SPARK-2387
>                 URL: https://issues.apache.org/jira/browse/SPARK-2387
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Rui Li
>
> DAGScheduler divides a Spark job into multiple stages according to RDD
> dependencies. Whenever there's a shuffle dependency, DAGScheduler creates a
> shuffle map stage on the map side, and another stage depending on it.
> Currently, the downstream stage cannot start until all of the stages it
> depends on have finished. This barrier between stages leads to idle slots
> while waiting for the last few upstream tasks to finish, thus wasting
> cluster resources. Therefore we propose to remove the barrier and pre-start
> the reduce stage once there are free slots. This can achieve better resource
> utilization and improve the overall job performance, especially when there
> are lots of executors granted to the application.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
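The trade-off debated above can be sketched with a toy simulation. This is not Spark's actual scheduler: the function name, the slot counts, and all task durations are invented for illustration. The model captures both sides of the argument: a pre-started reducer can overlap its shuffle fetch with a straggling map task (the utilization win the issue proposes), but it holds a slot idle once its fetch has outrun the available map output (the slot-hogging cost the comment warns about). In Hadoop, this behavior is governed by the `mapreduce.job.reduce.slowstart.completedmaps` property; setting it to 1.0 is how sites effectively disable early reducer launch.

```python
import heapq

def simulate(slots, map_durations, n_reducers, fetch, compute, prestart):
    """Toy discrete-event model of one shuffle stage boundary.

    Maps are placed greedily (longest first) and always take priority over
    reducers, mirroring the comment's point that reducers must not starve
    map tasks. A reducer holds a slot from launch; it may fetch map output
    concurrently with still-running maps, but cannot begin its reduce
    computation until every map has finished.

    Returns (makespan, reducer_slot_seconds_spent_waiting).
    """
    free_at = [0.0] * slots          # min-heap of times each slot frees up
    heapq.heapify(free_at)

    maps_done = 0.0
    for d in sorted(map_durations, reverse=True):
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
        maps_done = max(maps_done, start + d)

    waited = 0.0
    for _ in range(n_reducers):
        start = heapq.heappop(free_at)
        if not prestart:
            start = max(start, maps_done)       # stage barrier in place
        # Reduce computation starts only after fetch AND all map output.
        end = max(start + fetch, maps_done) + compute
        waited += max(0.0, maps_done - (start + fetch))  # slot held idle
        heapq.heappush(free_at, end)

    return max(maps_done, max(free_at)), waited
```

With 4 slots, six 1-second maps plus one 10-second straggler, and three reducers (2s fetch, 3s compute), pre-starting finishes the job at t=13 instead of t=15 because the fetches overlap the straggler, but it spends 18 slot-seconds of reducer time waiting on the straggler's output; with a fourth reducer (one per slot) the makespan advantage disappears entirely, which is the scenario the comment cautions against.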