[ https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270444#comment-14270444 ]
Lianhui Wang commented on SPARK-2387:
-------------------------------------

[~xuefuz] [~sandyr] [~lirui] Yes, I think we can first provide an option to enable this feature, for example spark.scheduler.removeStageBarrier. Anyone who needs the feature just turns the option on, and it will not affect other features.

> Remove the stage barrier for better resource utilization
> --------------------------------------------------------
>
>                 Key: SPARK-2387
>                 URL: https://issues.apache.org/jira/browse/SPARK-2387
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Rui Li
>
> DAGScheduler divides a Spark job into multiple stages according to RDD
> dependencies. Whenever there’s a shuffle dependency, DAGScheduler creates a
> shuffle map stage on the map side, and another stage depending on that stage.
> Currently, the downstream stage cannot start until all the stages it depends
> on have finished. This barrier between stages leaves slots idle while waiting
> for the last few upstream tasks to finish, and thus wastes cluster resources.
> Therefore we propose to remove the barrier and pre-start the reduce stage
> once there are free slots. This can achieve better resource utilization and
> improve the overall job performance, especially when many executors are
> granted to the application.
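
To make the idle-slot cost described in the issue concrete, here is a minimal sketch in Python. It is not Spark code: idle_slot_time is a hypothetical helper, the task durations are made up, and it assumes a fixed number of slots with each map task assigned greedily to whichever slot frees up first. It only measures how much slot time sits unused behind a strict stage barrier.

```python
import heapq


def idle_slot_time(map_durations, num_slots):
    """Return (stage_finish_time, idle_slot_time) for one map stage.

    Assumption (not taken from Spark): tasks are assigned in order to the
    slot that becomes free first, and no downstream task may start until
    the whole map stage has finished (the stage barrier).
    """
    # Each heap entry is the time at which one slot becomes free.
    slots = [0.0] * num_slots
    heapq.heapify(slots)
    for duration in map_durations:
        free_at = heapq.heappop(slots)
        heapq.heappush(slots, free_at + duration)
    # The barrier lifts only when the slowest slot is done.
    finish = max(slots)
    # Slot time wasted between each slot going free and the barrier lifting.
    idle = sum(finish - free_at for free_at in slots)
    return finish, idle


# Illustrative numbers only: three short map tasks and one straggler.
finish, idle = idle_slot_time([10, 10, 10, 40], num_slots=4)
print(finish, idle)
```

In this made-up case, three of the four slots sit idle for 30 time units each while the straggler finishes. That unused capacity is what pre-starting the reduce stage would try to reclaim.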