[jira] [Updated] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

Xiangrui Meng (JIRA) Wed, 23 May 2018 21:49:38 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Xiangrui Meng updated SPARK-24374:
----------------------------------
    Labels: SPIP  (was: )

> SPIP: Support Barrier Scheduling in Apache Spark
> ------------------------------------------------
>
>                 Key: SPARK-24374
>                 URL: https://issues.apache.org/jira/browse/SPARK-24374
>             Project: Spark
>          Issue Type: Story
>          Components: ML, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Major
>              Labels: SPIP
>
> (See details in the linked SPIP doc.)
> The proposal here is to add a new scheduling model to Apache Spark so users 
> can properly embed distributed DL training as a Spark stage to simplify the 
> distributed training workflow. For example, Horovod uses MPI to implement 
> all-reduce to accelerate distributed TensorFlow training. The computation 
> model is different from MapReduce used by Spark. In Spark, a task in a stage 
> doesn’t depend on any other tasks in the same stage, and hence it can be 
> scheduled independently. In MPI, all workers start at the same time and pass 
> messages around. To embed this workload in Spark, we need to introduce a new 
> scheduling model, tentatively named “barrier scheduling”, which launches 
> tasks at the same time and provides users enough information and tooling to 
> embed distributed DL training. Spark can also provide an extra layer of fault 
> tolerance in case some tasks failed in the middle, where Spark would abort 
> all tasks and restart the stage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

Reply via email to