GitHub user jiangxb1987 opened a pull request:

    https://github.com/apache/spark/pull/21758

    [SPARK-24795][CORE] Implement barrier execution mode

    ## What changes were proposed in this pull request?
    
    Propose new APIs and modify job/task scheduling to support barrier 
execution mode, which requires all tasks in the same barrier stage to be 
launched at the same time, and retries all tasks whenever some of them fail 
in the middle. Barrier execution mode is useful for some ML/DL workloads.
    
    The proposed API changes include:
    - `RDDBarrier`, which marks an RDD as barrier (Spark must launch all 
tasks of the current stage together);
    - `BarrierTaskContext`, which supports a global sync of all tasks in a 
barrier stage and provides extra `BarrierTaskInfo`s.
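
    The global-sync semantics of the proposed `BarrierTaskContext` can be illustrated outside Spark with a small standalone sketch (plain Python threads standing in for tasks; names such as `run_task` are hypothetical and are not part of the proposed API):

```python
import threading

NUM_TASKS = 4
barrier = threading.Barrier(NUM_TASKS)  # every task must reach this point
events = []
lock = threading.Lock()

def run_task(task_id):
    # Phase 1: each task does some independent work.
    with lock:
        events.append(("before", task_id))
    # Global sync: no task proceeds until all tasks have arrived,
    # mirroring a barrier() call on the task context.
    barrier.wait()
    with lock:
        events.append(("after", task_id))

threads = [threading.Thread(target=run_task, args=(i,)) for i in range(NUM_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "before" event precedes every "after" event.
before = [i for i, (phase, _) in enumerate(events) if phase == "before"]
after = [i for i, (phase, _) in enumerate(events) if phase == "after"]
print(max(before) < min(after))  # True
```

    This is only a sketch of the synchronization contract; in Spark the sync happens across executors, not threads in one process.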
    
    In DAGScheduler, we retry all tasks of a barrier stage if some tasks 
fail in the middle. This is achieved by unregistering the map outputs for 
the shuffleId (for a ShuffleMapStage) or clearing the finished partitions 
in the active job (for a ResultStage).
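
    The retry-the-whole-stage behavior (as opposed to Spark's usual per-task retry) can be sketched as a toy scheduler loop; all names here (`run_attempt`, `schedule_barrier_stage`) are hypothetical and stand in for the DAGScheduler logic rather than reproducing it:

```python
def run_attempt(num_tasks, failing):
    """One attempt of a barrier stage: any task failure aborts the attempt."""
    outputs = {}
    for task_id in range(num_tasks):
        if task_id in failing:
            failing.discard(task_id)  # fail only on the first attempt
            return None
        outputs[task_id] = f"out-{task_id}"
    return outputs

def schedule_barrier_stage(num_tasks, failing):
    while True:
        outputs = run_attempt(num_tasks, failing)
        if outputs is None:
            # Analogous to unregistering all map outputs (ShuffleMapStage)
            # or clearing the finished partitions (ResultStage): partial
            # results are discarded and ALL tasks are rerun, not just the
            # failed one.
            continue
        return [outputs[i] for i in range(num_tasks)]

result = schedule_barrier_stage(3, failing={1})
print(result)  # ['out-0', 'out-1', 'out-2']
```

    Discarding all partial results is what keeps the tasks of a barrier stage in lockstep: a rerun always starts every task from the same state.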
    
    ## How was this patch tested?
    
    Added `RDDBarrierSuite` to ensure we convert RDDs correctly;
    Added new test cases in `DAGSchedulerSuite` to ensure task scheduling 
works correctly;
    Added new test cases in `SparkContextSuite` to ensure barrier execution 
mode actually works (in both local mode and local-cluster mode).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark barrier-execution-mode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21758.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21758
    
----
commit c25ec473ff078c071aec513953f56c64e6a228a4
Author: Xingbo Jiang <xingbo.jiang@...>
Date:   2018-07-12T17:38:58Z

    implement barrier execution mode.

----

