Xingbo, please reference the SPIP and JIRA ticket next time: [SPARK-24374] SPIP: Support Barrier Scheduling in Apache Spark
On Sun, Jul 8, 2018 at 9:45 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
> Hi All,
>
> I would like to invite you to review the design document for Barrier
> Execution Mode:
>
> https://docs.google.com/document/d/1GvcYR6ZFto3dOnjfLjZMtTezX0W5VYN9w1l4-tQXaZk/edit#
>
> TL;DR: We announced the project Hydrogen on recent Spark+AI Summit, a
> major part of the project involves significant changes to execution mode of
> Spark. This design doc proposes new APIs as well as new execution mode
> (known as barrier execution mode) to provide high-performance support for
> DL workloads.
>
> Major changes include:
>
> - Add RDDBarrier to support gang scheduling.
> - Add BarrierTaskContext to support global sync of all tasks in a
>   stage;
> - Better fault tolerance approach for barrier stage, that in case some
>   tasks fail in the middle, retry all tasks in the same stage.
> - Integrate barrier execution mode with Standalone cluster manager.
>
> Please feel free to review and discuss on the design proposal.
>
> Thanks,
> Xingbo
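For anyone new to the proposed semantics, here is a toy, single-process Python simulation of the three behaviors in the list above: all tasks of a stage launched together, a global sync point, and retry of the whole stage when any task fails. This is not Spark code and not the design doc's implementation. Threads stand in for tasks and `threading.Barrier` stands in for the sync that `BarrierTaskContext` would provide. `run_barrier_stage` and `flaky_task` are hypothetical names chosen for illustration.

```python
import threading
from typing import Callable, List, Tuple

# Hypothetical task signature: (partition_id, barrier, attempt) -> result
TaskFn = Callable[[int, threading.Barrier, int], int]


def run_barrier_stage(num_tasks: int, task_fn: TaskFn,
                      max_attempts: int = 3) -> Tuple[int, List[int]]:
    """Toy model of a barrier stage (hypothetical helper, not a Spark API).

    - Gang scheduling: all tasks of an attempt are started together.
    - Global sync: tasks may call barrier.wait() to block until every
      task in the stage has reached that point.
    - Fault tolerance: if any task fails, every task is retried.
    Returns (attempt number that succeeded, per-task results).
    """
    for attempt in range(1, max_attempts + 1):
        barrier = threading.Barrier(num_tasks)
        results: List[int] = [0] * num_tasks
        errors: List[BaseException] = []
        lock = threading.Lock()

        def run(i: int) -> None:
            try:
                results[i] = task_fn(i, barrier, attempt)
            except BaseException as exc:
                with lock:
                    errors.append(exc)
                # Break the barrier so peers blocked in wait() are released
                # with BrokenBarrierError instead of hanging forever.
                barrier.abort()

        threads = [threading.Thread(target=run, args=(i,))
                   for i in range(num_tasks)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        if not errors:
            return attempt, results
        # At least one task failed: discard all results, retry every task.
    raise RuntimeError(f"barrier stage failed after {max_attempts} attempts")


def flaky_task(partition: int, barrier: threading.Barrier, attempt: int) -> int:
    # Simulate one task crashing mid-stage, on the first attempt only.
    if partition == 2 and attempt == 1:
        raise RuntimeError("simulated task failure")
    barrier.wait()  # global sync point: every task must arrive here
    return partition * 10


if __name__ == "__main__":
    attempt, results = run_barrier_stage(4, flaky_task)
    print(attempt, results)  # 2 [0, 10, 20, 30]
```

Note that a single failing task causes the healthy tasks of the same attempt to fail too, because the barrier is broken, and the whole stage reruns. That is the retry-all behavior the proposal describes, as opposed to Spark's usual per-task retry. In the API that later shipped in Spark 2.4, the corresponding entry points are `rdd.barrier().mapPartitions(...)` and `BarrierTaskContext.get().barrier()`.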