Hi Egor, I think the design doc for the pipeline feature has been posted.
For the workflow, I believe Oozie actually works fine with Spark if you want an external workflow system. Have you run into any trouble using it?

On Tue, Sep 16, 2014 at 11:45 PM, Egor Pahomov <pahomov.e...@gmail.com> wrote:

> There are two things we (Yandex) miss in Spark: good MLlib abstractions and
> a good workflow job scheduler. From the threads "Adding abstraction in MlLib"
> and "[mllib] State of Multi-Model training" I got the idea that Databricks is
> working on the former and that we should wait for their first design doc,
> which would guide us. What about a workflow scheduler? Is anyone already
> working on it? Does anyone have a plan to build one?
>
> P.S. We thought that MLlib abstractions for running multiple algorithms on
> the same data would need such a scheduler, one that reruns an algorithm in
> case of failure. I understand that Spark provides fault tolerance out of the
> box, but we have found an "Oozie-like" scheduler more reliable for such
> long-lived workflows.
>
> --
> Sincerely yours,
> Egor Pakhomov
> Scala Developer, Yandex