Hi Egor,

I think the design doc for the pipeline feature has been posted.

As for the workflow scheduler, I believe Oozie works fine with Spark if you
want an external workflow system. Have you had any trouble using it?


On Tue, Sep 16, 2014 at 11:45 PM, Egor Pahomov <pahomov.e...@gmail.com>
wrote:

> There are two things we (Yandex) miss in Spark: good abstractions in MLlib
> and a good workflow job scheduler. From the threads
> "Adding abstraction in MlLib" and "[mllib] State of Multi-Model training",
> I got the idea that Databricks is working on the former and that we should
> wait for the first design doc to be posted to guide us. What about a
> workflow scheduler? Is anyone already working on one? Does anyone have
> plans to build one?
>
> P.S. We thought that MLlib abstractions for running multiple algorithms on
> the same data would need such a scheduler, one that reruns an algorithm in
> case of failure. I understand that Spark provides fault tolerance out of
> the box, but we have found an "Oozie-like" scheduler more reliable for such
> long-running workflows.
>
> --
>
>
>
> *Sincerely yours,*
> *Egor Pakhomov*
> *Scala Developer, Yandex*
>
