See https://issues.apache.org/jira/browse/SPARK-3530 and this doc, referenced in that JIRA:
https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing

On Wed, Sep 17, 2014 at 2:00 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:

> I have problems using Oozie. For example, it doesn't sustain a Spark context
> like the Ooyala job server does. Other than through GUI interfaces like HUE,
> it's hard to work with: Scoozie stopped development a year ago (I spoke with
> its creator) and Oozie XML is very hard to write.
> Oozie still has all its documentation and code in the MR model rather than
> in the YARN model. And based on its current speed of development, I can't
> expect radical changes in the near future. There is no "Databricks" for
> Oozie, which would have people on salary to develop these kinds of radical
> changes. It's a dinosaur.
>
> Reynold, can you help me find this doc? Do you mean just pipelining Spark
> code, or additional logic for persistence tasks, job server, task retry,
> data availability, and so on?
>
>
> 2014-09-17 11:21 GMT+04:00 Reynold Xin <r...@databricks.com>:
>
> > Hi Egor,
> >
> > I think the design doc for the pipeline feature has been posted.
> >
> > For the workflow, I believe Oozie actually works fine with Spark if you
> > want some external workflow system. Do you have any trouble using that?
> >
> >
> > On Tue, Sep 16, 2014 at 11:45 PM, Egor Pahomov <pahomov.e...@gmail.com>
> > wrote:
> >
> >> There are two things we (Yandex) miss in Spark: good MLlib abstractions
> >> and a good workflow job scheduler. From the threads "Adding abstraction
> >> in MlLib" and "[mllib] State of Multi-Model training" I got the idea
> >> that Databricks is working on it and that we should wait for the first
> >> posted doc, which would guide us.
> >> What about a workflow scheduler? Is anyone already working on it? Does
> >> anyone have a plan to do it?
> >>
> >> P.S. We thought that MLlib abstractions for running multiple algorithms
> >> on the same data would need such a scheduler, which would rerun an
> >> algorithm in case of failure. I understand that Spark provides fault
> >> tolerance out of the box, but we found an "Oozie-like" scheduler more
> >> reliable for such long-living workflows.
> >>
> >> --
> >>
> >> *Sincerely yours, Egor Pakhomov, Scala Developer, Yandex*
> >>
> >
>
> --
>
> *Sincerely yours, Egor Pakhomov, Scala Developer, Yandex*