Re: Workflow Scheduler for Spark

Mark Hamstra Wed, 17 Sep 2014 02:09:13 -0700

See https://issues.apache.org/jira/browse/SPARK-3530 and this doc,
referenced in that JIRA:


https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing

On Wed, Sep 17, 2014 at 2:00 AM, Egor Pahomov <pahomov.e...@gmail.com>
wrote:

> I have problems using Oozie. For example, it doesn't sustain a Spark context
> like the Ooyala job server does. Other than through GUI interfaces like HUE,
> it's hard to work with: Scoozie development stopped a year ago (I spoke with
> its creator) and Oozie XML is very hard to write.
> Oozie still has all its documentation and code in the MR model rather than
> the YARN model. And based on its current speed of development, I can't expect
> radical changes in the near future. There is no "Databricks" for Oozie,
> which would have people on salary to develop this kind of radical change.
> It's a dinosaur.
>
> Reynold, can you help me find this doc? Do you mean just pipelining Spark
> code, or additional logic for persisting tasks, a job server, task retry,
> data availability, and so on?
>
>
> 2014-09-17 11:21 GMT+04:00 Reynold Xin <r...@databricks.com>:
>
> > Hi Egor,
> >
> > I think the design doc for the pipeline feature has been posted.
> >
> > For the workflow, I believe Oozie actually works fine with Spark if you
> > want some external workflow system. Do you have any trouble using that?
> >
> >
> > On Tue, Sep 16, 2014 at 11:45 PM, Egor Pahomov <pahomov.e...@gmail.com>
> > wrote:
> >
> >> There are two things we (Yandex) miss in Spark: good MLlib abstractions
> >> and a good workflow job scheduler. From the threads "Adding abstraction in
> >> MlLib" and "[mllib] State of Multi-Model training" I got the idea that
> >> Databricks is working on it and that we should wait until the first doc
> >> is posted, which would guide us.
> >> What about a workflow scheduler? Is anyone already working on it? Does
> >> anyone have a plan for doing it?
> >>
> >> P.S. We thought that MLlib abstractions for running multiple algorithms
> >> on the same data would need such a scheduler, which would rerun an
> >> algorithm in case of failure. I understand that Spark provides fault
> >> tolerance out of the box, but we found an "Oozie-like" scheduler more
> >> reliable for such long-living workflows.
> >>
> >> --
> >>
> >>
> >>
> >> *Sincerely yours
> >> Egor Pakhomov
> >> Scala Developer, Yandex*
> >>
> >
> >
>
>
> --
>
>
>
> *Sincerely yours
> Egor Pakhomov
> Scala Developer, Yandex*
>
