I created a JIRA issue <https://issues.apache.org/jira/browse/SPARK-3714> and a
design doc
<https://docs.google.com/document/d/1q2Q8Ux-6uAkH7wtLJpc3jz-GfrDEjlbWlXtf20hvguk/edit?usp=sharing>
on this matter.

2014-09-17 22:28 GMT+04:00 Reynold Xin <r...@databricks.com>:

> There might've been some misunderstanding. I was referring to the MLlib
> pipeline design doc when I said the design doc was posted, in response to
> the first paragraph of your original email.
>
>
> On Wed, Sep 17, 2014 at 2:47 AM, Egor Pahomov <pahomov.e...@gmail.com>
> wrote:
>
> > That doc is about MLlib pipeline functionality. What about an Oozie-like
> > workflow?
> >
> > 2014-09-17 13:08 GMT+04:00 Mark Hamstra <m...@clearstorydata.com>:
> >
> > > See https://issues.apache.org/jira/browse/SPARK-3530 and this doc,
> > > referenced in that JIRA:
> > >
> > > https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing
> > >
> > > On Wed, Sep 17, 2014 at 2:00 AM, Egor Pahomov <pahomov.e...@gmail.com>
> > > wrote:
> > >
> > >> I have problems using Oozie. For example, it doesn't sustain a Spark
> > >> context the way the Ooyala job server does. Other than through GUI
> > >> interfaces like Hue, it's hard to work with: Scoozie stopped development
> > >> a year ago (I spoke with its creator), and Oozie XML is very hard to write.
> > >> Oozie still has all its documentation and code in the MR model rather than
> > >> in the YARN model. And based on its current speed of development, I can't
> > >> expect radical changes in the near future. There is no "Databricks" for
> > >> Oozie, which would have people on salary to develop this kind of radical
> > >> change. It's a dinosaur.
> > >>
> > >> Reynold, can you help me find this doc? Do you mean just pipelining Spark
> > >> code, or also additional logic such as task persistence, a job server,
> > >> task retry, data availability, and so on?
> > >>
> > >>
> > >> 2014-09-17 11:21 GMT+04:00 Reynold Xin <r...@databricks.com>:
> > >>
> > >> > Hi Egor,
> > >> >
> > >> > I think the design doc for the pipeline feature has been posted.
> > >> >
> > >> > For the workflow, I believe Oozie actually works fine with Spark if you
> > >> > want some external workflow system. Do you have any trouble using that?
> > >> >
> > >> >
> > >> > On Tue, Sep 16, 2014 at 11:45 PM, Egor Pahomov <pahomov.e...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> There are two things we (Yandex) miss in Spark: good MLlib abstractions
> > >> >> and a good workflow job scheduler. From the threads "Adding abstraction in
> > >> >> MlLib" and "[mllib] State of Multi-Model training" I got the idea that
> > >> >> Databricks is working on it and that we should wait for the first posted
> > >> >> doc, which would guide us.
> > >> >> What about a workflow scheduler? Is anyone already working on it? Does
> > >> >> anyone have a plan for doing it?
> > >> >>
> > >> >> P.S. We thought that MLlib abstractions for running multiple algorithms
> > >> >> on the same data would need such a scheduler, which would rerun an
> > >> >> algorithm in case of failure. I understand that Spark provides fault
> > >> >> tolerance out of the box, but we found an "Oozie-like" scheduler more
> > >> >> reliable for such long-living workflows.



-- 



*Sincerely yours*
*Egor Pakhomov*
*Scala Developer, Yandex*
