From what I heard (from an ex-coworker who is an Oozie committer), Azkaban is being phased out at LinkedIn because of scalability issues (though UI-wise, Azkaban seems better).
Vikram: I suggest you do more research into related projects (for example, on their mailing lists).

Disclaimer: I don't work for LinkedIn.

On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> Hi Vikram,
>
> We use Azkaban (2.5.0) in our production workflow scheduling. We just use
> local mode deployment and it is fairly easy to set up. It is pretty easy to
> use and has a nice scheduling and logging interface, as well as SLAs (like
> kill job and notify if it doesn't complete in 3 hours or whatever).
>
> However Spark support is not present directly - we run everything with
> shell scripts and spark-submit. There is a plugin interface where one could
> create a Spark plugin, but I found it very cumbersome when I did
> investigate and didn't have the time to work through it to develop that.
>
> It has some quirks and while there is actually a REST API for adding jobs
> and dynamically scheduling jobs, it is not documented anywhere so you kinda
> have to figure it out for yourself. But in terms of ease of use I found it
> way better than Oozie. I haven't tried Chronos, and it seemed quite
> involved to set up. Haven't tried Luigi either.
>
> Spark job server is good but as you say lacks some stuff like scheduling
> and DAG type workflows (independent of Spark-defined job flows).
>
> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Check also Falcon in combination with Oozie
>>
>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <h...@linkedin.com.invalid> wrote:
>>
>>> Looks like Oozie can satisfy most of your requirements.
>>>
>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramk...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I'm looking for open source workflow tools/engines that allow us to
>>>> schedule Spark jobs on a DataStax Cassandra cluster.
>>>> Since there are tonnes of alternatives out there like Oozie, Azkaban,
>>>> Luigi, Chronos etc., I wanted to check with people here to see what they
>>>> are using today.
>>>>
>>>> Some of the requirements of the workflow engine that I'm looking for are:
>>>>
>>>> 1. First class support for submitting Spark jobs on Cassandra. Not some
>>>>    wrapper Java code to submit tasks.
>>>> 2. Active open source community support and well tested at production
>>>>    scale.
>>>> 3. Should be dead easy to write job dependencies using XML or a web
>>>>    interface. Ex: job A depends on Job B and Job C, so run Job A after B
>>>>    and C are finished. Don't need to write full blown Java applications to
>>>>    specify job parameters and dependencies. Should be very simple to use.
>>>> 4. Time based recurrent scheduling. Run the Spark jobs at a given time
>>>>    every hour or day or week or month.
>>>> 5. Job monitoring, alerting on failures and email notifications on a
>>>>    daily basis.
>>>>
>>>> I have looked at Ooyala's Spark job server, which seems to be geared
>>>> towards making Spark jobs run faster by sharing contexts between the jobs
>>>> but isn't a full blown workflow engine per se. A combination of Spark job
>>>> server and a workflow engine would be ideal.
>>>>
>>>> Thanks for the inputs
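
To make requirement 3 in the quoted message concrete (run job A only after jobs B and C have finished), here is a minimal sketch of the ordering step for a "shell scripts plus spark-submit" setup like the one Nick describes. It assumes Python 3.9+ for the standard-library graphlib module. The job names, script paths and the local[*] master are all hypothetical, and the code only builds the spark-submit command lines without running them. This is not how Azkaban or Oozie work internally, just an illustration of dependency ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each job maps to the set of jobs it depends on.
DEPENDENCIES = {
    "job_a": {"job_b", "job_c"},
    "job_b": set(),
    "job_c": set(),
}

# Hypothetical application scripts; replace with real paths.
SCRIPTS = {
    "job_a": "jobs/job_a.py",
    "job_b": "jobs/job_b.py",
    "job_c": "jobs/job_c.py",
}


def run_order(dependencies):
    """Return job names so that every job comes after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(dependencies).static_order())


def spark_submit_command(job, master="local[*]"):
    """Build (but do not run) a spark-submit command line for one job."""
    return ["spark-submit", "--master", master, SCRIPTS[job]]


if __name__ == "__main__":
    # Print the commands in an order that respects the dependencies.
    for job in run_order(DEPENDENCIES):
        print(" ".join(spark_submit_command(job)))
```

A real scheduler would also need what requirements 4 and 5 ask for (recurring schedules, failure alerts, email), which this sketch deliberately leaves out.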