Re: Spark job workflow engine recommendations

2015-11-18 Thread Vikram Kone
Hi Feng, Does airflow allow remote submissions of spark jobs via spark-submit? On Wed, Nov 18, 2015 at 6:01 PM, Fengdong Yu wrote: > Hi, > > we use 'Airflow' as our job workflow scheduler. > > > > > On Nov 19, 2015, at 9:47 AM, Vikram Kone

Re: Spark job workflow engine recommendations

2015-11-18 Thread Fengdong Yu
Hi, we use 'Airflow' as our job workflow scheduler. > On Nov 19, 2015, at 9:47 AM, Vikram Kone wrote: > > Hi Nick, > Quick question about spark-submit command executed from azkaban with command > job type. > I see that when I press kill in azkaban portal on a

Re: Spark job workflow engine recommendations

2015-11-18 Thread Fengdong Yu
Yes, you can submit jobs remotely. > On Nov 19, 2015, at 10:10 AM, Vikram Kone wrote: > > Hi Feng, > Does airflow allow remote submissions of spark jobs via spark-submit? > > On Wed, Nov 18, 2015 at 6:01 PM, Fengdong Yu

Re: Spark job workflow engine recommendations

2015-11-18 Thread Vikram Kone
Hi Nick, Quick question about spark-submit command executed from azkaban with command job type. I see that when I press kill in azkaban portal on a spark-submit job, it doesn't actually kill the application on spark master and it continues to run even though azkaban thinks that it's killed. How do

Re: Spark job workflow engine recommendations

2015-10-07 Thread Nick Pentreath
We're also using Azkaban for scheduling, and we simply use spark-submit via shell scripts. It works fine. The auto retry feature with a large number of retries (like 100 or 1000 perhaps) should take care of long-running jobs with restarts on failure. We haven't used it for streaming yet
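
A rough sketch of the idea in the message above: run a command such as spark-submit and restart it on failure, up to a large retry limit. This is not Azkaban's retry feature, only a stand-alone illustration of the same pattern. The function name `run_with_retries` and its parameters are made up for this example.

```python
import subprocess
import time


def run_with_retries(cmd, max_attempts=100, delay_seconds=30.0):
    """Run `cmd` until it exits with status 0 or the attempts run out.

    Returns the number of attempts used on success and raises
    RuntimeError once `max_attempts` failures have occurred.
    Hypothetical helper, not part of Azkaban or Spark.
    """
    for attempt in range(1, max_attempts + 1):
        # subprocess.run blocks until the command finishes.
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit code {result.returncode}")
        # Pause before restarting, except after the final attempt.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"command failed after {max_attempts} attempts")
```

It could be called as `run_with_retries(["spark-submit", "--master", "spark://master-host:7077", "--class", "com.example.MyJob", "my-job.jar"])`, where the master URL, class name, and jar path are placeholders. Note that this only restarts the job when spark-submit itself exits with a non-zero status.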

Re: Spark job workflow engine recommendations

2015-10-07 Thread Vikram Kone
Hien, I saw this pull request and from what I understand this is geared towards running spark jobs over hadoop. We are using spark over cassandra and not sure if this new jobtype supports that. I haven't seen any documentation in regards to how to use this spark job plugin, so that I can test it

Re: Spark job workflow engine recommendations

2015-10-07 Thread Hien Luu
The spark job type was added recently - see this pull request https://github.com/azkaban/azkaban-plugins/pull/195. You can leverage the SLA feature to kill a job if it runs longer than expected. BTW, we just solved the scalability issue by supporting multiple executors. Within a week or two, the

Re: Spark job workflow engine recommendations

2015-10-06 Thread Vikram Kone
Does Azkaban support scheduling long-running jobs like Spark Streaming jobs? Will Azkaban kill a job if it's running for a long time? On Friday, August 7, 2015, Vikram Kone wrote: > Hien, > Is Azkaban being phased out at linkedin as rumored? If so, what's linkedin > going

Re: Spark job workflow engine recommendations

2015-08-11 Thread Ruslan Dautkhanov
We use Talend, but not for Spark workflows, although it does have Spark components. https://www.talend.com/download/talend-open-studio It is free (commercial support is available) and makes it easy to design and deploy workflows. Talend for BigData 6.0 was released a month ago. Is anybody using Talend for

Re: Spark job workflow engine recommendations

2015-08-11 Thread Hien Luu
We are in the middle of figuring that out. At the high level, we want to combine the best parts of existing workflow solutions. On Fri, Aug 7, 2015 at 3:55 PM, Vikram Kone vikramk...@gmail.com wrote: Hien, Is Azkaban being phased out at linkedin as rumored? If so, what's linkedin going to

Re: Spark job workflow engine recommendations

2015-08-11 Thread Nick Pentreath
I also tend to agree that Azkaban is somewhat easier to get set up. Though I haven't used the new UI for Oozie that is part of CDH, so perhaps that is another good option. It's a pity Azkaban is a little rough in terms of documenting its API, and the scalability is an issue. However it

Re: Spark job workflow engine recommendations

2015-08-09 Thread Lars Albertsson
I used to maintain Luigi at Spotify, and gained some insight into workflow manager characteristics and production behaviour in the process. I am evaluating options for my current employer, and the short list is basically: Luigi, Azkaban, Pinball, Airflow, and rolling our own. The latter is not

Re: Spark job workflow engine recommendations

2015-08-07 Thread Vikram Kone
Hien, Is Azkaban being phased out at linkedin as rumored? If so, what's linkedin going to use for workflow scheduling? Is there something else that's going to replace Azkaban? On Fri, Aug 7, 2015 at 11:25 AM, Ted Yu yuzhih...@gmail.com wrote: In my opinion, choosing some particular project

Spark job workflow engine recommendations

2015-08-07 Thread Vikram Kone
Hi, I'm looking for open source workflow tools/engines that allow us to schedule spark jobs on a datastax cassandra cluster. Since there are tonnes of alternatives out there like Oozie, Azkaban, Luigi, Chronos etc, I wanted to check with people here to see what they are using today. Some of the

Re: Spark job workflow engine recommendations

2015-08-07 Thread Hien Luu
Looks like Oozie can satisfy most of your requirements. On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone vikramk...@gmail.com wrote: Hi, I'm looking for open source workflow tools/engines that allow us to schedule spark jobs on a datastax cassandra cluster. Since there are tonnes of

Re: Spark job workflow engine recommendations

2015-08-07 Thread Vikram Kone
Thanks for the suggestion, Hien. I'm curious why not azkaban from linkedin. From what I read online, Oozie was very cumbersome to set up and use compared to azkaban. Since you are from linkedin, I wanted to get some perspective on what it lacks compared to Oozie. Ease of use is very important more than

Re: Spark job workflow engine recommendations

2015-08-07 Thread Jörn Franke
Also check out Falcon in combination with Oozie. On Fri, Aug 7, 2015 at 17:51, Hien Luu h...@linkedin.com.invalid wrote: Looks like Oozie can satisfy most of your requirements. On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone vikramk...@gmail.com wrote: Hi, I'm looking for open source workflow

Re: Spark job workflow engine recommendations

2015-08-07 Thread Ted Yu
From what I heard (from an ex-coworker who is an Oozie committer), Azkaban is being phased out at LinkedIn because of scalability issues (though UI-wise, Azkaban seems better). Vikram: I suggest you do more research in related projects (maybe using their mailing lists). Disclaimer: I don't work for

Re: Spark job workflow engine recommendations

2015-08-07 Thread Vikram Kone
Oh ok. That's a good enough reason against azkaban then. So looks like Oozie is the best choice here. On Friday, August 7, 2015, Ted Yu yuzhih...@gmail.com wrote: From what I heard (an ex-coworker who is Oozie committer), Azkaban is being phased out at LinkedIn because of scalability issues