We use Talend, but not for Spark workflows, although it does have Spark
components.

https://www.talend.com/download/talend-open-studio
It is free (commercial support is available), and workflows are easy to
design and deploy.
Talend for Big Data 6.0 was released a month ago.

Is anybody using Talend for Spark?



-- 
Ruslan Dautkhanov

On Tue, Aug 11, 2015 at 11:30 AM, Hien Luu <h...@linkedin.com.invalid>
wrote:

> We are in the middle of figuring that out.  At a high level, we want to
> combine the best parts of existing workflow solutions.
>
> On Fri, Aug 7, 2015 at 3:55 PM, Vikram Kone <vikramk...@gmail.com> wrote:
>
>> Hien,
>> Is Azkaban being phased out at LinkedIn as rumored? If so, what is
>> LinkedIn going to use for workflow scheduling? Is there something else
>> that's going to replace Azkaban?
>>
>> On Fri, Aug 7, 2015 at 11:25 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> In my opinion, choosing some particular project among its peers should
>>> leave enough room for future growth (which may come faster than you
>>> initially think).
>>>
>>> Cheers
>>>
>>> On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu <h...@linkedin.com> wrote:
>>>
>>>> Scalability is a known issue due to the current architecture. However,
>>>> this only becomes relevant if you run more than 20K jobs per day.
>>>>
>>>> On Fri, Aug 7, 2015 at 10:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> From what I heard (from an ex-coworker who is an Oozie committer),
>>>>> Azkaban is being phased out at LinkedIn because of scalability issues
>>>>> (though UI-wise, Azkaban seems better).
>>>>>
>>>>> Vikram:
>>>>> I suggest you do more research on related projects (maybe using their
>>>>> mailing lists).
>>>>>
>>>>> Disclaimer: I don't work for LinkedIn.
>>>>>
>>>>> On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath <
>>>>> nick.pentre...@gmail.com> wrote:
>>>>>
>>>>>> Hi Vikram,
>>>>>>
>>>>>> We use Azkaban (2.5.0) for our production workflow scheduling. We just
>>>>>> use the local mode deployment, which is fairly easy to set up. It is
>>>>>> pretty easy to use and has a nice scheduling and logging interface, as
>>>>>> well as SLAs (like kill the job and notify if it doesn't complete in 3
>>>>>> hours, or whatever).
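>>>>>>
>>>>>> For reference, an Azkaban flow is just a set of properties files; a
>>>>>> minimal sketch of a command-type job with a dependency might look like
>>>>>> this (the job names and script are made up, and IIRC the SLA itself is
>>>>>> set through the scheduling UI rather than in the job file):
>>>>>>
>>>>>> # report.job -- hypothetical Azkaban job definition;
>>>>>> # "type", "command", and "dependencies" are standard Azkaban properties
>>>>>> type=command
>>>>>> # run only after load-data.job in the same flow has succeeded
>>>>>> dependencies=load-data
>>>>>> command=bash ./run_report.sh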
>>>>>>
>>>>>> However, Spark support is not present directly; we run everything with
>>>>>> shell scripts and spark-submit. There is a plugin interface through
>>>>>> which one could create a Spark plugin, but when I investigated I found
>>>>>> it very cumbersome and didn't have the time to work through it.
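>>>>>>
>>>>>> To give an idea, the wrapper scripts are roughly of this shape (a
>>>>>> minimal sketch; the master URL, class, jar path, and Cassandra host
>>>>>> below are placeholders, and spark.cassandra.connection.host assumes
>>>>>> the spark-cassandra-connector is on the classpath):
>>>>>>
>>>>>> #!/usr/bin/env bash
>>>>>> # run_report.sh -- hypothetical spark-submit wrapper invoked by the
>>>>>> # Azkaban command job; exit on any error so Azkaban marks the job
>>>>>> # as failed
>>>>>> set -euo pipefail
>>>>>> /opt/spark/bin/spark-submit \
>>>>>>   --master spark://spark-master:7077 \
>>>>>>   --class com.example.ReportJob \
>>>>>>   --conf spark.cassandra.connection.host=cassandra-host \
>>>>>>   /opt/jobs/report-assembly.jar "$@"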
>>>>>>
>>>>>> It has some quirks, and while there is actually a REST API for adding
>>>>>> jobs and scheduling them dynamically, it is not documented anywhere,
>>>>>> so you have to figure it out for yourself. But in terms of ease of use
>>>>>> I found it way better than Oozie. I haven't tried Chronos, which
>>>>>> seemed quite involved to set up. I haven't tried Luigi either.
>>>>>>
>>>>>> Spark job server is good, but as you say it lacks some features like
>>>>>> scheduling and DAG-type workflows (independent of Spark-defined job
>>>>>> flows).
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfra...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Also check out Falcon in combination with Oozie.
>>>>>>>
>>>>>>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <h...@linkedin.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Looks like Oozie can satisfy most of your requirements.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramk...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I'm looking for open source workflow tools/engines that allow us to
>>>>>>>>> schedule Spark jobs on a DataStax Cassandra cluster. Since there are
>>>>>>>>> tons of alternatives out there like Oozie, Azkaban, Luigi, Chronos,
>>>>>>>>> etc., I wanted to check with people here to see what they are using
>>>>>>>>> today.
>>>>>>>>>
>>>>>>>>> Some of the requirements for the workflow engine I'm looking for
>>>>>>>>> are:
>>>>>>>>>
>>>>>>>>> 1. First-class support for submitting Spark jobs on Cassandra, not
>>>>>>>>> just some wrapper Java code to submit tasks.
>>>>>>>>> 2. Active open source community support, and well tested at
>>>>>>>>> production scale.
>>>>>>>>> 3. It should be dead easy to write job dependencies using XML or a
>>>>>>>>> web interface. Ex: job A depends on job B and job C, so run job A
>>>>>>>>> after B and C are finished (see the sketch after this list). We
>>>>>>>>> shouldn't need to write full-blown Java applications to specify job
>>>>>>>>> parameters and dependencies. It should be very simple to use.
>>>>>>>>> 4. Time-based recurring scheduling: run the Spark jobs at a given
>>>>>>>>> time every hour, day, week, or month.
>>>>>>>>> 5. Job monitoring, alerting on failures, and email notifications on
>>>>>>>>> a daily basis.
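>>>>>>>>>
>>>>>>>>> To make point 3 concrete, here is roughly what the "A after B and
>>>>>>>>> C" dependency could look like in, say, Oozie's workflow XML (purely
>>>>>>>>> illustrative; the job names are made up and the action bodies are
>>>>>>>>> omitted):
>>>>>>>>>
>>>>>>>>> <workflow-app name="example" xmlns="uri:oozie:workflow:0.4">
>>>>>>>>>   <start to="fork-bc"/>
>>>>>>>>>   <!-- run B and C in parallel -->
>>>>>>>>>   <fork name="fork-bc">
>>>>>>>>>     <path start="jobB"/>
>>>>>>>>>     <path start="jobC"/>
>>>>>>>>>   </fork>
>>>>>>>>>   <action name="jobB"> ... <ok to="join-bc"/> <error to="fail"/> </action>
>>>>>>>>>   <action name="jobC"> ... <ok to="join-bc"/> <error to="fail"/> </action>
>>>>>>>>>   <!-- job A starts only after both B and C have finished -->
>>>>>>>>>   <join name="join-bc" to="jobA"/>
>>>>>>>>>   <action name="jobA"> ... <ok to="end"/> <error to="fail"/> </action>
>>>>>>>>>   <kill name="fail"><message>Workflow failed</message></kill>
>>>>>>>>>   <end name="end"/>
>>>>>>>>> </workflow-app>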
>>>>>>>>>
>>>>>>>>> I have looked at Ooyala's Spark job server, which seems to be geared
>>>>>>>>> towards making Spark jobs run faster by sharing contexts between the
>>>>>>>>> jobs, but it isn't a full-blown workflow engine per se. A combination
>>>>>>>>> of Spark job server and a workflow engine would be ideal.
>>>>>>>>>
>>>>>>>>> Thanks for the inputs
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
