Here at Inmobi, we use Apache Falcon <https://falcon.apache.org/> (with
Oozie). The pipelines are fully functional in production. See the Apache
Falcon site for more details.

On Wed, Nov 30, 2016 at 7:36 AM, Tiago Albineli Motta <timo...@gmail.com>
wrote:

> Here at Globo.com we use Airflow to schedule and manage our Spark
> pipeline. We use the YARN API in the Airflow DAGs to control things such as
> guaranteeing that a job is not still running before the next batch starts.
>
> Tiago Albineli Motta
> Desenvolvedor de Software - Globo.com
> ICQ: 32107100
> http://programandosemcafeina.blogspot.com
>
> On Tue, Nov 29, 2016 at 8:00 PM, Bruno Faria <brunocf...@hotmail.com>
> wrote:
>
>> I have a standalone Spark cluster and have some jobs scheduled using
>> crontab.
>>
>> It works, but I don't have any real-time monitoring, for example to get
>> email alerts or to control a flow.
>>
>> I thought about using the Spark "hidden" API to get better control, but
>> the API does not seem to be officially documented and I don't see much
>> discussion of it on the web.
>>
>> Another option would be Oozie, but it looks like Oozie only works with
>> Hadoop, so I'd need to install it and change my architecture.
>>
>> Is there any other option you suggest?
>>
>> I'm using only open source versions (no vendor distribution).
>>
>> Thanks
>>
>
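
For reference, the YARN check Tiago describes could look roughly like the
sketch below. It is only an illustration, not Globo.com's actual code: it
assumes the YARN ResourceManager REST endpoint /ws/v1/cluster/apps, and the
function names, the ResourceManager URL and the job name are all hypothetical.
It also applies only to jobs submitted to YARN, not to a standalone Spark
cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def running_app_names(payload):
    """Extract application names from a ResourceManager /cluster/apps response.

    When no application matches, the response carries "apps": null,
    so both levels are guarded before iterating.
    """
    apps = (payload.get("apps") or {}).get("app") or []
    return [app.get("name") for app in apps]


def is_job_running(rm_url, job_name, timeout=10):
    """Return True if an app named job_name is ACCEPTED or RUNNING on YARN.

    rm_url is the ResourceManager base URL (hypothetical example:
    "http://resourcemanager:8088").
    """
    query = urlencode({"states": "ACCEPTED,RUNNING"})
    with urlopen(f"{rm_url}/ws/v1/cluster/apps?{query}", timeout=timeout) as resp:
        payload = json.load(resp)
    return job_name in running_app_names(payload)
```

In an Airflow DAG, a task could call is_job_running() first and skip or delay
the next batch while it returns True, for example through a short-circuit
style operator.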
