Here at InMobi, we use Apache Falcon <https://falcon.apache.org/> (with Oozie). The pipelines are fully functional in production. See the Apache Falcon site for more details.

On Wed, Nov 30, 2016 at 7:36 AM, Tiago Albineli Motta <timo...@gmail.com> wrote:

> Here at Globo.com we use Airflow to schedule and manage our Spark
> pipeline. We use the YARN API in the Airflow DAGs to control things such as
> guaranteeing that a job is not still running before another batch starts.
>
> Tiago Albineli Motta
> Software Developer - Globo.com
> ICQ: 32107100
> http://programandosemcafeina.blogspot.com
>
> On Tue, Nov 29, 2016 at 8:00 PM, Bruno Faria <brunocf...@hotmail.com>
> wrote:
>
>> I have a standalone Spark cluster and have some jobs scheduled using
>> crontab.
>>
>> It works, but I don't have real-time monitoring to send emails or a way
>> to control a flow, for example.
>>
>> I thought about using the Spark "hidden" API for better control, but it
>> seems the API is not officially documented and I don't see much
>> discussion of it on the web.
>>
>> Another option would be Oozie, but it looks like Oozie only works with
>> Hadoop, so I'd need to install it and change my architecture.
>>
>> Is there any other option you suggest?
>>
>> I'm using only open source versions (no dist).
>>
>> Thanks
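
For reference, the YARN check Tiago describes in the quoted message could look roughly like the sketch below. It is not his actual code. It assumes the jobs run on YARN (not on a standalone cluster like Bruno's) and that the ResourceManager REST endpoint /ws/v1/cluster/apps is reachable. The address in RM_URL, the job name, and all function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical ResourceManager address; replace with your own.
RM_URL = "http://resourcemanager.example.com:8088"


def active_app_names(payload):
    """Return the set of application names found in a
    Cluster Applications API response (already parsed from JSON)."""
    # With no matching applications, "apps" can be null/absent.
    apps = (payload.get("apps") or {}).get("app") or []
    return {app["name"] for app in apps}


def is_job_active(job_name, payload):
    """True if an application with this name appears in the response."""
    return job_name in active_app_names(payload)


def fetch_active_apps(rm_url=RM_URL, timeout=10):
    """Ask the ResourceManager for applications that are queued or running."""
    query = urlencode({"states": "ACCEPTED,RUNNING"})
    with urlopen(f"{rm_url}/ws/v1/cluster/apps?{query}", timeout=timeout) as resp:
        return json.load(resp)


def should_start_batch(job_name, rm_url=RM_URL):
    """Start the next batch only if the previous one is no longer active."""
    return not is_job_active(job_name, fetch_active_apps(rm_url))
```

In an Airflow DAG, something like should_start_batch("nightly-batch") could be called from a Python task placed before the spark-submit task, so that the submit step is skipped while the previous run is still active.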