That’s exactly what my team is doing, and what I shared here last year (https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E).
It’s a somewhat “hacky” solution (and HA is not addressed), and now I’m thinking about how we can make it more proper & robust.

XD

> On 2 Mar 2019, at 12:04 AM, Mario Urquizo <mario.urqu...@gmail.com> wrote:
>
> We have been running multiple schedulers for about 3 months. We created
> multiple services to run airflow schedulers. The only difference is that
> we have each of the schedulers pointed to a directory one level deeper than
> the DAG home directory that the workers and webapp use. We have seen much
> better scheduling performance but this does not yet help with HA.
>
> DAGS_HOME:
> {airflow_home}/dags (webapp & workers)
> {airflow_home}/dags/group-a/ (scheduler1)
> {airflow_home}/dags/group-b/ (scheduler2)
> {airflow_home}/dags/group-etc/ (scheduler3)
>
> Not sure if this helps, just sharing in case it does.
>
> Thank you,
> Mario
>
>
> On Fri, Mar 1, 2019 at 9:44 AM Bolke de Bruin <bdbr...@gmail.com> wrote:
>
>> I have done quite some work on making it possible to run multiple
>> schedulers at the same time. At the moment I don’t think there are real
>> blockers actually to do so. We just don’t actively test it.
>>
>> Database locking is mostly in place (DagRuns and TaskInstances). And I
>> think the worst that can happen is that a task is scheduled twice. The task
>> will detect this most of the time and kill one off if they run concurrently;
>> if they run sequentially instead, it will run again on some occasions.
>> Everyone has idempotent tasks, right, so no harm done? ;-)
>>
>> Have you encountered issues? Maybe work those out?
>>
>> Cheers
>> Bolke.
>>
>> Sent from my iPad
>>
>>> On 1 Mar 2019, at 16:25, Deng Xiaodong <xd.den...@gmail.com> wrote:
>>>
>>> Hi Max,
>>>
>>> Following
>>> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E,
>>> I’m trying to prepare an AIP for supporting multiple schedulers in Airflow
>>> (mainly for HA and higher scheduling performance).
>>>
>>> While checking the code, I found that there is one attribute
>>> of DagModel, “scheduler_lock”. It’s not used at all in the current
>>> implementation, but it was introduced a long time back (2015) to allow
>>> multiple schedulers to work together
>>> (https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620).
>>>
>>> Since you were the original author of it, it would be very helpful if
>>> you could kindly share why the multiple-schedulers implementation was removed
>>> eventually, and what challenges/complexity there were.
>>> (You already shared a few valuable inputs in the earlier discussion
>>> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E,
>>> mainly relating to hiccups around concurrency, cross-DAG prioritisation &
>>> load on DB. Other than these, anything else you would like to advise?)
>>>
>>> I will also dive into the git history further to understand it better.
>>>
>>> Thanks.
>>>
>>>
>>> XD
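
For anyone skimming the thread: the layout Mario describes (webapp and workers on {airflow_home}/dags, each scheduler pointed one level deeper) could be sketched roughly as below. This is only an illustration under assumptions, not how either team actually wires it up. It assumes each scheduler service gets its DAG folder through Airflow's AIRFLOW__CORE__DAGS_FOLDER environment override; the helper name scheduler_environments and the /opt/airflow path are hypothetical.

```python
from pathlib import PurePosixPath


def scheduler_environments(airflow_home, groups):
    """Build one environment-override dict per scheduler service.

    Each scheduler is pointed at a directory one level below the shared
    {airflow_home}/dags folder that the webapp and workers keep using.
    (Hypothetical helper, not part of Airflow.)
    """
    dags_home = PurePosixPath(airflow_home) / "dags"
    envs = {}
    for index, group in enumerate(groups, start=1):
        envs[f"scheduler{index}"] = {
            "AIRFLOW_HOME": str(airflow_home),
            # Assumption: the folder is set via Airflow's
            # AIRFLOW__{SECTION}__{KEY} config override for [core] dags_folder.
            "AIRFLOW__CORE__DAGS_FOLDER": str(dags_home / group),
        }
    return envs


if __name__ == "__main__":
    # "/opt/airflow" is a placeholder path for illustration only.
    layout = scheduler_environments(
        "/opt/airflow", ["group-a", "group-b", "group-etc"]
    )
    for name, env in layout.items():
        print(name, env["AIRFLOW__CORE__DAGS_FOLDER"])
```

Each dict would be handed to the service that starts one scheduler. As noted in the thread, this spreads scheduling load across groups but does not give HA: if a scheduler dies, nothing else picks up its group.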