Forgot to mention: the intention was to use the lock, but I never personally got to the second phase, which would consist of skipping the DAG if the lock is on and eventually expiring the lock based on a config setting.
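
A minimal sketch of that second phase, under stated assumptions. It is illustrative only and not Airflow code: `DagLock`, `try_acquire`, `release` and `SCHEDULER_LOCK_TIMEOUT` are hypothetical names, and an in-memory object stands in for the DAG row in the metadata database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical setting; not an actual Airflow config option.
SCHEDULER_LOCK_TIMEOUT = timedelta(minutes=5)


@dataclass
class DagLock:
    """Stand-in for a DAG row; a real version would live in the metadata DB."""
    dag_id: str
    locked_at: Optional[datetime] = None


def try_acquire(dag: DagLock, now: datetime,
                timeout: timedelta = SCHEDULER_LOCK_TIMEOUT) -> bool:
    """Return True if this scheduler may process the DAG, taking the lock."""
    if dag.locked_at is not None and now - dag.locked_at < timeout:
        return False  # lock is on and still fresh: skip this DAG
    dag.locked_at = now  # lock is free or expired: take (or take over) it
    return True


def release(dag: DagLock) -> None:
    """Clear the lock once the scheduler has finished with the DAG."""
    dag.locked_at = None
```

In a real database-backed version the check and the update would have to happen atomically (for example as a single conditional UPDATE), otherwise two schedulers could both see the lock as free and take it.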

Max

On Fri, Mar 1, 2019 at 1:57 PM Maxime Beauchemin <maximebeauche...@gmail.com> wrote:

> My original intention with the lock was to prevent "double-triggering" of
> tasks (triggering refers to the scheduler putting the message in the queue).
> Airflow now has good "double-firing" prevention for tasks (firing happens
> when the worker receives the message and starts the task): even if the
> scheduler were to go rogue or restart and send multiple triggers for a task
> instance, the worker(s) should only start one task instance. That's done by
> running the database assertions behind the conditions being met as a read
> database transaction (no task can alter the rows that validate the
> assertion while it's being asserted). In practice it's a little tricky
> and we've seen rogue double-firing in the past (I have no idea how often
> that happens).
>
> If we do want to prevent double-triggering, we should make sure that 2
> schedulers aren't processing the same DAG or DagRun at the same time. That
> would mean having the scheduler not start processing locked DAGs, and
> providing a mechanism to expire the locks after some time.
>
> Has anyone experienced double-firing lately? If that exists we should fix
> it, but we should also be careful around multiple-scheduler
> double-triggering, as it would make that problem potentially much worse.
>
> Max
>
> On Fri, Mar 1, 2019 at 8:19 AM Deng Xiaodong <xd.den...@gmail.com> wrote:
>
>> It’s exactly what my team is doing & what I shared here earlier last year
>> (https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E)
>>
>> It’s a somewhat “hacky” solution (and HA is not addressed), and now I’m
>> thinking about how we can make it more proper & robust.
>>
>>
>> XD
>>
>> > On 2 Mar 2019, at 12:04 AM, Mario Urquizo <mario.urqu...@gmail.com> wrote:
>> >
>> > We have been running multiple schedulers for about 3 months. We created
>> > multiple services to run airflow schedulers. The only difference is that
>> > we have each of the schedulers pointed to a directory one level deeper than
>> > the DAG home directory that the workers and webapp use. We have seen much
>> > better scheduling performance but this does not yet help with HA.
>> >
>> > DAGS_HOME:
>> > {airflow_home}/dags (webapp & workers)
>> > {airflow_home}/dags/group-a/ (scheduler1)
>> > {airflow_home}/dags/group-b/ (scheduler2)
>> > {airflow_home}/dags/group-etc/ (scheduler3)
>> >
>> > Not sure if this helps, just sharing in case it does.
>> >
>> > Thank you,
>> > Mario
>> >
>> >
>> > On Fri, Mar 1, 2019 at 9:44 AM Bolke de Bruin <bdbr...@gmail.com> wrote:
>> >
>> >> I have done quite some work on making it possible to run multiple
>> >> schedulers at the same time. At the moment I don’t think there are any
>> >> real blockers to doing so. We just don’t actively test it.
>> >>
>> >> Database locking is mostly in place (DagRuns and TaskInstances). And I
>> >> think the worst that can happen is that a task is scheduled twice. The
>> >> task will detect this most of the time and kill one off if they run
>> >> concurrently; if they run sequentially instead, it will run again on
>> >> some occasions. Everyone has idempotent tasks, right, so no harm done? ;-)
>> >>
>> >> Have you encountered issues? Maybe work those out?
>> >>
>> >> Cheers
>> >> Bolke.
>> >>
>> >> Sent from my iPad
>> >>
>> >>> On 1 Mar 2019, at 16:25, Deng Xiaodong <xd.den...@gmail.com> wrote:
>> >>>
>> >>> Hi Max,
>> >>>
>> >>> Following
>> >>> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E,
>> >>> I’m trying to prepare an AIP for supporting multiple schedulers in
>> >>> Airflow (mainly for HA and higher scheduling performance).
>> >>>
>> >>> While checking the code, I found that there is one attribute of
>> >>> DagModel, “scheduler_lock”. It’s not used at all in the current
>> >>> implementation, but it was introduced a long time back (2015) to allow
>> >>> multiple schedulers to work together
>> >>> (https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620).
>> >>>
>> >>> Since you were the original author of it, it would be very helpful if
>> >>> you could kindly share why the multiple-schedulers implementation was
>> >>> eventually removed, and what challenges/complexity there were.
>> >>> (You already shared a few valuable inputs in the earlier discussion
>> >>> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E,
>> >>> mainly relating to hiccups around concurrency, cross-DAG prioritisation &
>> >>> load on the DB. Other than these, is there anything else you would advise?)
>> >>>
>> >>> I will also dive into the git history further to understand it better.
>> >>>
>> >>> Thanks.
>> >>>
>> >>>
>> >>> XD