Preventing double-triggering by separating the DAG files that different schedulers parse sounds easier and more intuitive. I actually removed one piece of the double-triggering prevention logic here <https://github.com/apache/airflow/pull/4234/files#diff-a7f584b9502a6dd19987db41a8834ff9L127> (it was expensive) and was relying on this lock <https://github.com/apache/airflow/blob/master/airflow/models/__init__.py#L1233> to prevent double-firing and safeguard our non-idempotent tasks (btw, the insert can be made an insert overwrite to be idempotent).
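
To illustrate the insert-overwrite point, here is a minimal sketch, not our actual task code. It assumes SQLite's INSERT OR REPLACE as a stand-in for a warehouse-style INSERT OVERWRITE; the table daily_totals and the helper write_daily_total are hypothetical names made up for this example.

```python
import sqlite3


def write_daily_total(conn, ds, total):
    """Idempotent write: re-running for the same ds replaces the existing row.

    Hypothetical helper; INSERT OR REPLACE is SQLite's stand-in here for an
    INSERT OVERWRITE keyed on the execution date.
    """
    conn.execute(
        "INSERT OR REPLACE INTO daily_totals (ds, total) VALUES (?, ?)",
        (ds, total),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# ds is the primary key, so a second write for the same date cannot add a row.
conn.execute("CREATE TABLE daily_totals (ds TEXT PRIMARY KEY, total INTEGER)")

# Simulate the same task instance firing twice for one execution date.
write_daily_total(conn, "2019-03-01", 42)
write_daily_total(conn, "2019-03-01", 42)

rows = conn.execute("SELECT ds, total FROM daily_totals").fetchall()
print(rows)  # [('2019-03-01', 42)]
```

With a plain INSERT and no key, the second firing would leave two rows for the same date; with the overwrite, a double-fired task is harmless.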

Also, although we requeue tasks a lot at Airbnb, we haven't seen double-firing recently.

Cheers,
Kevin Y

On Fri, Mar 1, 2019 at 2:08 PM Maxime Beauchemin <maximebeauche...@gmail.com> wrote:

> Forgot to mention: the intention was to use the lock, but I never
> personally got to do the second phase, which would consist of skipping the
> DAG if the lock is on, and expiring the lock eventually based on a config
> setting.
>
> Max
>
> On Fri, Mar 1, 2019 at 1:57 PM Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>
> > My original intention with the lock was preventing "double-triggering" of
> > tasks (triggering refers to the scheduler putting the message in the queue).
> > Airflow now has good "double-firing-prevention" of tasks (firing happens
> > when the worker receives the message and starts the task): even if the
> > scheduler was to go rogue or restart and send multiple triggers for a task
> > instance, the worker(s) should only start one task instance. That's done by
> > running the database assertions behind the conditions being met as a read
> > database transaction (no task can alter the rows that validate the
> > assertion while it's getting asserted). In practice it's a little tricky
> > and we've seen rogue double-firing in the past (I have no idea how often
> > that happens).
> >
> > If we do want to prevent double-triggering, we should make sure that 2
> > schedulers aren't processing the same DAG or DagRun at the same time. That
> > would mean having the scheduler not start processing locked DAGs, and
> > providing a mechanism to expire the locks after some time.
> >
> > Has anyone experienced double firing lately? If that exists we should fix
> > it, but also be careful around multiple-scheduler double-triggering, as it
> > would make that problem potentially much worse.
> >
> > Max
> >
> > On Fri, Mar 1, 2019 at 8:19 AM Deng Xiaodong <xd.den...@gmail.com> wrote:
> >
> >> It’s exactly what my team is doing & what I shared here earlier last year
> >> (https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E)
> >>
> >> It’s somehow a “hacky” solution (and HA is not addressed), and now I’m
> >> thinking how we can have it more proper & robust.
> >>
> >> XD
> >>
> >> > On 2 Mar 2019, at 12:04 AM, Mario Urquizo <mario.urqu...@gmail.com> wrote:
> >> >
> >> > We have been running multiple schedulers for about 3 months. We created
> >> > multiple services to run airflow schedulers. The only difference is that
> >> > we have each of the schedulers pointed to a directory one level deeper than
> >> > the DAG home directory that the workers and webapp use. We have seen much
> >> > better scheduling performance but this does not yet help with HA.
> >> >
> >> > DAGS_HOME:
> >> > {airflow_home}/dags (webapp & workers)
> >> > {airflow_home}/dags/group-a/ (scheduler1)
> >> > {airflow_home}/dags/group-b/ (scheduler2)
> >> > {airflow_home}/dags/group-etc/ (scheduler3)
> >> >
> >> > Not sure if this helps, just sharing in case it does.
> >> >
> >> > Thank you,
> >> > Mario
> >> >
> >> > On Fri, Mar 1, 2019 at 9:44 AM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >> >
> >> >> I have done quite some work on making it possible to run multiple
> >> >> schedulers at the same time. At the moment I don’t think there are real
> >> >> blockers actually to do so. We just don’t actively test it.
> >> >>
> >> >> Database locking is mostly in place (DagRuns and TaskInstances). And I
> >> >> think the worst that can happen is that a task is scheduled twice.
> >> >> The task will detect this most of the time and kill one off if
> >> >> concurrent; if not (i.e. sequential), it will run again on some
> >> >> occasions. Everyone is having idempotent tasks, right, so no harm done? ;-)
> >> >>
> >> >> Have you encountered issues? Maybe work those out?
> >> >>
> >> >> Cheers
> >> >> Bolke.
> >> >>
> >> >> Sent from my iPad
> >> >>
> >> >>> On 1 Mar 2019, at 16:25, Deng Xiaodong <xd.den...@gmail.com> wrote:
> >> >>>
> >> >>> Hi Max,
> >> >>>
> >> >>> Following
> >> >>> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E,
> >> >>> I’m trying to prepare an AIP for supporting multiple-scheduler in Airflow
> >> >>> (mainly for HA and higher scheduling performance).
> >> >>>
> >> >>> Along the process of code checking, I found that there is one attribute
> >> >>> of DagModel, “scheduler_lock”. It’s not used at all in current
> >> >>> implementation, but it was introduced long time back (2015) to allow
> >> >>> multiple schedulers to work together
> >> >>> (https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620).
> >> >>>
> >> >>> Since you were the original author of it, it would be very helpful if
> >> >>> you can kindly share why the multiple-schedulers implementation was removed
> >> >>> eventually, and what challenges/complexity there were.
> >> >>> (You already shared a few valuable inputs in the earlier discussion
> >> >>> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E
> >> >>> , mainly relating to hiccups around concurrency, cross DAG prioritisation &
> >> >>> load on DB. Other than these, anything else you would like to advise?)
> >> >>>
> >> >>> I will also dive into the git history further to understand it better.
> >> >>>
> >> >>> Thanks.
> >> >>>
> >> >>> XD