Does the proposal use a master-slave architecture (leader scheduler vs. slave schedulers)?

On Fri, Mar 1, 2019 at 5:32 PM Kevin Yang <yrql...@gmail.com> wrote:
> Preventing double-triggering by separating the DAG files that different
> schedulers parse sounds easier and more intuitive. I actually removed one
> piece of the (expensive) double-triggering prevention logic here
> <https://github.com/apache/airflow/pull/4234/files#diff-a7f584b9502a6dd19987db41a8834ff9L127>
> and was relying on this lock
> <https://github.com/apache/airflow/blob/master/airflow/models/__init__.py#L1233>
> to prevent double-firing and safeguard our non-idempotent tasks (btw, the
> insert can be an INSERT OVERWRITE to make it idempotent).
>
> Also, though at Airbnb we requeue tasks a lot, we haven't seen double-firing
> recently.
>
> Cheers,
> Kevin Y
>
> On Fri, Mar 1, 2019 at 2:08 PM Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>> Forgot to mention: the intention was to use the lock, but I never
>> personally got to do the second phase, which would consist of skipping the
>> DAG if the lock is on and eventually expiring the lock based on a config
>> setting.
>>
>> Max
>>
>> On Fri, Mar 1, 2019 at 1:57 PM Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>>> My original intention with the lock was preventing "double-triggering" of
>>> tasks (triggering refers to the scheduler putting the message in the
>>> queue). Airflow now has good "double-firing prevention" of tasks (firing
>>> happens when the worker receives the message and starts the task): even if
>>> the scheduler were to go rogue or restart and send multiple triggers for a
>>> task instance, the worker(s) should only start one task instance. That's
>>> done by running the database assertions behind the conditions being met as
>>> a read database transaction (no task can alter the rows that validate the
>>> assertion while it's being asserted).
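Editor's sketch: the double-firing prevention Max describes (a worker starts a task instance only if the database still says it should, and nothing can change those rows while that is checked) can be illustrated with Python's built-in sqlite3. This is not Airflow's code. The `task_instance` table, its columns and the `try_start` helper are hypothetical. Instead of holding row locks during a read transaction (e.g. `SELECT ... FOR UPDATE` on MySQL or PostgreSQL), it swaps in a single conditional UPDATE, a compare-and-set that SQLite applies atomically.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory DB holding one queued task instance (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT,"
        " PRIMARY KEY (dag_id, task_id, execution_date))"
    )
    with conn:
        conn.execute(
            "INSERT INTO task_instance VALUES (?, ?, ?, 'queued')",
            ("etl", "load", "2019-03-01"),
        )
    return conn


def try_start(conn: sqlite3.Connection, dag_id: str, task_id: str,
              execution_date: str) -> bool:
    """Called by a worker for every trigger it receives, duplicates included.

    The precondition ("the task instance is still queued") and the state
    change are one UPDATE, so no other writer can change the row between
    the check and the write. Only the first caller sees rowcount == 1.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE task_instance SET state = 'running' "
            "WHERE dag_id = ? AND task_id = ? AND execution_date = ? "
            "AND state = 'queued'",
            (dag_id, task_id, execution_date),
        )
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = make_db()
    # The scheduler "double-triggers": two messages for the same task instance.
    first = try_start(conn, "etl", "load", "2019-03-01")
    second = try_start(conn, "etl", "load", "2019-03-01")
    print(first, second)  # True False
```

The demo uses one connection for brevity; real workers would each hold their own connection to a shared database. Note that this only stops double-firing. The scheduler can still double-trigger (queue the same message twice); the duplicate is just rejected at start time, which is why the thread treats double-triggering as a separate concern.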
>>> In practice it's a little tricky, and we've seen rogue double-firing in
>>> the past (I have no idea how often that happens).
>>>
>>> If we do want to prevent double-triggering, we should make sure that two
>>> schedulers aren't processing the same DAG or DagRun at the same time. That
>>> would mean having the scheduler not start processing locked DAGs, and
>>> providing a mechanism to expire the locks after some time.
>>>
>>> Has anyone experienced double-firing lately? If it exists, we should fix
>>> it, but we should also be careful around multiple-scheduler
>>> double-triggering, as it could make that problem much worse.
>>>
>>> Max
>>>
>>> On Fri, Mar 1, 2019 at 8:19 AM Deng Xiaodong <xd.den...@gmail.com> wrote:
>>>> It's exactly what my team is doing & what I shared here last year
>>>> (https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E).
>>>>
>>>> It's a somewhat "hacky" solution (and HA is not addressed), and now I'm
>>>> thinking about how we can make it more proper & robust.
>>>>
>>>> XD
>>>>
>>>> On 2 Mar 2019, at 12:04 AM, Mario Urquizo <mario.urqu...@gmail.com> wrote:
>>>>> We have been running multiple schedulers for about 3 months. We created
>>>>> multiple services to run Airflow schedulers. The only difference is that
>>>>> we have each of the schedulers pointed to a directory one level deeper
>>>>> than the DAG home directory that the workers and webapp use. We have seen
>>>>> much better scheduling performance, but this does not yet help with HA.
>>>>>
>>>>> DAGS_HOME:
>>>>> {airflow_home}/dags (webapp & workers)
>>>>> {airflow_home}/dags/group-a/ (scheduler1)
>>>>> {airflow_home}/dags/group-b/ (scheduler2)
>>>>> {airflow_home}/dags/group-etc/ (scheduler3)
>>>>>
>>>>> Not sure if this helps; just sharing in case it does.
>>>>>
>>>>> Thank you,
>>>>> Mario
>>>>>
>>>>> On Fri, Mar 1, 2019 at 9:44 AM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>> I have done quite a lot of work on making it possible to run multiple
>>>>>> schedulers at the same time. At the moment I don't think there are any
>>>>>> real blockers to doing so; we just don't actively test it.
>>>>>>
>>>>>> Database locking is mostly in place (DagRuns and TaskInstances), and I
>>>>>> think the worst that can happen is that a task is scheduled twice. Most
>>>>>> of the time the task will detect this and kill one off if the two run
>>>>>> concurrently; if they run sequentially, it will run again on some
>>>>>> occasions. Everyone has idempotent tasks, right, so no harm done? ;-)
>>>>>>
>>>>>> Have you encountered issues? Maybe we can work those out?
>>>>>>
>>>>>> Cheers,
>>>>>> Bolke
>>>>>>
>>>>>> On 1 Mar 2019, at 16:25, Deng Xiaodong <xd.den...@gmail.com> wrote:
>>>>>>> Hi Max,
>>>>>>>
>>>>>>> Following
>>>>>>> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E,
>>>>>>> I'm trying to prepare an AIP for supporting multiple schedulers in
>>>>>>> Airflow (mainly for HA and higher scheduling performance).
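Editor's sketch: Mario's layout partitions the DAG folder so that each scheduler parses only its own subdirectory, which (as Kevin notes) is also what keeps two schedulers from double-triggering the same DAG. That holds only if the scheduler subdirectories are disjoint and together cover every DAG file the workers can see. Below is a small sanity check for that, under stated assumptions: `check_partition` and its arguments are hypothetical (not part of Airflow), and only `*.py` files are counted.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def check_partition(dags_home: Path,
                    scheduler_dirs: dict[str, Path]) -> dict[str, list[Path]]:
    """Report DAG files that more than one scheduler parses, or that none does.

    dags_home: the folder the webapp and workers use.
    scheduler_dirs: scheduler name -> the subdirectory it is pointed at.
    Only *.py files are counted (an assumption of this sketch).
    """
    # Every DAG file the workers can see starts with no owning scheduler.
    owners: dict[Path, list[str]] = {
        f.resolve(): [] for f in dags_home.rglob("*.py")
    }
    for name, subdir in scheduler_dirs.items():
        for f in subdir.rglob("*.py"):
            owners.setdefault(f.resolve(), []).append(name)
    return {
        "overlapping": sorted(f for f, names in owners.items() if len(names) > 1),
        "unscheduled": sorted(f for f, names in owners.items() if not names),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp) / "dags"
        for rel in ("group-a/etl.py", "group-b/report.py", "stray.py"):
            (home / rel).parent.mkdir(parents=True, exist_ok=True)
            (home / rel).write_text("# DAG file\n")
        result = check_partition(home, {"scheduler1": home / "group-a",
                                        "scheduler2": home / "group-b"})
        print([p.name for p in result["unscheduled"]])  # ['stray.py']
```

A file left directly under `{airflow_home}/dags` is visible to workers but is parsed by no scheduler, so it is never scheduled. Pointing a scheduler at the root alongside the group schedulers would make every grouped file "overlapping".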
>>>>>>>
>>>>>>> While checking the code, I found that DagModel has an attribute,
>>>>>>> "scheduler_lock". It's not used at all in the current implementation,
>>>>>>> but it was introduced a long time ago (2015) to allow multiple
>>>>>>> schedulers to work together
>>>>>>> (https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620).
>>>>>>>
>>>>>>> Since you were its original author, it would be very helpful if you
>>>>>>> could kindly share why the multiple-scheduler implementation was
>>>>>>> eventually removed, and what challenges/complexity there were.
>>>>>>> (You already shared a few valuable inputs in the earlier discussion,
>>>>>>> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E,
>>>>>>> mainly relating to hiccups around concurrency, cross-DAG
>>>>>>> prioritisation & load on the DB. Other than these, is there anything
>>>>>>> else you would advise?)
>>>>>>>
>>>>>>> I will also dig further into the git history to understand it better.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> XD
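Editor's sketch: Max's unfinished "second phase" (skip a DAG whose lock is held, and let the lock expire after a configurable time) could look roughly like the following, again using sqlite3. It is not Airflow's `DagModel.scheduler_lock`. The `dag_lock` table, its `locked_by`/`locked_at` columns, the helper functions and `LOCK_TTL_SECONDS` (standing in for the config setting Max mentions) are all hypothetical. Recording who holds the lock and when it was taken is what makes expiry possible.

```python
from __future__ import annotations

import sqlite3

# Hypothetical stand-in for the config setting that controls lock expiry.
LOCK_TTL_SECONDS = 300.0


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag_lock ("
        " dag_id TEXT PRIMARY KEY, locked_by TEXT, locked_at REAL)"
    )
    return conn


def try_lock(conn: sqlite3.Connection, dag_id: str, scheduler_id: str,
             now: float, ttl: float = LOCK_TTL_SECONDS) -> bool:
    """Take (or refresh) the lock on dag_id; False means "skip this DAG"."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO dag_lock VALUES (?, NULL, NULL)", (dag_id,)
        )
        # Succeeds if the lock is free, already ours, or older than the TTL.
        cur = conn.execute(
            "UPDATE dag_lock SET locked_by = ?, locked_at = ? "
            "WHERE dag_id = ? AND (locked_by IS NULL OR locked_by = ? "
            "OR locked_at < ?)",
            (scheduler_id, now, dag_id, scheduler_id, now - ttl),
        )
    return cur.rowcount == 1


def unlock(conn: sqlite3.Connection, dag_id: str, scheduler_id: str) -> None:
    """Release the lock, but only if this scheduler still holds it."""
    with conn:
        conn.execute(
            "UPDATE dag_lock SET locked_by = NULL, locked_at = NULL "
            "WHERE dag_id = ? AND locked_by = ?",
            (dag_id, scheduler_id),
        )


def lock_batch(conn: sqlite3.Connection, scheduler_id: str,
               dag_ids: list[str], now: float) -> list[str]:
    """DAGs this scheduler may process in this loop; locked ones are skipped."""
    return [d for d in dag_ids if try_lock(conn, d, scheduler_id, now)]
```

The TTL trades safety for liveness: if it is shorter than the time a scheduler really spends on a DAG (the holder can extend it by calling `try_lock` again), a second scheduler can take over a lock that is still in use, which is exactly the double-triggering Max warns about. `now` should also come from a clock all schedulers agree on (for example the database's), since skew between hosts shifts the expiry.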