My original intention with the lock was to prevent "double-triggering" of
tasks (triggering refers to the scheduler putting the message in the queue).
Airflow now has good "double-firing prevention" for tasks (firing happens
when the worker receives the message and starts the task): even if the
scheduler were to go rogue or restart and send multiple triggers for a task
instance, the worker(s) should only start one task instance. That's done by
checking the conditions required to start the task inside a single database
transaction that locks the rows it reads (no other task can alter the rows
that validate the assertion while it's being asserted). In practice it's a
little tricky, and we've seen rogue double-firing in the past (I have no
idea how often that happens).

If we do want to prevent double-triggering, we should make sure that two
schedulers aren't processing the same DAG or DagRun at the same time. That
would mean having the scheduler skip DAGs that are locked, and providing a
mechanism to expire the locks after some time.

Has anyone experienced double-firing lately? If it still happens we should
fix it, but we should also be careful around multiple-scheduler
double-triggering, as it would potentially make that problem much worse.

Max

On Fri, Mar 1, 2019 at 8:19 AM Deng Xiaodong <xd.den...@gmail.com> wrote:

> It’s exactly what my team is doing & what I shared here earlier last year
> (https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E).
>
> It’s a somewhat “hacky” solution (and HA is not addressed), and now I’m
> thinking about how we can make it more proper & robust.
>
>
> XD
>
> > On 2 Mar 2019, at 12:04 AM, Mario Urquizo <mario.urqu...@gmail.com> wrote:
> >
> > We have been running multiple schedulers for about 3 months. We created
> > multiple services to run Airflow schedulers. The only difference is that
> > we point each scheduler to a directory one level deeper than the DAG home
> > directory that the workers and webapp use. We have seen much better
> > scheduling performance, but this does not yet help with HA.
> >
> > DAGS_HOME:
> > {airflow_home}/dags  (webapp & workers)
> > {airflow_home}/dags/group-a/ (scheduler1)
> > {airflow_home}/dags/group-b/ (scheduler2)
> > {airflow_home}/dags/group-etc/ (scheduler3)
> >
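One consequence of this layout: each scheduler only parses files under its own group directory, so a DAG file placed directly in `{airflow_home}/dags` (outside every group) would be visible to the webapp and workers but scheduled by nobody, and nested group directories would make two schedulers parse the same file. A small sketch of a check for both cases, assuming DAG files are given as paths relative to the DAG home; the scheduler names mirror the example above and the helper functions are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical mapping of scheduler service -> its DAG folder, relative to
# {airflow_home}/dags, mirroring the layout above.
SCHEDULER_DIRS = {
    "scheduler1": "group-a",
    "scheduler2": "group-b",
    "scheduler3": "group-etc",
}

def owners_of(dag_file, scheduler_dirs):
    """Return the schedulers whose folder contains dag_file (path relative to the DAG home)."""
    parents = PurePosixPath(dag_file).parents
    # Compare whole path components, so "group-ab/x.py" is not inside "group-a".
    return [name for name, folder in scheduler_dirs.items()
            if PurePosixPath(folder) in parents]

def check_partition(dag_files, scheduler_dirs):
    """Split DAG files into those no scheduler parses and those several schedulers parse."""
    unscheduled, overlapping = [], []
    for f in dag_files:
        owners = owners_of(f, scheduler_dirs)
        if not owners:
            unscheduled.append(f)
        elif len(owners) > 1:
            overlapping.append(f)
    return unscheduled, overlapping

if __name__ == "__main__":
    files = ["group-a/etl.py", "group-b/reports/daily.py", "legacy.py"]
    print(check_partition(files, SCHEDULER_DIRS))  # (['legacy.py'], [])
```
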
> > Not sure if this helps, just sharing in case it does.
> >
> > Thank you,
> > Mario
> >
> >
> > On Fri, Mar 1, 2019 at 9:44 AM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >
> >> I have done quite a bit of work on making it possible to run multiple
> >> schedulers at the same time. At the moment I don’t think there are any
> >> real blockers to doing so; we just don’t actively test it.
> >>
> >> Database locking is mostly in place (DagRuns and TaskInstances), and I
> >> think the worst that can happen is that a task is scheduled twice. Most
> >> of the time the task will detect this: if the two runs are concurrent,
> >> one is killed off; if they are sequential, the task may occasionally run
> >> again. Everyone has idempotent tasks, right, so no harm done? ;-)
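
A toy illustration of why idempotency makes an occasional double run harmless, assuming a hypothetical task that writes its output keyed by the run date (a plain dict stands in for the real storage): the idempotent version replaces its partition, so running it twice leaves the same result, while the appending version duplicates rows.

```python
def load_appending(store, run_date, rows):
    """Not idempotent: running twice for the same date duplicates the rows."""
    store.setdefault(run_date, []).extend(rows)

def load_idempotent(store, run_date, rows):
    """Idempotent: the partition for run_date is replaced, so a rerun is harmless."""
    store[run_date] = list(rows)

if __name__ == "__main__":
    rows = ["a", "b"]
    appended, replaced = {}, {}
    for _ in range(2):  # simulate the task being scheduled twice
        load_appending(appended, "2019-03-01", rows)
        load_idempotent(replaced, "2019-03-01", rows)
    print(appended)  # {'2019-03-01': ['a', 'b', 'a', 'b']}
    print(replaced)  # {'2019-03-01': ['a', 'b']}
```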
> >>
> >> Have you encountered issues? Maybe work those out?
> >>
> >> Cheers
> >> Bolke.
> >>
> >> Sent from my iPad
> >>
> >> On 1 Mar 2019, at 16:25, Deng Xiaodong <xd.den...@gmail.com> wrote:
> >>>
> >>> Hi Max,
> >>>
> >>> Following
> >>> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E,
> >>> I’m trying to prepare an AIP for supporting multiple schedulers in
> >>> Airflow (mainly for HA and higher scheduling performance).
> >>>
> >>> While checking the code, I found that DagModel has an attribute,
> >>> “scheduler_lock”. It’s not used at all in the current implementation,
> >>> but it was introduced a long time ago (2015) to allow multiple
> >>> schedulers to work together
> >>> (https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620).
> >>>
> >>> Since you were the original author, it would be very helpful if you
> >>> could kindly share why the multiple-scheduler implementation was
> >>> eventually removed, and what challenges and complexities there were.
> >>> (You already shared a few valuable inputs in the earlier discussion
> >>> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E,
> >>> mainly relating to hiccups around concurrency, cross-DAG
> >>> prioritisation & load on the DB. Other than these, is there anything
> >>> else you would advise?)
> >>>
> >>> I will also dive into the git history further to understand it better.
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> XD
> >>
>
>
