Forgot to mention: the intention was to use the lock, but I never
personally got to the second phase, which would consist of skipping the
DAG if the lock is on and eventually expiring the lock based on a config
setting.
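
A minimal sketch of what that second phase could look like, not what Airflow
actually does. Only "scheduler_lock" comes from DagModel; the table layout,
the "locked_at" column, the function names and "lock_ttl_seconds" (standing
in for the config setting) are all hypothetical. The idea is a single
conditional UPDATE, so that only one scheduler can take the lock and a stale
lock gets taken over once it has expired:

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for the DAG table; only "scheduler_lock" mirrors
    # the DagModel attribute, "locked_at" is an assumed extra column.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag ("
        "dag_id TEXT PRIMARY KEY, "
        "scheduler_lock INTEGER NOT NULL DEFAULT 0, "
        "locked_at REAL)"
    )
    return conn


def try_acquire_lock(conn, dag_id, now, lock_ttl_seconds):
    """Return True if the lock was taken, False if the DAG should be skipped.

    The UPDATE only matches when the DAG is unlocked or when the existing
    lock is at least lock_ttl_seconds old (i.e. expired).
    """
    cur = conn.execute(
        "UPDATE dag SET scheduler_lock = 1, locked_at = ? "
        "WHERE dag_id = ? AND (scheduler_lock = 0 OR locked_at <= ?)",
        (now, dag_id, now - lock_ttl_seconds),
    )
    conn.commit()
    return cur.rowcount == 1


def release_lock(conn, dag_id):
    # Called once the scheduler has finished processing the DAG.
    conn.execute(
        "UPDATE dag SET scheduler_lock = 0, locked_at = NULL WHERE dag_id = ?",
        (dag_id,),
    )
    conn.commit()
```

A scheduler loop would call try_acquire_lock() before processing each DAG,
skip the DAG when it returns False, and call release_lock() when done. The
expiry is what keeps a crashed scheduler from holding a DAG forever.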

Max

On Fri, Mar 1, 2019 at 1:57 PM Maxime Beauchemin <maximebeauche...@gmail.com>
wrote:

> My original intention with the lock was to prevent "double-triggering" of
> tasks (triggering refers to the scheduler putting the message in the queue).
> Airflow now has good "double-firing prevention" for tasks (firing happens
> when the worker receives the message and starts the task): even if the
> scheduler were to go rogue or restart and send multiple triggers for a task
> instance, the worker(s) should only start one task instance. That's done by
> running the database assertions behind the conditions being met as a read
> database transaction (no task can alter the rows that validate the
> assertion while it's being asserted). In practice it's a little tricky
> and we've seen rogue double-firing in the past (I have no idea how often
> that happens).
>
> If we do want to prevent double-triggering, we should make sure that 2
> schedulers aren't processing the same DAG or DagRun at the same time. That
> would mean having the scheduler not start processing locked DAGs, and
> providing a mechanism to expire the locks after some time.
>
> Has anyone experienced double-firing lately? If that exists we should fix
> it, but also be careful about double-triggering with multiple schedulers, as
> it would make that problem potentially much worse.
>
> Max
>
> On Fri, Mar 1, 2019 at 8:19 AM Deng Xiaodong <xd.den...@gmail.com> wrote:
>
>> It’s exactly what my team is doing & what I shared here earlier last year
>> (
>> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E
>> )
>>
>> It’s a somewhat “hacky” solution (and HA is not addressed), and now I’m
>> thinking about how we can make it more proper & robust.
>>
>>
>> XD
>>
>> > On 2 Mar 2019, at 12:04 AM, Mario Urquizo <mario.urqu...@gmail.com> wrote:
>> >
>> > We have been running multiple schedulers for about 3 months. We created
>> > multiple services to run airflow schedulers. The only difference is that
>> > we have each of the schedulers pointed to a directory one level deeper than
>> > the DAG home directory that the workers and webapp use. We have seen much
>> > better scheduling performance but this does not yet help with HA.
>> >
>> > DAGS_HOME:
>> > {airflow_home}/dags  (webapp & workers)
>> > {airflow_home}/dags/group-a/ (scheduler1)
>> > {airflow_home}/dags/group-b/ (scheduler2)
>> > {airflow_home}/dags/group-etc/ (scheduler3)
>> >
>> > Not sure if this helps, just sharing in case it does.
>> >
>> > Thank you,
>> > Mario
>> >
>> >
>> > On Fri, Mar 1, 2019 at 9:44 AM Bolke de Bruin <bdbr...@gmail.com> wrote:
>> >
>> >> I have done quite some work on making it possible to run multiple
>> >> schedulers at the same time. At the moment I don’t think there are any
>> >> real blockers to doing so. We just don’t actively test it.
>> >>
>> >> Database locking is mostly in place (DagRuns and TaskInstances), and I
>> >> think the worst that can happen is that a task is scheduled twice. The
>> >> task will detect this most of the time and kill one off if they are
>> >> concurrent; if they are sequential, it will run again on some occasions.
>> >> Everyone has idempotent tasks, right, so no harm done? ;-)
>> >>
>> >> Have you encountered issues? Maybe work those out?
>> >>
>> >> Cheers
>> >> Bolke.
>> >>
>> >> Sent from my iPad
>> >>
>> >>> On 1 Mar 2019, at 16:25, Deng Xiaodong <xd.den...@gmail.com> wrote:
>> >>>
>> >>> Hi Max,
>> >>>
>> >>> Following
>> >>> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E
>> >>> , I’m trying to prepare an AIP for supporting multiple-scheduler in
>> >>> Airflow (mainly for HA and Higher scheduling performance).
>> >>>
>> >>> While checking the code, I found that there is one attribute of
>> >>> DagModel, “scheduler_lock”. It’s not used at all in the current
>> >>> implementation, but it was introduced a long time back (2015) to allow
>> >>> multiple schedulers to work together (
>> >>> https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620
>> >>> ).
>> >>>
>> >>> Since you were the original author of it, it would be very helpful if
>> >>> you could kindly share why the multiple-schedulers implementation was
>> >>> eventually removed, and what challenges/complexity there were.
>> >>> (You already shared a few valuable inputs in the earlier discussion
>> >>> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E
>> >>> , mainly relating to hiccups around concurrency, cross-DAG
>> >>> prioritisation & load on DB. Other than these, anything else you would
>> >>> like to advise?)
>> >>>
>> >>> I will also dive into the git history further to understand it better.
>> >>>
>> >>> Thanks.
>> >>>
>> >>>
>> >>> XD
>> >>
>>
>>
