Awesome. I wasn't aware of DagRun locking, this is even better!

Max

On Mon, May 22, 2017 at 11:39 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:

> Hi Max,
>
> We seem to be in quite good order already. We are testing with multi-master
> MySQL and will also test multi-master Postgres. As we are doing dagrun-level
> locking already, it does not seem to be required to do DAG-level locking.
> Tasks are also being locked, so if multiple schedulers are running everything
> seems to be quite fine. If one of the schedulers restarts, it starts checking
> for orphaned tasks by checking the executor queue, which is unique for every
> scheduler. This will result in some tasks being dequeued and then requeued.
> So Airflow is robust enough to stay alive then (with my patch for deadlocks
> applied), but some things are a bit sub-optimal.
>
> As mentioned we are still stress testing this setup and we might find more.
>
> Bolke
>
> > On 22 May 2017, at 18:19, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> >
> > Things that might be needed for a correct multi-schedulers setup:
> > * DAG-level lock while being evaluated
> > * DAG-level lock expiration to recover from potential situation where the
> > lock wasn't released
> > * Accumulation of the list of task instances to run into the database (as
> > opposed to cross process communication to master process)
> > * Define a clear master cycle that would read the list of accumulated task
> > instances from the DB, dedup, prioritize and schedule. That master cycle
> > should have a lock (and lock expiration) as well.
> >
> > Max
> >
> > On Mon, May 22, 2017 at 12:27 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >
> >> Hi Stephen,
> >>
> >> We are currently stress testing Airflow for use in a multi-master setup.
> >> One of my team members is doing a write up that should show up online
> >> shortly. TL;DR: in its current state Airflow will need some patches in
> >> order to run concurrently. One issue is that Airflow can have a database
> >> deadlock which will stop the scheduler from running. I have a patch for
> >> that out here (https://github.com/apache/incubator-airflow/pull/2267) that
> >> works fine on Postgres/MySQL (tests don’t pass on SQLite yet due to
> >> limitations of SQLite).
> >>
> >> Your global scheduler lock (e.g. by an active-passive configuration) might
> >> make most sense for now.
> >>
> >> Bolke
> >>
> >>> On 22 May 2017, at 07:52, Stephen Rigney <sjrig...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We're running airflow in production, but for reliability (n.b. not
> >>> performance) we'd like to confirm if it is safe to spawn multiple instances
> >>> of the scheduler overlapping in time (otherwise we may need to put more
> >>> effort into assuring two copies aren't ever spawned at once in our
> >>> environment).
> >>>
> >>>
> >>> It seems this officially wasn't a supported configuration back in 2015
> >>> (https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ),
> >>> but has sufficient intra-airflow locking been added that it is now safe to
> >>> start up two temporally overlapping instances of the scheduler for the same
> >>> airflow system?
> >>>
> >>>
> >>> Or should we hack in a "global scheduler lock" - we're not looking for
> >>> increased performance by scheduler parallelism, just that if we ever fire
> >>> up two instances of the scheduler nothing terrible happens?
> >>>
> >>>
> >>> Stephen
> >>
> >>
>
>
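For illustration, the DAG-level lock with expiration from Max's list could look roughly like the sketch below. This is not Airflow's implementation: the dag_lock table, its columns, and the try_acquire/release helpers are hypothetical names made up for this example, and SQLite stands in for the real metadata database only to keep the sketch self-contained. The idea is that a single conditional UPDATE takes the lock only when it is free or its expiry time has passed, so a lock left behind by a crashed scheduler recovers on its own.

```python
import sqlite3
import time


def init_lock_table(conn):
    # Hypothetical table: one row per DAG, with the current holder and expiry.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dag_lock ("
        "dag_id TEXT PRIMARY KEY, holder TEXT, expires_at REAL)"
    )
    conn.commit()


def try_acquire(conn, dag_id, holder, ttl_seconds, now=None):
    """Return True if `holder` now owns the lock for `dag_id`."""
    now = time.time() if now is None else now
    # Make sure a row exists; a fresh row has no holder and is already expired.
    conn.execute(
        "INSERT OR IGNORE INTO dag_lock (dag_id, holder, expires_at) "
        "VALUES (?, NULL, 0)",
        (dag_id,),
    )
    # Take the lock only if it is free or expired. Doing the check and the
    # write in one UPDATE means two schedulers cannot both succeed.
    cur = conn.execute(
        "UPDATE dag_lock SET holder = ?, expires_at = ? "
        "WHERE dag_id = ? AND (holder IS NULL OR expires_at <= ?)",
        (holder, now + ttl_seconds, dag_id, now),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn, dag_id, holder):
    # Only the current holder may release; anyone else's call is a no-op.
    conn.execute(
        "UPDATE dag_lock SET holder = NULL, expires_at = 0 "
        "WHERE dag_id = ? AND holder = ?",
        (dag_id, holder),
    )
    conn.commit()
```

A scheduler would call try_acquire before evaluating a DAG, skip the DAG if it returns False, and call release when done. The `now` parameter exists only so the expiry behaviour can be exercised without sleeping.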
