I did run into "double SLA miss alarms" firing, but that was on 1.7.x. I
haven't tested whether that is still an issue in 1.8.x.

-s

On Tue, May 23, 2017 at 8:46 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Awesome. I wasn't aware of DagRun locking, this is even better!
>
> Max
>
> On Mon, May 22, 2017 at 11:39 PM, Bolke de Bruin <bdbr...@gmail.com>
> wrote:
>
> > Hi Max,
> >
> > We seem to be in quite good order already. We are testing with
> > multi-master MySQL and will also test multi-master Postgres. As we are
> > already doing DagRun-level locking, DAG-level locking does not seem to be
> > required. Tasks are also locked, so running multiple schedulers seems to
> > be fine. If one of the schedulers restarts, it checks for orphaned tasks
> > by checking the executor queue, which is unique to each scheduler. This
> > will result in some tasks being dequeued and then requeued. So Airflow is
> > robust enough to stay alive (with my patch for deadlocks applied), but
> > some things are a bit sub-optimal.
> >
> > As mentioned we are still stress testing this setup and we might find
> > more.
> >
> > Bolke
> >
> > > On 22 May 2017, at 18:19, Maxime Beauchemin
> > > <maximebeauche...@gmail.com> wrote:
> > >
> > > Things that might be needed for a correct multi-scheduler setup:
> > > * DAG-level lock while being evaluated
> > > * DAG-level lock expiration to recover from a potential situation where
> > >   the lock wasn't released
> > > * Accumulation of the list of task instances to run into the database
> > >   (as opposed to cross-process communication to the master process)
> > > * Define a clear master cycle that would read the list of accumulated
> > >   task instances from the DB, dedup, prioritize and schedule. That
> > >   master cycle should have a lock (and lock expiration) as well.
> > >
> > > Max
> > >
> > > On Mon, May 22, 2017 at 12:27 AM, Bolke de Bruin <bdbr...@gmail.com>
> > > wrote:
> > >
> > >> Hi Stephen,
> > >>
> > >> We are currently stress testing Airflow for use in a multi-master
> > >> setup. One of my team members is doing a write-up that should show up
> > >> online shortly. TL;DR: in its current state Airflow will need some
> > >> patches in order to run concurrently. One issue is that Airflow can hit
> > >> a database deadlock, which stops the scheduler from running. I have a
> > >> patch for that out here
> > >> (https://github.com/apache/incubator-airflow/pull/2267) that works fine
> > >> on Postgres/MySQL (tests don't pass on SQLite yet due to limitations of
> > >> SQLite).
> > >>
> > >> Your global scheduler lock (e.g. via an active/passive configuration)
> > >> might make most sense for now.
> > >>
> > >> Bolke
> > >>
> > >>> On 22 May 2017, at 07:52, Stephen Rigney <sjrig...@gmail.com> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> We're running airflow in production, but for reliability (n.b. not
> > >>> performance) we'd like to confirm if it is safe to spawn multiple
> > >>> instances of the scheduler overlapping in time (otherwise we may need
> > >>> to put more effort into assuring two copies aren't ever spawned at
> > >>> once in our environment).
> > >>>
> > >>>
> > >>> It seems this officially wasn't a supported configuration back in 2015
> > >>> (https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ),
> > >>> but has sufficient intra-airflow locking been added that it is now
> > >>> safe to start up two temporally overlapping instances of the scheduler
> > >>> for the same airflow system?
> > >>>
> > >>>
> > >>> Or should we hack in a "global scheduler lock" - we're not looking for
> > >>> increased performance by scheduler parallelism, just that if we ever
> > >>> fire up two instances of the scheduler nothing terrible happens?
> > >>>
> > >>>
> > >>> Stephen
> > >>
> > >>
> >
> >
>
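
For anyone reading this thread later: the "lock plus lock expiration" item
in Max's list, and the "global scheduler lock" Stephen asks about, can both
be sketched as a lease row in the database. This is only an illustration
under stated assumptions, not Airflow code. The lease_lock table and the
init/try_acquire/release helpers are hypothetical names, and the
standard-library SQLite module stands in for the real metadata database.

```python
import sqlite3
import time


def init(conn):
    # Hypothetical table: one row per named lock, with an expiry timestamp.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lease_lock ("
        "name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
    )


def try_acquire(conn, name, owner, ttl, now=None):
    """Try to take (or renew) the lease; return True if `owner` holds it."""
    now = time.time() if now is None else now
    with conn:  # run the three statements in one transaction
        # Create the row if nobody has ever held this lock.
        conn.execute(
            "INSERT OR IGNORE INTO lease_lock (name, owner, expires_at) "
            "VALUES (?, ?, ?)",
            (name, owner, now + ttl),
        )
        # Take over an expired lease, or renew a lease we already hold.
        conn.execute(
            "UPDATE lease_lock SET owner = ?, expires_at = ? "
            "WHERE name = ? AND (expires_at <= ? OR owner = ?)",
            (owner, now + ttl, name, now, owner),
        )
        row = conn.execute(
            "SELECT owner FROM lease_lock WHERE name = ?", (name,)
        ).fetchone()
    return row[0] == owner


def release(conn, name, owner):
    # Only the current holder may release; others are a no-op.
    with conn:
        conn.execute(
            "DELETE FROM lease_lock WHERE name = ? AND owner = ?",
            (name, owner),
        )
```

In an active/passive setup along these lines, each scheduler process would
call try_acquire periodically with a TTL longer than the renewal interval,
and only do scheduling work while the call returns True. If the holder dies
without releasing, the lease simply expires and the other instance takes
over. On MySQL or Postgres the same pattern would need checking against the
transaction isolation level actually in use.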
