I haven't checked the math in the AIP, but I believe that with the given
formula, 5 schedulers and 100 DAGs already give a 9% chance of conflict,
and the larger users of Airflow have many more DAGs than that.

I'm a bit concerned about putting more load on the DB, which is already a
scalability bottleneck. I agree with the sentiment in the AIP about using a
more long-term solution like leader election (or consistent hashing with
hash(dag_id) -> scheduler instance, etc.). An even more radical change
would be pushing the scheduling logic to the workers themselves, so that
scheduling becomes push-based instead of pull-based. The proposed change is
probably better than doing nothing in the short term, though, and I think
it shouldn't be too hard to reverse/change if done properly, so I'm neutral
overall.
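
To make the hash(dag_id) -> scheduler instance idea concrete, here is a
minimal sketch of a consistent-hash ring. It is only an illustration, not
what the AIP proposes: the SchedulerRing class, the "scheduler-N" ids and
the vnodes parameter are hypothetical names, and it assumes every scheduler
sees the same membership list.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Hash that is stable across processes (built-in hash() is salted)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class SchedulerRing:
    """Hypothetical consistent-hash ring mapping dag_id -> scheduler id."""

    def __init__(self, scheduler_ids, vnodes=64):
        if not scheduler_ids:
            raise ValueError("at least one scheduler is required")
        # Each scheduler gets several points on the ring to even out load.
        self._ring = sorted(
            (stable_hash(f"{sid}#{i}"), sid)
            for sid in scheduler_ids
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, dag_id: str) -> str:
        # The first ring point clockwise from the DAG's hash owns the DAG.
        idx = bisect.bisect_right(self._points, stable_hash(dag_id))
        return self._ring[idx % len(self._ring)][1]


ring = SchedulerRing([f"scheduler-{i}" for i in range(5)])
print(ring.owner("example_dag"))
```

With this scheme, when a scheduler leaves, only the DAGs it owned move to
other schedulers; everything else keeps its owner, and no per-DAG row lock
is needed to decide who schedules what.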

On Mon, Mar 16, 2020 at 6:12 PM Deng Xiaodong <[email protected]> wrote:

> Would be happy to give +1 for this AIP later!
>
>
> XD
>
> On Mon, Mar 16, 2020 at 11:08 PM Ash Berlin-Taylor <[email protected]> wrote:
>
> > Does anyone have any other opinions about this? If not I'd like to call a
> > vote (and start working on the code!)
> >
> > -ash
> > On Mar 3 2020, at 12:34 pm, Kaxil Naik <[email protected]> wrote:
> > > The goal would be to support both MySQL and PostgreSQL for production as
> > > we know many of Airflow users use MySQL as Metadata DB.
> > >
> > > On Tue, Mar 3, 2020 at 12:25 PM Ash Berlin-Taylor wrote:
> > > > It _shouldn't_, and we will test extensively with mysql.
> > > >
> > > > Worse case is we'll have to fall back to managing the lock ourselves
> > > > with a column rather than relying on db/row level locks. This might
> > > > be a case where we have different/specialised behaviour for different
> > > > dbs, or even db versions, if say mysql 8 behaves okay but 5.7/5.6
> > > > doesn't.
> > > >
> > > > Ash
> > > >
> > > > On 3 March 2020 07:01:15 GMT-05:00, "Kamil Breguła"
> > > > <[email protected]> wrote:
> > > > > Hello,
> > > > >
> > > > > Will reliance on the database cause problems with MySQL? A lot of my
> > > > > users use this database. I am afraid that the lock mechanism in MySQL
> > > > > is much less stable and predictable than PostgresSQL and this can
> > > > > cause various stability problems. I know that Astronomer uses
> > > > > PostgreSQL, but Airflow supports RDMS in a production environment and
> > > > > both must work properly in this AIP.
> > > > >
> > > > > Best regards,
> > > > > Kamil
> > > > >
> > > > > On Tue, Mar 3, 2020 at 12:50 PM Kaxil Naik wrote:
> > > > > >
> > > > > > Good work on the Proposal Ash & Vikram.
> > > > > >
> > > > > > On Fri, Feb 28, 2020 at 10:39 PM Vikram Koka wrote:
> > > > > >
> > > > > > > Team,
> > > > > > >
> > > > > > > We just updated 'AIP-15 Support Multiple-Schedulers for HA &
> > > > > > > Better Scheduling Performance' on Confluence and would very much
> > > > > > > appreciate feedback and suggestions from the community.
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651
> > > > > > >
> > > > > > > The original AIP was filed by Xiaodong Deng on March 2nd, 2019
> > > > > > > and has stalled after a while, so with his blessing, we are
> > > > > > > taking the baton on this AIP. We at Astronomer have heard several
> > > > > > > enterprises ask for both High Availability as well as greater
> > > > > > > scalability, specifically around starting hundreds and thousands
> > > > > > > of tasks in a very short time window.
> > > > > > >
> > > > > > > We would like to attempt this based on our experience running
> > > > > > > Airflow as a Service and deploying Airflow at enterprises around
> > > > > > > the globe. We believe that this will benefit Airflow and fuel
> > > > > > > greater adoption of Airflow for production pipelines within
> > > > > > > enterprises.
> > > > > > >
> > > > > > > Building on the original AIP, we have proposed an active/active
> > > > > > > model, where we can scale schedulers, but are staying away from
> > > > > > > the quorum approach. Xiaodong Deng had put in some really good
> > > > > > > thinking about the problem including approaches towards reducing
> > > > > > > contention between multiple schedulers and we have included some
> > > > > > > of those concepts here. Additional commenters had discussed the
> > > > > > > possibilities of leader selection and those challenges, and we
> > > > > > > have incorporated their thinking as well.
> > > > > > >
> > > > > > > Any feedback, suggestions, and comments would be greatly
> > > > > > > appreciated.
> > > > > > >
> > > > > > > Best Regards,
> > > > > > >
> > > > > > > Ash Berlin-Taylor and Vikram Koka
> >
> >
>
