I haven't checked the math in the AIP, but I believe that with the given formula there is already a 9% chance of conflict with 5 schedulers and 100 DAGs, and the larger users of Airflow have many more DAGs than that.
I'm a bit concerned about putting more load on the DB, which is already a scalability bottleneck. I agree with the sentiment in the AIP about using a more long-term solution like leader election (or consistent hashing with hash(dag_id) -> scheduler instance, etc.), and the even more radical change would be pushing the scheduling logic to the workers themselves so scheduling becomes push-based instead of pull-based.

The proposed change is probably better than doing nothing in the short term though, and I think it is one that shouldn't be too hard to reverse/change if done properly, so I'm neutral overall.

On Mon, Mar 16, 2020 at 6:12 PM Deng Xiaodong <[email protected]> wrote:

> Would be happy to give +1 for this AIP later!
>
> XD
>
> On Mon, Mar 16, 2020 at 11:08 PM Ash Berlin-Taylor <[email protected]> wrote:
>
> > Does anyone have any other opinions about this? If not I'd like to
> > call a vote (and start working on the code!)
> >
> > -ash
> >
> > On Mar 3 2020, at 12:34 pm, Kaxil Naik <[email protected]> wrote:
> > > The goal would be to support both MySQL and PostgreSQL for
> > > production as we know many of Airflow users use MySQL as
> > > Metadata DB.
> > >
> > > On Tue, Mar 3, 2020 at 12:25 PM Ash Berlin-Taylor wrote:
> > > > It _shouldn't_, and we will test extensively with mysql.
> > > >
> > > > Worse case is we'll have to fall back to managing the lock
> > > > ourselves with a column rather than relying on db/row level
> > > > locks. This might be a case where we have
> > > > different/specialised behaviour for different dbs, or even db
> > > > versions, if say mysql 8 behaves okay but 5.7/5.6 doesn't.
> > > >
> > > > Ash
> > > >
> > > > On 3 March 2020 07:01:15 GMT-05:00, "Kamil Breguła" <[email protected]> wrote:
> > > > > Hello,
> > > > >
> > > > > Will reliance on the database cause problems with MySQL? A
> > > > > lot of my users use this database. I am afraid that the
> > > > > lock mechanism in MySQL is much less stable and predictable
> > > > > than PostgresSQL and this can cause various stability
> > > > > problems. I know that Astronomer uses PostgreSQL, but
> > > > > Airflow supports RDMS in a production environment and both
> > > > > must work properly in this AIP.
> > > > >
> > > > > Best regards,
> > > > > Kamil
> > > > >
> > > > > On Tue, Mar 3, 2020 at 12:50 PM Kaxil Naik wrote:
> > > > > > Good work on the Proposal Ash & Vikram.
> > > > > >
> > > > > > On Fri, Feb 28, 2020 at 10:39 PM Vikram Koka wrote:
> > > > > > > Team,
> > > > > > >
> > > > > > > We just updated 'AIP-15 Support Multiple-Schedulers for
> > > > > > > HA & Better Scheduling Performance' on Confluence and
> > > > > > > would very much appreciate feedback and suggestions
> > > > > > > from the community.
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651
> > > > > > >
> > > > > > > The original AIP was filed by Xiaodong Deng on March
> > > > > > > 2nd, 2019 and has stalled after a while, so with his
> > > > > > > blessing, we are taking the baton on this AIP. We at
> > > > > > > Astronomer have heard several enterprises ask for both
> > > > > > > High Availability as well as greater scalability,
> > > > > > > specifically around starting hundreds and thousands of
> > > > > > > tasks in a very short time window.
> > > > > > >
> > > > > > > We would like to attempt this based on our experience
> > > > > > > running Airflow as a Service and deploying Airflow at
> > > > > > > enterprises around the globe. We believe that this will
> > > > > > > benefit Airflow and fuel greater adoption of Airflow
> > > > > > > for production pipelines within enterprises.
> > > > > > >
> > > > > > > Building on the original AIP, we have proposed an
> > > > > > > active/active model, where we can scale schedulers, but
> > > > > > > are staying away from the quorum approach. Xiaodong
> > > > > > > Deng had put in some really good thinking about the
> > > > > > > problem including approaches towards reducing
> > > > > > > contention between multiple schedulers and we have
> > > > > > > included some of those concepts here. Additional
> > > > > > > commenters had discussed the possibilities of leader
> > > > > > > selection and those challenges, and we have
> > > > > > > incorporated their thinking as well.
> > > > > > >
> > > > > > > Any feedback, suggestions, and comments would be
> > > > > > > greatly appreciated.
> > > > > > >
> > > > > > > Best Regards,
> > > > > > >
> > > > > > > Ash Berlin-Taylor and Vikram Koka
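
P.S. To make the hash(dag_id) -> scheduler instance idea from my reply above a bit more concrete, here is a rough sketch. It is not taken from the AIP or from Airflow's code. It uses rendezvous (highest-random-weight) hashing, which is one simple form of consistent hashing, and the names owner_of, _score and the "scheduler-N" ids are made up for illustration. It also assumes every scheduler already agrees on the list of live schedulers, which is the hard part and is not addressed here.

```python
import hashlib


def _score(scheduler_id: str, dag_id: str) -> int:
    """Stable 64-bit score for a (scheduler, DAG) pair.

    sha256 is used instead of the built-in hash() so that every process
    computes the same value regardless of PYTHONHASHSEED.
    """
    digest = hashlib.sha256(f"{scheduler_id}:{dag_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner_of(dag_id: str, schedulers: list[str]) -> str:
    """Return the (hypothetical) scheduler id that should schedule dag_id.

    Rendezvous hashing: the scheduler with the highest score for this DAG wins.
    """
    if not schedulers:
        raise ValueError("at least one scheduler is required")
    return max(schedulers, key=lambda s: _score(s, dag_id))


if __name__ == "__main__":
    schedulers = [f"scheduler-{i}" for i in range(5)]
    dags = [f"dag_{i}" for i in range(100)]

    before = {d: owner_of(d, schedulers) for d in dags}

    # Simulate one scheduler going away; only its DAGs should be reassigned.
    survivors = [s for s in schedulers if s != "scheduler-3"]
    after = {d: owner_of(d, survivors) for d in dags}

    moved = [d for d in dags if before[d] != after[d]]
    print(f"{len(moved)} of {len(dags)} DAGs changed owner")
```

The point of this scheme over a plain hash(dag_id) % n is that when a scheduler disappears, only the DAGs it owned move to another instance; everything else stays put, and no DB lock is needed to decide ownership.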
