I don't think "conflict" is strictly a problem: the second (or n-th) scheduler that tries to work on the locked DAG will simply move on to the next one.
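To illustrate the "move on to the next one" behaviour, here is a minimal sketch. It assumes in-process locks standing in for database row locks; the names `dag_locks` and `schedule_pass` are hypothetical and this is not Airflow's actual implementation.

```python
import threading

# Hypothetical stand-in: one in-process lock per DAG, playing the role of a
# database row lock. This is an illustration only, not Airflow code.
dag_locks = {dag_id: threading.Lock() for dag_id in ("dag_a", "dag_b", "dag_c")}


def schedule_pass(scheduler_name, locks, processed):
    """Try each DAG once; skip any DAG that another scheduler currently holds."""
    for dag_id, lock in locks.items():
        # Non-blocking acquire: returns False immediately if the lock is held.
        if not lock.acquire(blocking=False):
            continue  # "conflict": simply move on to the next DAG
        try:
            processed.append((scheduler_name, dag_id))
        finally:
            lock.release()


processed = []
dag_locks["dag_b"].acquire()  # pretend another scheduler is working on dag_b
schedule_pass("scheduler-2", dag_locks, processed)
dag_locks["dag_b"].release()
print(processed)
```

In this sketch the second scheduler never waits on the held lock; it just does no work on that DAG during this pass.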
Looking at "fast-follow"/moving some of the scheduling decisions to the workers is already on my todo list, but the other thing to consider there is that we _might_ want to remove direct database access from workers in the future so that we can limit what Connections a DAG/task has access to. But we can address that when we come to it; achieving both of these should still be possible.

No complaints, I'll start the vote.

-ash

On Mon Mar 16, 2020 at 6:27 PM, Dan Davydov wrote:
> Haven't checked the math in the AIP but I believe with the given formula, with 5 schedulers and 100 DAGs there is already a 9% chance of conflict, and the larger users of Airflow have many more DAGs than that.
>
> I'm a bit concerned about putting more load on the DB, which is already a scalability bottleneck. I agree with the sentiment in the AIP about using a more long-term solution like leader election (or consistent hashing with hash(dag_id) -> scheduler instance, etc), and the even more radical change would be pushing the scheduling logic to the workers themselves so scheduling becomes push-based instead of pull-based. The proposed change is probably better than doing nothing though in the short term, and I think one that shouldn't be too hard to reverse/change if done properly, so I'm neutral overall.
>
> On Mon, Mar 16, 2020 at 6:12 PM Deng Xiaodong <xd.den...@gmail.com> wrote:
> > Would be happy to give +1 for this AIP later!
> >
> > XD
> >
> > On Mon, Mar 16, 2020 at 11:08 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > Does anyone have any other opinions about this? If not I'd like to call a vote (and start working on the code!)
> > >
> > > -ash
> > >
> > > On Mar 3 2020, at 12:34 pm, Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > The goal would be to support both MySQL and PostgreSQL for production, as we know many Airflow users use MySQL as the Metadata DB.
> > > >
> > > > On Tue, Mar 3, 2020 at 12:25 PM Ash Berlin-Taylor wrote:
> > > > > It _shouldn't_, and we will test extensively with mysql.
> > > > >
> > > > > Worst case is we'll have to fall back to managing the lock ourselves with a column rather than relying on db/row level locks. This might be a case where we have different/specialised behaviour for different dbs, or even db versions, if say mysql 8 behaves okay but 5.7/5.6 doesn't.
> > > > >
> > > > > Ash
> > > > >
> > > > > On 3 March 2020 07:01:15 GMT-05:00, "Kamil Breguła" <kamil.breg...@polidea.com> wrote:
> > > > > > Hello,
> > > > > >
> > > > > > Will reliance on the database cause problems with MySQL? A lot of my users use this database. I am afraid that the lock mechanism in MySQL is much less stable and predictable than PostgreSQL and this can cause various stability problems. I know that Astronomer uses PostgreSQL, but Airflow supports both RDBMSs in a production environment and both must work properly in this AIP.
> > > > > >
> > > > > > Best regards,
> > > > > > Kamil
> > > > > >
> > > > > > On Tue, Mar 3, 2020 at 12:50 PM Kaxil Naik wrote:
> > > > > > > Good work on the Proposal Ash & Vikram.
> > > > > > >
> > > > > > > On Fri, Feb 28, 2020 at 10:39 PM Vikram Koka wrote:
> > > > > > > > Team,
> > > > > > > >
> > > > > > > > We just updated 'AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance' on Confluence and would very much appreciate feedback and suggestions from the community.
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651
> > > > > > > >
> > > > > > > > The original AIP was filed by Xiaodong Deng on March 2nd, 2019 and has stalled after a while, so with his blessing, we are taking the baton on this AIP.
> > > > > > > > We at Astronomer have heard several enterprises ask for both High Availability as well as greater scalability, specifically around starting hundreds and thousands of tasks in a very short time window.
> > > > > > > >
> > > > > > > > We would like to attempt this based on our experience running Airflow as a Service and deploying Airflow at enterprises around the globe. We believe that this will benefit Airflow and fuel greater adoption of Airflow for production pipelines within enterprises.
> > > > > > > >
> > > > > > > > Building on the original AIP, we have proposed an active/active model, where we can scale schedulers, but are staying away from the quorum approach. Xiaodong Deng had put in some really good thinking about the problem including approaches towards reducing contention between multiple schedulers and we have included some of those concepts here. Additional commenters had discussed the possibilities of leader selection and those challenges, and we have incorporated their thinking as well.
> > > > > > > >
> > > > > > > > Any feedback, suggestions, and comments would be greatly appreciated.
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > >
> > > > > > > > Ash Berlin-Taylor and Vikram Koka