Let me move this to a separate thread because, yes, you are right, Daniel: it should not be only about AIP-78.
> As I see it, minischeduler is unrelated and not what this AIP is about, and
> basically does not matter for this AIP. And the broader discussion about
> locking, while interesting, is also beyond this AIP and I'm not sure it
> makes sense to have here. Side note, there is currently locking in more
> than one place in airflow, and of more than one type of entity; 23 usages
> in core of with_row_locks helper. Some in dag processor; some, I assume,
> required for some API interactions.

Yes, I agree it's only somewhat related. It just occurred to me now, given that we are bringing "more work" to the scheduler. In an ideal world, this discussion would happen in the "Scheduler Performance Improvements" workstream, marked as "A joint effort among various Stakeholders" in the current workstream list. Maybe that means we should promote it and get it added as an Airflow 3.0 workstream. I think several changes will affect how the scheduler operates, and they will come from multiple AIPs, so having a deliberate "infrastructure scheduler stream" is not a bad thing.

> Don't read that as me not wanting to provide details, I just want to be
> clear about what is actually relevant for this AIP and what is really a
> separate discussion and also a hope that, for the in-scope things, that we
> can just be clear about specifically what are the concerns that need to be
> addressed and the questions that need to be answered.

Yes, I do not think I am asking you to provide those details in this AIP; it's more about raising awareness. And since it **just** occurred to me while reading the proposal, I happened to raise it here. If anything, I think both AIP-72 and AIP-78 (and possibly also AIP-67 multi-team, AIP-66 DAG versioning and maybe others) should provide input to an "infrastructure" AIP about the scheduler, where we could design and document some of these internal changes.
All of these will bring scheduling-related changes. It would be great if, this time, we agreed on and documented the resulting scheduler design and discussed how it works with all of them.

> Backfill, as I understand it, is fundamentally about creating dag runs.
> From the scheduling perspective, it's not much different from "normal" dag
> runs -- it's just that they are created with old execution dates. Once the
> runs are created, my thinking is, the task instances of backfill runs can
> be processed in the same way as non-backfill runs. This is what I
> propose. Do we have agreement on this?

Yes. But there will be separate locking around DagRun, because backfill will have its own tables, and this can cause new deadlocks if some DagRun rows are locked while the backfill "create dagrun" process creates DagRuns for them. It would be great to at least understand whether this will be a separate process or the same thread in a different part of the scheduler loop, how DagRun locking will work, and what deadlock avoidance will look like. And yes, if we do not know that now, we can park it for later, make a note and, for example, defer it to the "scheduler performance" workstream. That does not even have to be an AIP; it could just be a design doc, and the people discussing it would make sure the design is laid out, documented, and kept up to date during implementation. But I think we should make sure we have a place and a discussion where this can be iterated on during Airflow 3 development.

> Regarding how the dag runs are created, there's a class method on the DAG
> object `dags_needing_dagruns`. This is currently where we identify the
> dags needing scheduled runs as well as dags needing dataset triggered
> runs. And the dag table is locked as part of this. I suspect this
> would be a reasonable place to identify dags needing backfill runs as
> well. Does that sound reasonable to you?

Maybe. I am less concerned about the code and more about who is doing it.
Is it the same scheduler loop that runs today? If so, how do we avoid starvation? Or, conversely, if it's another process, how do we avoid deadlocks? This is basically what concerns me: we either push "more" into the loop or we create "another loop", and I think it's a good idea to design this. But yes, it could be an implementation discussion, as long as it happens before a lot of effort is put into the implementation.

> What other open questions are there re scheduler?

Currently I do not have more. But I am sure that when we go through other AIPs in more detail, they might (again) show up. The scheduler is a centerpiece of Airflow, and the changes we implement for Airflow 3 might significantly change some of the ways it works.

J