Let me separate it to a different thread because yes - you are right Daniel -
it should not be only AIP-78.


> As I see it, minischeduler is unrelated and not what this AIP is about, and
> basically does not matter for this AIP.  And the broader discussion about
> locking, while interesting, is also beyond this AIP and I'm not sure it
> makes sense to have here.  Side note, there is currently locking in more
> than one place in airflow, and of more than one type of entity; 23 usages
> in core of with_row_locks helper.  Some in dag processor; some, I assume,
> required for some API interactions.
>

Yes, it's only somewhat related, I agree. It just came to me now, given that
we are bringing "more work" to the scheduler. In an ideal world the discussion
should be in the "Scheduler Performance Improvements" workstream marked as
"A joint effort among various Stakeholders" in the current workstream list,
and maybe that means that we should promote it and get it added as an
Airflow 3.0 workstream. I think there will be several changes impacting how
the scheduler operates - and they will come from multiple AIPs, so maybe
having such a deliberate "Infrastructure scheduler stream" is not a bad thing.

> Don't read that as me not wanting to provide details, I just want to be
> clear about what is actually relevant for this AIP and what is really a
> separate discussion and also a hope that, for the in-scope things, that we
> can just be clear about specifically what are the concerns that need to be
> addressed and the questions that need to be answered.
>

Yes, I do not think I am asking you to provide details in that AIP - more to
raise awareness of it. And since it **just** occurred to me while reading the
proposal, I happened to raise it here. If anything, I think both AIP-72 and
AIP-78 (and possibly even AIP-67 multi-team and AIP-66 DAG versioning, and
maybe others) should provide input to the "infrastructure" AIP about the
scheduler - where we could design and document some of the internal changes
there.

There will be scheduling-related changes resulting from all of these. It
would be great if we agree on and document the resulting scheduler design
this time, and discuss how it works with all of them.


> Backfill, as I understand it, is fundamentally about creating dag runs.
> From the scheduling perspective, it's not much different from "normal" dag
> runs -- it's just that they are created with old execution dates.  Once the
> runs are created, my thinking is, the task instances of backfill runs can
> be processed in the same way as non-backfill runs.  This is what I
> propose.  Do we have agreement on this?
>

Yes, but there will be separate locking on DagRun (because backfill will have
separate tables), and it can cause new deadlocks if some of the DagRuns are
locked while the backfill "create dagrun" process creates DagRuns for those.
It would be great to at least understand whether it's going to be a separate
process or the same thread in a different part of the scheduler loop, how the
locking of DagRun will work, and to define what deadlock avoidance will look
like. And yes - if we do not know it now, we can park it for later, make a
note and - for example - defer it to the "scheduler performance" workstream,
which does not even have to be an AIP - it could just be a design doc, and
the people discussing it will make sure that the design is laid out,
documented, and updated during implementation. But I think we should make
sure that we have a place and a discussion where this can be iterated on for
Airflow 3 development.
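
To make the deadlock concern a bit more concrete, here is a minimal sketch of
one common avoidance technique: every writer acquires its locks in the same
global order. This is not Airflow code. The `row_locks` dict, the
`with_rows_locked` helper and the run ids are all hypothetical, and
`threading.Lock` only stands in for a database row lock on a DagRun.

```python
import threading

# Hypothetical stand-ins for row locks on DagRun rows (not Airflow code).
row_locks = {run_id: threading.Lock() for run_id in ("run_a", "run_b", "run_c")}
touched = []


def with_rows_locked(worker, run_ids):
    # Always acquire in one global order (sorted ids), no matter in which
    # order the caller listed them. Two workers can then never hold locks in
    # opposite orders, which is what a deadlock cycle needs.
    ordered = sorted(set(run_ids))
    for run_id in ordered:
        row_locks[run_id].acquire()
    try:
        touched.append((worker, tuple(ordered)))
    finally:
        for run_id in reversed(ordered):
            row_locks[run_id].release()


def run_demo():
    # "scheduler" and "backfill" ask for overlapping rows in different orders.
    threads = [
        threading.Thread(target=with_rows_locked,
                         args=("scheduler", ["run_a", "run_b"])),
        threading.Thread(target=with_rows_locked,
                         args=("backfill", ["run_b", "run_a", "run_c"])),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    # True when both workers finished, i.e. nobody is stuck waiting.
    return all(not t.is_alive() for t in threads)
```

Whether ordering, SKIP LOCKED style queries, or something else fits the
scheduler best is exactly the kind of thing such a design doc could settle.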


>
> Regarding how the dag runs are created, there's a class method on the DAG
> object `dags_needing_dagruns`.  This is currently where we identify the
> dags needing scheduled runs as well as dags needing dataset triggered
> runs.  And the the dag table is locked as part of this.  I suspect this
> would be a reasonable place to identify dags needing backfill runs as
> well.  Does that sound reasonable to you?
>

Maybe - I care less about where it lives in the code and more about who is
doing it. Is it the same scheduler loop that is run today? If so, how do we
avoid starvation, or conversely, how do we avoid deadlocks if it's another
process? This is basically what concerns me - that we either push "more" to
the loop or we create "another loop", and I think it's a good idea to design
it. But yes - it could be an implementation discussion, as long as it happens
before a lot of effort is put into the implementation.
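
As an illustration of the starvation side of this, here is a minimal sketch
of one possible policy if backfill run creation shares the existing loop: a
fixed per-iteration budget with a small reserved share for backfill. Nothing
here is Airflow code. `scheduler_iteration`, `budget` and `backfill_share`
are hypothetical names, and the numbers are arbitrary.

```python
from collections import deque


def scheduler_iteration(scheduled, backfill, budget=4, backfill_share=1):
    """Pick up to `budget` runs to create in one loop iteration.

    Hypothetical policy: reserve `backfill_share` slots for backfill so it
    is never starved by regular runs, and let either side use slots that
    the other side leaves idle.
    """
    picked = []
    # Slots held back for backfill (none if there is no backfill work).
    reserved = min(backfill_share, budget, len(backfill))
    for _ in range(min(budget - reserved, len(scheduled))):
        picked.append(("scheduled", scheduled.popleft()))
    # Backfill gets its reserved slots plus whatever is still unused.
    for _ in range(min(budget - len(picked), len(backfill))):
        picked.append(("backfill", backfill.popleft()))
    return picked


# Usage: a busy regular queue cannot push backfill out completely.
busy = scheduler_iteration(deque(range(10)), deque("abc"))
# An idle regular queue lets backfill use the whole budget.
idle = scheduler_iteration(deque(), deque("abc"))
```

The point is only that the budget and the split are explicit design
decisions, which is easier to discuss before the implementation than after.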


> What other open questions are there re scheduler?
>

Currently I do not have more. But I am sure that when we go through other
AIPs in more detail they might (again) show up. The scheduler is a
centerpiece of Airflow, and the changes we implement for Airflow 3 might
significantly change some of the ways the scheduler works.

J
