Re: [Discussion] Make the scheduler's task selection algorithm pluggable

Daniel Standish via dev Tue, 16 Sep 2025 13:51:14 -0700

I agree with Jarek's comments.

Re some of your comments...


Scheduler can be improved.  Scheduler should be improved.  Incremental is
good.

Nobody is going to dispute that breaking things up into components that are
easier to understand and maintain is a good goal.

Nobody is going to object to running more stuff in parallel, if it is
feasible and produces better performance, is a good thing.

Re your specific PR

I think that your idea to push the logic down to stored procs though is a
tougher sell.  For one, you need to add a different sproc per dialect.  And
the length of the one sproc you wrote is about the same as the long
function that you are trying to replace.  SQL is also a worse language for
intelligibility and modularization so it goes in the opposite direction.
Your sproc is tougher to inspect than the function it replaces IMHO.

We obviously cannot write off the idea of using sprocs and pushing more
logic to sql.  But there are real tradeoffs and the case has to be made
that it's worth it.  The sproc approach does appear to be worse to
maintain, tougher to test, and more error prone.  And to make changes
requires a deployment  --- not just patching code.  So, while a case could
be made that it is worth it, the case has to be made.


On Tue, Sep 16, 2025 at 10:33 AM Jarek Potiuk <[email protected]> wrote:

> It's just the very basic thing here is 'community over code' - wy believe
> that good code is result of building great community that works together.
> And that the opposite is not necessarily true.
>
> In this case I identify the issue not as 'code is bad' but as `let's build
> community of people collaborating on making it better' - where careful
> refactoring with participation od multiple parties and especially building
> on all the experience and accumulated years of wisdom of people who worked
> on it is crucial. Because individuals who have even best intentions to make
> things 'better' might just miss a lot of context (like for example the
> open-source context where we have to be careful what we expose to users -
> lessons we learned hard over the many years Airflow is maintained by us.
>
> And it is simply rather closing for cooperation rather than opening if we
> start with goal that is too ambitious and starts with "let's rip it all out
> it's badly designed" vibe to it.
>
> I think it's better to see the modularity / customisability as potentially
> good end goal and start with incremental improvements that will also
> address the imperfect explanation of context and decisions and get people
> enthused and collaborating on it.
>
> J.
>
> wt., 16 wrz 2025, 11:13 użytkownik asquator <[email protected]> napisał:
>
> > Well, it's always new contributors who criticize the old code, because
> > they're affected the most.
> > It's not about blaming people, but noticing things "at the margin",
> > ignoring the history and proposing
> > changes that are objectively beneficial *at the moment*. I did watch
> Ash's
> > talk before starting to work
> > on the starvation issue, and it's great. I have nothing to say about the
> > algorithm and the logic
> > (mb except it should be asynchronous), but the code structure. Over the
> > years, small patches have been added
> > to one file, and now it has exploded. It's normal. It happens. We just
> > have to realize that and start
> > splitting the code, without drastically changing the logic.
> >
> > Practice shows that modular code is always better than monolithic scripts
> > with multiple responsibilities.
> > I don't say abstract every small part of the code. The "cheapest" way to
> > decide which parts should
> > be abstracted is the simple principle of "if the component has to be
> > replaced once, it can be replaced again in the future".
> > As we're working on the critical section, we're ready to do the extra
> work
> > to abstract it out and make it
> > pluggable (for developers) - it can be the first step in modularizing the
> > scheduler which we're ready to take.
> >
> > As I have to implement the PR, I'm willing to dive into the tactical
> level
> > and reach a compromise with the PMCs
> > (who know this project's internals much better) regarding what's the best
> > option to inject things internally.
> > It's a purely technical problem I have to solve to have a working PR. I
> > *will* do the subclass for testing because
> > it's quick and dirty, but it would be nice to come to a better solution
> > (it shouldn't be too hard) so we can
> > do it the best way before it's (hopefully) merged.
> >
> >
> >
> > On Tuesday, September 16th, 2025 at 7:08 PM, Jarek Potiuk <
> > [email protected]> wrote:
> >
> > > I think this is something we should seriously discuss after 3.1 as many
> > > people are busy now with tying up loose ends.
> > >
> > > But I would be generally in favour of looking at Scheduler logic,
> > > modularising it and making it easier to reason about. I do not want to
> > > (absolutely) say that it's bad or "should have been done better" or
> > > anything to say always modular code is best (which often might be
> implied
> > > as "someone did a bad job here"). This is absolutely not that. It's
> very
> > > easy to criticise things (even subconsciously) when you come out from
> > > outside and you see how things "should have been" - but without all the
> > > context and history, this is often a bad messaging to those who spent
> > years
> > > on making thing work reliably in production for tens of thousands of
> > users
> > > and being generally stable and one of the most reliable part of Airflow
> > > "core".
> > >
> > > There are PLENTY of reasons for scheduler being implemented the way it
> > is -
> > > and even trying to approach explaining the history and decision making
> > > process would take a lot of time. And currently (for good reasons) the
> > > scheduler "API" is exactly what Ash explained.
> > >
> > > But there is no reason we should not think and make a concerted (and
> > group)
> > > effort to modularise it and make scheduler easier to reason about and -
> > > importantly - being easier for more people to contribute to and discuss
> > it
> > > - and most importantly - adding way more modularised tests that would
> > also
> > > allow us to break it up and tests parts of it - also for performance
> and
> > > behaviour characteristics. It's not well suited for it today, but -
> > > possibly - it could. And as the most important part of Airflow, we
> > > should make it easier to understand, reason about and contribute to by
> > many
> > > people. Simply now there are probably just a few people (Ash being the
> > main
> > > person) that can reason and discuss some intrinsic scheduler insights.
> > And
> > > If we want to make Airflow sustainable, we should make it easier for
> > others
> > > to understand and contribute to it.
> > >
> > > One of the things as a result of it - I would love to have it better
> > > documented and explained the reasoning behind some decisions and
> > explaining
> > > how it works (that might be a result of such a concerted effort).
> > >
> > > It's a similar story as with Ci/CD Breeze two years ago - I was the
> only
> > > one who really could reason about it but through rewriting in Python
> > > and documenting with ADRs
> > > https://github.com/apache/airflow/tree/main/dev/breeze/doc/adr which
> > still
> > > describe some basic assumptions there and engaging others, modularising
> > > stuff and getting them to participate I can go now for 3 weeks
> vacations
> > > knowing that things will be taken care of, no matter what (Which BTW. I
> > am
> > > doing now).
> > >
> > > The "why" and "how" scheduler works is not really documented. There is
> > this
> > > fantastic talk by Ash https://www.youtube.com/watch?v=DYC4-xElccE
> which
> > > still holds and explains it, but I would love to be able to reason and
> > > discuss more about it - looking at both code and docs - without
> > > reverse-engineering stuff.
> > >
> > > But I think the goal should be "modularising first" - maybe resulting
> > > later in easier way of replacing pieces of scheduler, so the
> modularising
> > > effort should be guided by the current PRs and ways they are trying to
> > > address starvation for example. Doing it slowly, with mutliple people
> > > reviewing, learning and contributing (and documentation created on the
> > way).
> > >
> > > I think that should be our initial goal ... then maybe things will
> > > follow.
> > >
> > > J.
> > >
> > >
> > >
> > > On Tue, Sep 16, 2025 at 8:02 AM asquator [email protected] wrote:
> > >
> > > > will be undocumented*
> > > >
> > > > On Tuesday, September 16th, 2025 at 5:01 PM, asquator
> > [email protected]
> > > > wrote:
> > > >
> > > > > I see the motivation, but does it have to look so bad?
> > > > >
> > > > > The subclass will look like this:
> > > > >
> > > > > class SchedulerJobRunnerLinearTIScan(SchedulerJobRunner):
> > > > > def init(
> > > > > self,
> > > > > job: Job,
> > > > > num_runs: int = conf.getint("scheduler", "num_runs"),
> > > > > scheduler_idle_sleep_time: float = conf.getfloat("scheduler",
> > > > > "scheduler_idle_sleep_time"),
> > > > > log: Logger | None = None,
> > > > > ):
> > > > > super().init(
> > > > > job=job,
> > > > > num_runs=num_runs,
> > > > > scheduler_idle_sleep_time=scheduler_idle_sleep_time,
> > > > > log=log,
> > > > > )
> > > > > self.task_selector = TASK_SELECTORS[LINEAR_SCAN_SELECTOR]
> > > > >
> > > > > The super class will use the injected hard-coded task selector.
> > > > >
> > > > > Can't we introduce a configuration hierarchy like `core.internal`
> and
> > > > > put there things not exposed to the end user? So we don't have to
> do
> > this
> > > > > weird subclassing?
> > > > >
> > > > > It will look thus:
> > > > >
> > > > > class SchedulerJobRunner(...):
> > > > > task_selector_type = conf.get("scheduler.internal",
> > > > > "task_selector_strategy")
> > > > > self.task_selector = TASK_SELECTORS[task_selector_type]
> > > > >
> > > > > We'd just like to have an internal toggle as an implementation
> > detail,
> > > > > which won't be undocumented and custom implementations won't be
> > supported.
> > > > > It's just more convenient and straightforward.
> > > > >
> > > > > Mb there's another way of internal settings management I missed?
> > > > >
> > > > > On Tuesday, September 16th, 2025 at 11:34 AM, Ash Berlin-Taylor
> > > > > [email protected] wrote:
> > > > >
> > > > > > > On 16 Sep 2025, at 08:58, asquator [email protected] wrote:
> > > > > > >
> > > > > > > Yes, exposing pluggable features means fixing an API, which is
> > > > > > > confining and just hard to do given the current implementation
> > > > > >
> > > > > > class MyScheduler:
> > > > > > def execute(self):
> > > > > > while True:
> > > > > > # Do what ever you want.
> > > > > >
> > > > > > `airflow scheduler --impl=my.module.MyScheduler`
> > > > > >
> > > > > > That is the API.
> > > > > >
> > > > > > That is as pluggable as we need it to be.
> > > > > >
> > > > > > Everything can be built on top of that, including if you want
> it, a
> > > > > > pluggable task selection mechanisms.
> > > > > >
> > > > > > Airflow already has too many config options and ways of tuning
> > > > > > behaviour. We need less of them, not more.
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> >
>

Re: [Discussion] Make the scheduler's task selection algorithm pluggable

Reply via email to