GitHub user potiuk added a comment to the discussion: Scheduler performance 
with large number of mapped task instances

If you think it's something that can be improved - PRs are more than welcome. 
But:

* there will not be any serious changes to Airflow 2. Full stop. We only solve 
critical issues that affect many users. What you are describing is still an 
outlier, in the sense that only a small number of users run such big DAGs

* there will be a HUGE change in database pressure in Airflow 3 because of the 
architecture change. And it will have a HUGE impact on the overall performance 
of various DB operations - including scheduling - without even touching the 
scheduler code. That's the nature of a central relational DB. We do not yet 
know the extent of that. It might or might not fix or improve some of the 
problems you observe

* the natural way of optimizing the code is to optimize where you know your 
bottlenecks are. We will not know for sure what those will be in the new 
architecture until we test and observe (and yes, we will optimize once we test 
and observe)

* however, if you **know** (you seem to be convinced) which part is problematic 
and how to fix it, you are most welcome to provide a PR with some stats, 
findings and benchmarks. I'd suggest doing it after the Airflow 3 changes are 
implemented, but if you feel like spending the time earlier because you are 
convinced it will be good, you are absolutely free to do so. This is how an 
open source project like this works: you get it for free, and the people who 
contribute to it (a lot of them in their free time) choose what they work on. 
You can become one of them if you want to spend time and energy on it and 
submit PRs.


GitHub link: 
https://github.com/apache/airflow/discussions/46044#discussioncomment-11977846
