GitHub user potiuk edited a comment on the discussion: Scheduler performance with large number of mapped task instances
If you think it's something that can be improved, PRs are more than welcome. But:

* There will not be any serious changes to Airflow 2. Full stop. We only solve critical issues that affect many users, and what you are describing is still an outlier, in the sense that only a small number of users run such big DAGs.
* There will be a HUGE change in database pressure in Airflow 3 because of the architecture change. And it will have a HUGE impact on the overall performance of various DB operations, including scheduling, without even touching the scheduler code. That's the nature of a central relational DB, and we do not yet know the extent of it. It might or might not fix or improve some of the problems you observe.
* The natural way of optimising the code is to optimise where you know your bottlenecks are. We do not yet know for sure what those will be in the new architecture until we test and observe (and yes, optimise when we test and observe).
* However, if you **know** (you seem to be convinced) which part is problematic and how to fix it, you are most welcome to provide a PR with some stats, findings and benchmarks. I'd suggest doing it after the Airflow 3 changes are implemented, but if you feel like spending time on it before that because you are convinced it will be good, you are absolutely free to do so.

This is how an open source project like this works: you get it for free, and the people who contribute to it (a lot of them in their free time) choose what they are working on. You can become one of them if you want to spend time and energy on it and submit PRs for that.

GitHub link: https://github.com/apache/airflow/discussions/46044#discussioncomment-11977846
