Hello,

Polidea [1], together with Databand [2], has taken steps to optimize scheduler performance. Last weekend I prepared several changes:

1. [AIRFLOW-6856] Bulk fetch paused_dag_ids
   https://github.com/apache/airflow/pull/7476
2. [AIRFLOW-6857] Bulk sync DAGs
   https://github.com/apache/airflow/pull/7477
3. [AIRFLOW-6862] Do not check the freshness of fresh DAG
   https://github.com/apache/airflow/pull/7481
4. [AIRFLOW-6869] Bulk fetch DAGRuns for _process_task_instances
   https://github.com/apache/airflow/pull/7489
5. [AIRFLOW-6881] Bulk fetch DAGRun for create_dag_run
   https://github.com/apache/airflow/pull/7502
6. [AIRFLOW-6887] Do not check the state of fresh DAGRun
   https://github.com/apache/airflow/pull/7510

These changes have not been merged yet, so that a wider audience can review them. Any feedback is very helpful. The performance benchmark results are available in the description of each change.
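
To illustrate the general idea that the "bulk fetch" titles suggest, here is a minimal sketch of replacing one query per DAG with a single query for all DAGs. It is not code from Airflow or from the pull requests above: it uses the standard-library sqlite3 module, and the `dag` table layout and all function names are hypothetical, chosen only for this example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `dag` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
    # Ten example DAGs; the odd-numbered ones are paused.
    conn.executemany(
        "INSERT INTO dag VALUES (?, ?)",
        [(f"dag_{i}", i % 2) for i in range(10)],
    )
    conn.commit()
    return conn


def paused_one_by_one(conn, dag_ids):
    """Issue one SELECT per DAG (the pattern a bulk fetch avoids)."""
    paused = set()
    for dag_id in dag_ids:
        row = conn.execute(
            "SELECT is_paused FROM dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        if row and row[0]:
            paused.add(dag_id)
    return paused


def paused_bulk(conn, dag_ids):
    """Issue a single SELECT covering all requested DAGs."""
    dag_ids = list(dag_ids)
    if not dag_ids:
        return set()
    placeholders = ", ".join("?" for _ in dag_ids)
    rows = conn.execute(
        f"SELECT dag_id FROM dag WHERE is_paused = 1 AND dag_id IN ({placeholders})",
        dag_ids,
    ).fetchall()
    return {row[0] for row in rows}


def count_selects(conn, func, dag_ids):
    """Run func and count the SELECT statements it sends to the database."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = func(conn, dag_ids)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


if __name__ == "__main__":
    conn = make_db()
    ids = [f"dag_{i}" for i in range(10)]
    for func in (paused_one_by_one, paused_bulk):
        result, queries = count_selects(conn, func, ids)
        print(func.__name__, sorted(result), "queries:", queries)
```

Both functions return the same set of paused DAG ids; the difference is that the number of queries in the first grows with the number of DAGs, while the second always sends one.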

Taken together, the changes give the following results.

Before:
  Average time: 8080.246 ms
  Queries count: 2692

After:
  Average time: 628.801 ms
  Queries count: 5

Diff:
  Average time: -7452 ms (-92%)
  Queries count: -2687 (-99%)

My changes focus only on DagFileProcessor, but this component generates the most database queries and takes a significant share of the scheduler's time.

A change by Tomek Urbaszek that also improves performance has already been merged:

7. [AIRFLOW-6590] Use batch db operations in jobs
   https://github.com/apache/airflow/pull/7370

These are not the last performance improvements. We are still working on this, and more changes will follow.

Many thanks to our friends from Databand [https://databand.ai/] for their support.

Best regards,
Kamil Breguła

[1] https://www.polidea.com/services/
[2] https://databand.ai/about/
