Big performance optimization of Scheduler - 10x faster, 2000+ fewer queries

Kamil Breguła Mon, 24 Feb 2020 08:36:01 -0800

Hello,

Polidea [1], together with Databand [2], has been working on optimizing
scheduler performance.
I prepared the following changes last weekend:
1. [AIRFLOW-6856] Bulk fetch paused_dag_ids
https://github.com/apache/airflow/pull/7476
2. [AIRFLOW-6857] Bulk sync DAGs
https://github.com/apache/airflow/pull/7477
3. [AIRFLOW-6862] Do not check the freshness of fresh DAG
https://github.com/apache/airflow/pull/7481
4. [AIRFLOW-6869] Bulk fetch DAGRuns for _process_task_instances
https://github.com/apache/airflow/pull/7489
5. [AIRFLOW-6881] Bulk fetch DAGRun for create_dag_run
https://github.com/apache/airflow/pull/7502
6. [AIRFLOW-6887] Do not check the state of fresh DAGRun
https://github.com/apache/airflow/pull/7510
These changes have not been merged yet, so that a wider audience can
review them first. Any feedback is very helpful. The performance
benchmark results are available in the description of each change.
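
To illustrate what "bulk fetch" means in the changes above, here is a
minimal sketch. It is not the code from the pull requests: it uses the
standard-library sqlite3 module, and the dag table, its columns and all
function names are hypothetical stand-ins chosen for the example. The
idea is to replace one query per DAG with a single query for all DAGs.

```python
import sqlite3


def make_db(dag_ids, paused):
    """Create an in-memory stand-in for a DAG metadata table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
    conn.executemany(
        "INSERT INTO dag VALUES (?, ?)",
        [(dag_id, int(dag_id in paused)) for dag_id in dag_ids],
    )
    conn.commit()
    return conn


def paused_one_by_one(conn, dag_ids):
    """One SELECT per DAG: the query count grows with the number of DAGs."""
    result = set()
    for dag_id in dag_ids:
        row = conn.execute(
            "SELECT is_paused FROM dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        if row and row[0]:
            result.add(dag_id)
    return result


def paused_bulk(conn, dag_ids):
    """A single SELECT with an IN clause, however many DAGs are passed in."""
    dag_ids = list(dag_ids)
    if not dag_ids:
        return set()
    placeholders = ", ".join("?" for _ in dag_ids)
    rows = conn.execute(
        f"SELECT dag_id FROM dag WHERE is_paused = 1 AND dag_id IN ({placeholders})",
        dag_ids,
    )
    return {row[0] for row in rows}
```

Both functions return the same set of paused DAG ids; the difference is
only in how many round trips to the database they need.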


Taken together, the changes give the following results.

Before:
Average time: 8080.246 ms
Queries count: 2692
After:
Average time: 628.801 ms
Queries count: 5
Diff:
Average time: -7451 ms (-92%)
Queries count: -2687 (-99%)
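
For anyone who wants to collect similar numbers on a toy example, here
is a rough sketch of a query counter. It is an assumption about how such
a measurement could be done, not the benchmark used for the pull
requests: it relies on the standard-library sqlite3 trace callback, and
count_selects and relative_change are hypothetical helper names.

```python
import sqlite3
import time
from contextlib import contextmanager


@contextmanager
def count_selects(conn):
    """Count SELECT statements run on `conn` inside the with-block and time the block."""
    stats = {"queries": 0, "elapsed_ms": 0.0}

    def on_statement(sql):
        # SQLite calls this once for every statement it starts executing.
        if sql.lstrip().upper().startswith("SELECT"):
            stats["queries"] += 1

    conn.set_trace_callback(on_statement)
    start = time.perf_counter()
    try:
        yield stats
    finally:
        stats["elapsed_ms"] = (time.perf_counter() - start) * 1000
        conn.set_trace_callback(None)


def relative_change(before, after):
    """Change from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100


# Toy usage: three SELECTs against an empty in-memory table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (x INTEGER)")
with count_selects(demo) as demo_stats:
    for _ in range(3):
        demo.execute("SELECT count(*) FROM t").fetchone()
print(demo_stats["queries"], "queries in", demo_stats["elapsed_ms"], "ms")
```

relative_change can be applied to the before and after values listed
above to reproduce the percentages in the Diff section.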

My changes focus only on DagFileProcessor, but this is the component
that generates the most database queries and takes a significant share
of the scheduler's time.

An earlier change by Tomek Urbaszek, which also improves performance, has already been merged:
7. [AIRFLOW-6590] Use batch db operations in jobs
https://github.com/apache/airflow/pull/7370

This is not the last performance improvement. We are still working on
it, and more changes will follow.

Many thanks to our friends at Databand [https://databand.ai/] for their support.

Best regards,
Kamil Breguła

[1] https://www.polidea.com/services/
[2] https://databand.ai/about/
