Great work, Kamil! Let us know once it lands in one of the future releases. We would love to try it out :)
Best,
-Tao

On Mon, Feb 24, 2020 at 12:54 PM Qingping Hou <q...@scribd.com> wrote:

> Awesome work Kamil! Great to see us embracing query batching in the
> code base. I can't wait to deploy those optimizations into our
> production environment.
>
> Thanks,
> QP Hou
>
> On Mon, Feb 24, 2020 at 8:35 AM Kamil Breguła <kamil.breg...@polidea.com>
> wrote:
> >
> > Hello,
> >
> > Polidea [1] together with Databand [2] has taken steps to optimize
> > scheduler performance.
> > I made many changes last weekend:
> > 1. [AIRFLOW-6856] Bulk fetch paused_dag_ids
> > https://github.com/apache/airflow/pull/7476
> > 2. [AIRFLOW-6857] Bulk sync DAGs
> > https://github.com/apache/airflow/pull/7477
> > 3. [AIRFLOW-6862] Do not check the freshness of fresh DAG
> > https://github.com/apache/airflow/pull/7481
> > 4. [AIRFLOW-6869] Bulk fetch DAGRuns for _process_task_instances
> > https://github.com/apache/airflow/pull/7489
> > 5. [AIRFLOW-6881] Bulk fetch DAGRun for create_dag_run
> > https://github.com/apache/airflow/pull/7502
> > 6. [AIRFLOW-6887] Do not check the state of fresh DAGRun
> > https://github.com/apache/airflow/pull/7510
> > These changes have not been merged yet, so that a wider audience can
> > review them. Any feedback is very helpful. The result of the
> > performance benchmark is available in the description of each change.
> >
> > Taken together, the changes give the following result:
> >
> > Before:
> > Average time: 8080.246 ms
> > Queries count: 2692
> > After:
> > Average time: 628.801 ms
> > Queries count: 5
> > Diff:
> > Average time: -7452 ms (-92%)
> > Queries count: -2687 (-99%)
> >
> > My changes focus only on DagFileProcessor, but it generates the most
> > database queries and takes a significant share of the scheduler's
> > time.
> >
> > An earlier change by Tomek Urbaszek that also improves performance
> > has already been merged:
> > 7. [AIRFLOW-6590] Use batch db operations in jobs
> > https://github.com/apache/airflow/pull/7370
> >
> > This is not the last performance improvement. We are still working
> > on it, and more changes will follow.
> >
> > Many thanks to our friends at Databand [https://databand.ai/] for
> > their support.
> >
> > Best regards,
> > Kamil Breguła
> >
> > [1] https://www.polidea.com/services/
> > [2] https://databand.ai/about/
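
For readers who have not looked at the PRs: the "bulk fetch" idea behind the changes listed in the thread can be sketched roughly as below. This is only an illustration, not the code from any of the PRs. It uses the standard-library sqlite3 module instead of Airflow's SQLAlchemy models, and the table layout and the function names (make_db, paused_one_by_one, paused_bulk) are made up for the example. The point is that asking for the paused state of N DAGs costs N queries when done one DAG at a time, and a single query when done with one IN (...) filter.

```python
import sqlite3


def make_db(dag_ids, paused):
    """Build a throwaway in-memory table (hypothetical schema, not Airflow's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
    conn.executemany(
        "INSERT INTO dag VALUES (?, ?)",
        [(dag_id, int(dag_id in paused)) for dag_id in dag_ids],
    )
    return conn


def paused_one_by_one(conn, dag_ids):
    """Per-DAG lookup: issues one SELECT for every DAG id."""
    result, queries = set(), 0
    for dag_id in dag_ids:
        row = conn.execute(
            "SELECT is_paused FROM dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        queries += 1
        if row and row[0]:
            result.add(dag_id)
    return result, queries


def paused_bulk(conn, dag_ids):
    """Bulk lookup: issues a single SELECT with an IN (...) filter."""
    dag_ids = list(dag_ids)
    if not dag_ids:
        return set(), 0
    placeholders = ", ".join("?" for _ in dag_ids)
    rows = conn.execute(
        f"SELECT dag_id FROM dag WHERE is_paused = 1 AND dag_id IN ({placeholders})",
        dag_ids,
    ).fetchall()
    return {row[0] for row in rows}, 1


if __name__ == "__main__":
    ids = [f"dag_{i}" for i in range(100)]
    db = make_db(ids, paused={"dag_3", "dag_42"})
    print(paused_one_by_one(db, ids)[1], "queries one by one")
    print(paused_bulk(db, ids)[1], "query in bulk")
```

Both functions return the same set of paused DAG ids; only the number of round trips to the database differs. For very long id lists a real implementation would probably have to split the IN (...) list into chunks, since databases limit the number of bound parameters.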