Nice!

On Tue, Feb 25, 2020 at 12:11 AM Robin Edwards <r...@bidnamic.com> wrote:
> This is brilliant work, thank you! Looking forward to watching my RDS
> metrics when this gets deployed :-)
>
> On Tue, 25 Feb 2020, 07:08 Driesprong, Fokko, <fo...@driesprong.frl> wrote:
>
> > Sweet work Kamil and others! I'll try to go through them today!
> >
> > Cheers, Fokko
> >
> > On Mon, 24 Feb 2020 at 22:37, Tao Feng <fengta...@gmail.com> wrote:
> >
> > > Great work Kamil! Let us know once it is landed in one of the future
> > > releases. Would love to try it out :)
> > >
> > > Best,
> > > -Tao
> > >
> > > On Mon, Feb 24, 2020 at 12:54 PM Qingping Hou <q...@scribd.com> wrote:
> > >
> > > > Awesome work Kamil! Great to see us embracing query batching in the
> > > > code base. I can't wait to deploy those optimizations into our
> > > > production environment.
> > > >
> > > > Thanks,
> > > > QP Hou
> > > >
> > > > On Mon, Feb 24, 2020 at 8:35 AM Kamil Breguła
> > > > <kamil.breg...@polidea.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > Polidea [1] together with Databand [2] has taken steps to optimize
> > > > > scheduler performance.
> > > > > I made many changes last weekend:
> > > > > 1. [AIRFLOW-6856] Bulk fetch paused_dag_ids
> > > > >    https://github.com/apache/airflow/pull/7476
> > > > > 2. [AIRFLOW-6857] Bulk sync DAGs
> > > > >    https://github.com/apache/airflow/pull/7477
> > > > > 3. [AIRFLOW-6862] Do not check the freshness of fresh DAG
> > > > >    https://github.com/apache/airflow/pull/7481
> > > > > 4. [AIRFLOW-6869] Bulk fetch DAGRuns for _process_task_instances
> > > > >    https://github.com/apache/airflow/pull/7489
> > > > > 5. [AIRFLOW-6881] Bulk fetch DAGRun for create_dag_run
> > > > >    https://github.com/apache/airflow/pull/7502
> > > > > 6. [AIRFLOW-6887] Do not check the state of fresh DAGRun
> > > > >    https://github.com/apache/airflow/pull/7510
> > > > > These changes have not been merged yet, so that a wider audience
> > > > > can review them. Any feedback is very helpful. The result of the
> > > > > performance benchmark is available in the description of each
> > > > > change.
> > > > >
> > > > > When it comes to the overall changes, it looks as follows.
> > > > >
> > > > > Before:
> > > > > Average time: 8080.246 ms
> > > > > Queries count: 2692
> > > > > After:
> > > > > Average time: 628.801 ms
> > > > > Queries count: 5
> > > > > Diff:
> > > > > Average time: -7452 ms (-92%)
> > > > > Queries count: -2687 (-99%)
> > > > >
> > > > > My changes focused only on DagFileProcessor, but this generates the
> > > > > most database queries and takes a significant amount of the
> > > > > scheduler's time.
> > > > >
> > > > > Tomek Urbaszek's change has also been merged in the past to improve
> > > > > performance.
> > > > > 7. [AIRFLOW-6590] Use batch db operations in jobs
> > > > >    https://github.com/apache/airflow/pull/7370
> > > > >
> > > > > This is not the last performance improvement. We are still working
> > > > > on this, and other changes will appear in the future.
> > > > >
> > > > > Many thanks to friends from Databand [https://databand.ai/] for
> > > > > support.
> > > > >
> > > > > Best regards,
> > > > > Kamil Breguła
> > > > >
> > > > > [1] https://www.polidea.com/services/
> > > > > [2] https://databand.ai/about/
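
For readers new to the "bulk fetch" idea named in the change titles above, here is a rough sketch of the general pattern: replacing one query per DAG with a single query for all DAGs. It is not the code from the linked pull requests. The `dag` table, its `dag_id` and `is_paused` columns, and every function name below are assumptions made up for this illustration, and it uses the standard-library `sqlite3` module rather than Airflow's own database layer.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts the SELECT statements it sends to SQLite."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def select(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def paused_one_by_one(db, dag_ids):
    """Naive approach: one round trip per DAG id."""
    paused = set()
    for dag_id in dag_ids:
        rows = db.select("SELECT is_paused FROM dag WHERE dag_id = ?", (dag_id,))
        if rows and rows[0][0]:
            paused.add(dag_id)
    return paused


def paused_bulk(db, dag_ids):
    """Batched approach: a single query with an IN (...) clause."""
    dag_ids = list(dag_ids)
    if not dag_ids:
        return set()
    # One "?" placeholder per id keeps the query parameterised.
    placeholders = ", ".join("?" for _ in dag_ids)
    rows = db.select(
        f"SELECT dag_id FROM dag WHERE is_paused = 1 AND dag_id IN ({placeholders})",
        dag_ids,
    )
    return {row[0] for row in rows}


def demo(n_dags=50):
    """Build a throwaway in-memory table and run both approaches against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
    # Hypothetical data: every odd-numbered DAG is paused.
    conn.executemany(
        "INSERT INTO dag VALUES (?, ?)",
        [(f"dag_{i}", i % 2) for i in range(n_dags)],
    )
    dag_ids = [f"dag_{i}" for i in range(n_dags)]

    slow_db = CountingConnection(conn)
    fast_db = CountingConnection(conn)
    slow = paused_one_by_one(slow_db, dag_ids)
    fast = paused_bulk(fast_db, dag_ids)
    return slow, fast, slow_db.queries, fast_db.queries


if __name__ == "__main__":
    slow, fast, slow_queries, fast_queries = demo()
    print(f"same result: {slow == fast}")
    print(f"queries one by one: {slow_queries}, queries in bulk: {fast_queries}")
```

Both functions return the same set of paused ids. Only the number of round trips to the database differs, and that count is the kind of number the "Queries count" lines in the benchmark above report. Very long id lists may need to be split into chunks, since databases limit the number of bound parameters per statement.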