Nice!

On Tue, Feb 25, 2020 at 12:11 AM Robin Edwards <r...@bidnamic.com> wrote:

> This is brilliant work, thank you! Looking forward to watching my RDS
> metrics when this gets deployed :-)
>
> On Tue, 25 Feb 2020, 07:08 Driesprong, Fokko, <fo...@driesprong.frl>
> wrote:
>
> > Sweet work Kamil and others! I'll try to go through them today!
> >
> > Cheers, Fokko
> >
> > On Mon, Feb 24, 2020 at 22:37, Tao Feng <fengta...@gmail.com> wrote:
> >
> > > Great work Kamil! Let us know once it is landed in one of the future
> > > releases. Would love to try it out :)
> > >
> > > Best,
> > > -Tao
> > >
> > > On Mon, Feb 24, 2020 at 12:54 PM Qingping Hou <q...@scribd.com> wrote:
> > >
> > > > Awesome work Kamil! Great to see us embracing query batching in the
> > > > code base. I can't wait to deploy those optimizations into our
> > > > production environment.
> > > >
> > > > Thanks,
> > > > QP Hou
> > > >
> > > > On Mon, Feb 24, 2020 at 8:35 AM Kamil Breguła <kamil.breg...@polidea.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > Polidea [1] together with Databand [2] has taken steps to optimize
> > > > > scheduler performance.
> > > > > I made many changes last weekend:
> > > > > 1. [AIRFLOW-6856] Bulk fetch paused_dag_ids
> > > > > https://github.com/apache/airflow/pull/7476
> > > > > 2. [AIRFLOW-6857] Bulk sync DAGs
> > > > > https://github.com/apache/airflow/pull/7477
> > > > > 3. [AIRFLOW-6862] Do not check the freshness of fresh DAG
> > > > > https://github.com/apache/airflow/pull/7481
> > > > > 4. [AIRFLOW-6869] Bulk fetch DAGRuns for _process_task_instances
> > > > > https://github.com/apache/airflow/pull/7489
> > > > > 5. [AIRFLOW-6881] Bulk fetch DAGRun for create_dag_run
> > > > > https://github.com/apache/airflow/pull/7502
> > > > > 6. [AIRFLOW-6887] Do not check the state of fresh DAGRun
> > > > > https://github.com/apache/airflow/pull/7510
> > > > > These changes have not been merged yet, to allow review by a wider
> > > > > audience. Any feedback is very helpful. The performance benchmark
> > > > > results are available in the description of each change.
> > > > >
> > > > > Overall, the combined changes give the following results:
> > > > >
> > > > > Before:
> > > > > Average time: 8080.246 ms
> > > > > Queries count: 2692
> > > > > After:
> > > > > Average time: 628.801 ms
> > > > > Queries count: 5
> > > > > Diff:
> > > > > Average time: -7452 ms (-92%)
> > > > > Queries count: -2687 (-99%)
> > > > >
> > > > > My changes focused only on DagFileProcessor, but it generates the
> > > > > most database queries and takes up a significant share of the
> > > > > scheduler's time.
> > > > >
> > > > > A change by Tomek Urbaszek was also merged earlier to improve
> > > > > performance:
> > > > > 7. [AIRFLOW-6590] Use batch db operations in jobs
> > > > > https://github.com/apache/airflow/pull/7370
> > > > >
> > > > > This is not the last performance improvement. We are still working
> > > > > on it, and more changes will follow.
> > > > >
> > > > > Many thanks to our friends from Databand [https://databand.ai/] for
> > > > > their support.
> > > > >
> > > > > Best regards,
> > > > > Kamil Breguła
> > > > >
> > > > > [1] https://www.polidea.com/services/
> > > > > [2] https://databand.ai/about/
> > > >
> > >
> >
>
