This is a really great improvement! Great job, everybody. We are really
excited about this contribution!
These changes make it easier for Airflow to support much more complex,
large-scale use cases in the future. Looking forward to more improvements
like this one!
*Huge thanks to our friends from Polidea!*

Evgeny Shulman
databand.ai | CTO

On Mon, Feb 24, 2020 at 6:44 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Those are all great improvements, Kamil! It would be great to have them
> reviewed, tested, and merged for 2.0!
>
> J.
>
>
> On Mon, Feb 24, 2020 at 5:35 PM Kamil Breguła <kamil.breg...@polidea.com>
> wrote:
>
> > Hello,
> >
> > Polidea [1], together with Databand [2], has taken steps to optimize
> > scheduler performance.
> > I made many changes last weekend:
> > 1. [AIRFLOW-6856] Bulk fetch paused_dag_ids
> > https://github.com/apache/airflow/pull/7476
> > 2. [AIRFLOW-6857] Bulk sync DAGs
> > https://github.com/apache/airflow/pull/7477
> > 3. [AIRFLOW-6862] Do not check the freshness of fresh DAG
> > https://github.com/apache/airflow/pull/7481
> > 4. [AIRFLOW-6869] Bulk fetch DAGRuns for _process_task_instances
> > https://github.com/apache/airflow/pull/7489
> > 5. [AIRFLOW-6881] Bulk fetch DAGRun for create_dag_run
> > https://github.com/apache/airflow/pull/7502
> > 6. [AIRFLOW-6887] Do not check the state of fresh DAGRun
> > https://github.com/apache/airflow/pull/7510
> > These changes have not been merged yet, so that a wider audience can
> > review them first. Any feedback is very helpful. The performance
> > benchmark results are available in the description of each change.
> >
> > Taken together, the changes give the following results.
> >
> > Before:
> > Average time: 8080.246 ms
> > Queries count: 2692
> > After:
> > Average time: 628.801 ms
> > Queries count: 5
> > Diff:
> > Average time: -7452 ms (-92%)
> > Queries count: -2687 (-99%)
> >
> > My changes focus only on DagFileProcessor, but this component generates
> > the most database queries and takes a significant share of the
> > scheduler's time.
> >
> > An earlier change by Tomek Urbaszek that also improves performance has
> > already been merged:
> > 7. [AIRFLOW-6590] Use batch db operations in jobs
> > https://github.com/apache/airflow/pull/7370
> >
> > This is not the last performance improvement. We are still working on
> > this, and more changes will follow.
> >
> > Many thanks to our friends from Databand [https://databand.ai/] for
> > their support.
> >
> > Best regards,
> > Kamil Breguła
> >
> > [1] https://www.polidea.com/services/
> > [2] https://databand.ai/about/
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
>
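
For readers unfamiliar with the pattern behind the "Bulk fetch" changes
listed in the quoted message, here is a minimal sketch of replacing one query
per DAG with a single query for all DAGs. It is only an illustration and is
not taken from the linked pull requests: the table layout, the column names,
and both helper functions are hypothetical, and it uses Python's built-in
sqlite3 module rather than Airflow's own models.

```python
import sqlite3

# Hypothetical schema for illustration only; this is not Airflow's metadata DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
conn.executemany(
    "INSERT INTO dag VALUES (?, ?)",
    [("etl_daily", 0), ("etl_hourly", 1), ("reports", 1), ("cleanup", 0)],
)


def paused_ids_one_by_one(conn, dag_ids):
    """Ask the database about each DAG separately: N queries for N DAGs."""
    paused = set()
    queries = 0
    for dag_id in dag_ids:
        row = conn.execute(
            "SELECT is_paused FROM dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        queries += 1
        if row is not None and row[0]:
            paused.add(dag_id)
    return paused, queries


def paused_ids_bulk(conn, dag_ids):
    """Fetch the paused DAG ids for all given DAGs with a single IN query."""
    dag_ids = list(dag_ids)
    if not dag_ids:
        return set(), 0
    # One "?" placeholder per id keeps the query parameterized.
    placeholders = ", ".join("?" for _ in dag_ids)
    rows = conn.execute(
        "SELECT dag_id FROM dag "
        f"WHERE is_paused = 1 AND dag_id IN ({placeholders})",
        dag_ids,
    ).fetchall()
    return {row[0] for row in rows}, 1


dag_ids = ["etl_daily", "etl_hourly", "reports", "cleanup"]
slow_result, slow_queries = paused_ids_one_by_one(conn, dag_ids)
fast_result, fast_queries = paused_ids_bulk(conn, dag_ids)
print(slow_result == fast_result, slow_queries, fast_queries)
```

Both helpers return the same set of paused ids, but the first issues one
query per DAG while the second issues one query in total. For very long id
lists, a real implementation would likely need to split the IN clause into
chunks, since databases limit the number of bound parameters.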
