amichai07 opened a new pull request #4751: collected trigger rule dep check per dag run
URL: https://github.com/apache/airflow/pull/4751

### Jira

* My PR addresses the following https://issues.apache.org/jira/browse/AIRFLOW-3607 and references them in the PR title
* Decreasing scheduler delay between tasks

### Description

* The delay between tasks can be a major issue, especially for dags with many subdags. We found that the scheduling process spends a large share of its time in dependency checking. The trigger rule dependency check calls the db once for each task instance; we changed it to call the db just once for each dag_run.

### Tests

* My PR does not need extra testing for this extremely good reason: My PR uses the code from the and also has a fallback to the original behaviour. The CI already covers all of the logic and the cases that might happen.

### Commits

* removed unnecessary queries - run on each dag run instead of each ti

### Documentation

No need for new docs.

### Code Quality

* Passes `flake8`
* Tested in production environment

### Results

The tests were run on a dag with many tasks (35 tasks). The tasks do not make any db queries.

**On local environment**

before changes:
- avg delay between tasks: 4.22 sec
- number of queries during 10 minutes: 118,879

after collecting dep check queries:
- avg delay between tasks: 3.86 sec
- number of queries during 10 minutes: 104,397

Stress test - running the dag every 10 sec for an hour:

before changes:
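
To illustrate the idea in the Description (one query per dag run instead of one per task instance), here is a minimal sketch. It is not the code from this PR and does not use Airflow's models: the `ti` table, the `CountingDb` wrapper, and the functions `counts_per_task` and `counts_per_run` are hypothetical names invented for this example, and an in-memory SQLite database stands in for the Airflow metadata db.

```python
import sqlite3
from collections import Counter

# Hypothetical toy schema; this is NOT Airflow's real task_instance table.
SCHEMA = "CREATE TABLE ti (run_id TEXT, task_id TEXT, state TEXT)"


class CountingDb:
    """Wraps an in-memory sqlite3 connection and counts SELECTs issued."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(SCHEMA)
        self.queries = 0

    def select(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def counts_per_task(db, run_id, upstream):
    """Per-task approach: one query for every task that has upstream tasks."""
    result = {}
    for task_id, ups in upstream.items():
        if not ups:
            result[task_id] = Counter()
            continue
        marks = ",".join("?" * len(ups))
        rows = db.select(
            "SELECT state, COUNT(*) FROM ti "
            f"WHERE run_id = ? AND task_id IN ({marks}) GROUP BY state",
            (run_id, *ups),
        )
        result[task_id] = Counter(dict(rows))
    return result


def counts_per_run(db, run_id, upstream):
    """Per-run approach: load all states once, then count in memory."""
    states = dict(
        db.select("SELECT task_id, state FROM ti WHERE run_id = ?", (run_id,))
    )
    return {
        task_id: Counter(states[u] for u in ups if u in states)
        for task_id, ups in upstream.items()
    }


if __name__ == "__main__":
    # Hypothetical four-task dag: a -> b, a -> c, (b, c) -> d
    upstream = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
    db = CountingDb()
    db.conn.executemany(
        "INSERT INTO ti VALUES (?, ?, ?)",
        [("run1", "a", "success"), ("run1", "b", "success"),
         ("run1", "c", "failed"), ("run1", "d", "scheduled")],
    )
    per_task = counts_per_task(db, "run1", upstream)
    print("per-task queries:", db.queries)
    db.queries = 0
    per_run = counts_per_run(db, "run1", upstream)
    print("per-run queries:", db.queries)
    print("same counts:", per_task == per_run)
```

In this toy setup both functions produce the same upstream state counts, but the per-run version issues a single query regardless of how many tasks the dag has. The measurements in the Results section come from the real scheduler, not from this sketch.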
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services