amichai07 opened a new pull request #4751: collected trigger rule dep check per 
dag run
URL: https://github.com/apache/airflow/pull/4751
 
 
   ### Jira
   * My PR addresses the following Jira issue, 
https://issues.apache.org/jira/browse/AIRFLOW-3607, and references it in the 
PR title
   * Decreasing scheduler delay between tasks
   
   ### Description
   * The delay between tasks can be a major issue, especially when we have DAGs 
with many subdags.
     We found that the scheduling process spends a lot of time in 
dependency checking. The trigger rule
     dependency queries the DB once for each task instance; this change 
makes it query the DB just once for
     each dag_run.
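
   The idea can be sketched as follows. This is a minimal illustration, not the code in this PR: the SQLite table is a simplified stand-in for the task instance table, and every function name here (`make_db`, `upstream_states_per_ti`, `load_states_for_dag_run`, `upstream_states_from_cache`) is hypothetical.

```python
import sqlite3
from collections import Counter


def make_db():
    """Build a tiny in-memory table standing in for task instances (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance "
        "(dag_id TEXT, execution_date TEXT, task_id TEXT, state TEXT)"
    )
    rows = [
        ("d", "2019-01-01", f"t{i}", "success" if i < 3 else "running")
        for i in range(5)
    ]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)
    return conn


def upstream_states_per_ti(conn, dag_id, execution_date, upstream_ids):
    """Per-task-instance style: one DB query for every task being checked."""
    placeholders = ",".join("?" * len(upstream_ids))
    cur = conn.execute(
        "SELECT state, COUNT(*) FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ? "
        f"AND task_id IN ({placeholders}) GROUP BY state",
        (dag_id, execution_date, *upstream_ids),
    )
    return Counter(dict(cur.fetchall()))


def load_states_for_dag_run(conn, dag_id, execution_date):
    """Per-dag-run style: fetch all task states with a single DB query."""
    cur = conn.execute(
        "SELECT task_id, state FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    )
    return dict(cur.fetchall())


def upstream_states_from_cache(states, upstream_ids):
    """Count upstream states in memory, with no further DB access."""
    return Counter(states[t] for t in upstream_ids if t in states)


conn = make_db()
upstream = ["t1", "t2", "t3"]
per_ti = upstream_states_per_ti(conn, "d", "2019-01-01", upstream)
cached = load_states_for_dag_run(conn, "d", "2019-01-01")
per_run = upstream_states_from_cache(cached, upstream)
print(per_ti == per_run)
```

   Both paths give the same state counts. The second issues one query per dag run and reuses the result for every task in that run, which is where the saving comes from.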
   
   ### Tests
   * My PR does not need extra testing for this extremely good reason:
     My PR uses the code from the  and also has a fallback to the original 
behaviour; the CI already covers all of the logic and the cases that might occur
   
   ### Commits
   * Removed unnecessary queries: the query now runs once per dag run instead of once per task instance (ti)
   
   ### Documentation
   no need for new docs
   
   ### Code Quality
   * Passes `flake8`
   * Tested in production environment
   
   ### Results
   The tests were run on a DAG with many tasks (35 tasks).
   The tasks themselves do not issue any DB queries.
   
   **On local environment**
   before changes:
   - avg delay between tasks: 4.22 sec
   - number of queries during 10 minutes: 118,879 
   
   after collecting dep check queries:
   - avg delay between tasks: 3.86 sec
   - number of queries during 10 minutes: 104,397
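
   For reference, the relative improvement implied by the local numbers above can be computed as below. This is a small sketch; `relative_reduction` is a hypothetical helper, and the inputs are the figures reported in this section.

```python
def relative_reduction(before, after):
    """Return the percentage decrease from `before` to `after`."""
    return (before - after) / before * 100.0


# Figures taken from the local-environment results above.
delay_drop = relative_reduction(4.22, 3.86)
query_drop = relative_reduction(118_879, 104_397)

print(f"avg delay reduced by {delay_drop:.1f}%")
print(f"query count reduced by {query_drop:.1f}%")
```

   That is roughly an 8.5% shorter average delay and about 12.2% fewer queries over the 10-minute window.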
   
   Stress test - running the DAG every 10 sec for an hour:
   before changes:

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
