ashb opened a new pull request #8814:
URL: https://github.com/apache/airflow/pull/8814


   While debugging another test I noticed that the scheduler was spending a
   long time waiting for a "simple" dag to be parsed. On closer inspection,
   the parsing process itself finished in a few milliseconds; we just
   weren't harvesting the results in a timely fashion.
   
   This change uses the `sentinel` attribute of multiprocessing.Process
   (added in Python 3.3) to wait on all the processes at once, so that
   as soon as one has finished we get woken up and can immediately harvest
   and pass on the parsed dags.
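
As a rough illustration of the waiting pattern described above (not the code in this PR), the sketch below blocks in `multiprocessing.connection.wait` on each `Process.sentinel` and harvests every process the moment it exits. The names `fake_parse`, `harvest_as_finished` and `run_demo`, and the file names and delays, are hypothetical stand-ins for the real dag file processors.

```python
import multiprocessing
import time
from multiprocessing.connection import wait


def fake_parse(delay):
    """Hypothetical stand-in for parsing one dag file: it only sleeps."""
    time.sleep(delay)


def harvest_as_finished(processes):
    """Return the names of already-started processes in the order they exit."""
    # Process.sentinel becomes "ready" once the process has exited.
    pending = {proc.sentinel: proc for proc in processes}
    finished = []
    while pending:
        # Blocks until at least one sentinel is ready, instead of polling.
        for sentinel in wait(list(pending)):
            proc = pending.pop(sentinel)
            proc.join()  # the process has already exited, so this is quick
            finished.append(proc.name)
    return finished


def run_demo(ctx=None):
    """Start one slow and one fast fake parser and harvest them."""
    if ctx is None:
        ctx = multiprocessing.get_context()
    delays = {"slow_dag.py": 0.5, "fast_dag.py": 0.05}  # made-up values
    procs = [
        ctx.Process(target=fake_parse, args=(delay,), name=name)
        for name, delay in delays.items()
    ]
    for proc in procs:
        proc.start()
    return harvest_as_finished(procs)


if __name__ == "__main__":
    print(run_demo())
```

With this approach the fast process should be harvested well before the slow one finishes, rather than both waiting for the next polling interval.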
   
   This makes test_scheduler_job.py about twice as quick, and also reduces
   the time the scheduler spends between tasks.
   
   In real workloads, or where there are lots of dags, this likely won't
   equate to such a huge speed-up, but it does for our (synthetic) elastic
   performance test dag.
   
   These were the timings for the dag to run all the tasks in a single dag
   run to completion, with PERF_SCHEDULE_INTERVAL='1d' PERF_DAGS_COUNT=1
   
   I also have
   
   PERF_SHAPE=linear PERF_TASKS_COUNT=12:
   
   **Before**: 45.4166s
   
   **After**: 16.9499s
   
   PERF_SHAPE=linear PERF_TASKS_COUNT=24:
   
   **Before**: 82.6426s
   
   **After**: 34.0672s
   
   PERF_SHAPE=binary_tree PERF_TASKS_COUNT=24:
   
   **Before**: 20.3802s
   
   **After**: 9.1400s
   
   PERF_SHAPE=grid PERF_TASKS_COUNT=24:
   
   **Before**: 27.4735s
   
   **After**: 11.5607s
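
To make the relative improvement easier to compare across shapes, here is a small sketch that derives the speed-up factor from the timings listed above. The dictionary labels and the `speedup` helper are made up for this illustration; only the numbers come from the measurements.

```python
# (before, after) wall-clock seconds, copied from the timings above.
timings = {
    "linear, 12 tasks": (45.4166, 16.9499),
    "linear, 24 tasks": (82.6426, 34.0672),
    "binary_tree, 24 tasks": (20.3802, 9.1400),
    "grid, 24 tasks": (27.4735, 11.5607),
}


def speedup(before, after):
    """How many times faster the run completed after the change."""
    return before / after


speedups = {label: speedup(b, a) for label, (b, a) in timings.items()}

if __name__ == "__main__":
    for label, factor in speedups.items():
        print(f"{label}: {factor:.2f}x faster")
```

Every shape comes out somewhere between roughly 2.2x and 2.7x faster, consistent with the "about twice as quick" figure for the test suite.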
   
   If you have many more dag **files**, this likely won't be your bottleneck.
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Target Github ISSUE in description if exists
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

