[ Description ] There was no index composed of dag_id and execution_date. So, when scheduler find all tis of dagrun like this "select * from task_instance where dag_id = 'some_id' and execution_date = '2018-09-01 ...'", this query will be using ti_dag_state index (I was testing it in mysql workbench. I was expecting 'ti_state_lkp' but, it was not that case). Perhaps there's no problem when range of execution_date is small (under 1000 dagrun), but I had experienced slow allocation of tis when the dag had 1000+ accumulative dagrun. So, now I was using airflow with adding new index ti_dag_date (dag_id, execution_date) on task_instance table. I have attached result of my test :)
[ Test ] models.py > DAG.run jobs.py > BaseJob.run jobs.py > BackfillJob._execute jobs.py > BackfillJob._execute_for_run_dates jobs.py > BackfillJob._task_instances_for_dag_run models.py > DagRun.get_task_instances tis = session.query(TI).filter( TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, )   ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` [ Full content available at: https://github.com/apache/incubator-airflow/pull/3874 ] This message was relayed via gitbox.apache.org for [email protected]
