[ Description ]
There was no index composed of dag_id and execution_date. So, when scheduler 
find all tis of dagrun like this "select * from task_instance where dag_id = 
'some_id' and execution_date = '2018-09-01 ...'", this query will be using 
ti_dag_state index (I was testing it in mysql workbench. I was expecting 
'ti_state_lkp' but, it was not that case). Perhaps there's no problem when 
range of execution_date is small (under 1000 dagrun), but I had experienced 
slow allocation of tis when the dag had 1000+ accumulative dagrun. So, now I 
was using airflow with adding new index ti_dag_date (dag_id, execution_date) on 
task_instance table. I have attached result of my test :)

[ Test ]
models.py > DAG.run
jobs.py > BaseJob.run
jobs.py > BackfillJob._execute
jobs.py > BackfillJob._execute_for_run_dates
jobs.py > BackfillJob._task_instances_for_dag_run
models.py > DagRun.get_task_instances
tis = session.query(TI).filter(
TI.dag_id == self.dag_id,
TI.execution_date == self.execution_date,
)
![image](https://user-images.githubusercontent.com/6738941/45285016-fb9ecc00-b51c-11e8-945c-c28d81aece02.png)
![image](https://user-images.githubusercontent.com/6738941/45285019-fe012600-b51c-11e8-91fa-a66c2293ca5d.png)


### Jira

- [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-XXX
  - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description

- [ ] Here are some details about my PR, including screenshots of any UI 
changes:

### Tests

- [ ] My PR adds the following unit tests __OR__ does not need testing for this 
extremely good reason:

### Commits

- [ ] My commits all reference Jira issues in their subject lines, and I have 
squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [ ] In case of new functionality, my PR adds documentation that describes how 
to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.

### Code Quality

- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`


[ Full content available at: 
https://github.com/apache/incubator-airflow/pull/3874 ]
This message was relayed via gitbox.apache.org for [email protected]

Reply via email to