All,

This is a heads-up and a request for a careful review of PR 
https://github.com/apache/incubator-airflow/pull/1506. 

In PR-1506 I implement one of the fundamental cornerstones of the scheduler 
roadmap (https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). It 
implements the create_dagrun functionality, which includes creating the 
taskinstances at instantiation time of the dagrun. Because the taskinstances 
are created at dagrun instantiation time, the deadlocks that were previously 
tested for can no longer occur. 
For now, the visual consequence of having these taskinstances already there is 
that they will be black in the tree view.
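
To make the idea concrete, here is a rough sketch of what "creating the 
taskinstances at instantiation time of the dagrun" means. This is not the code 
from the PR: the Dag, DagRun and TaskInstance classes below are simplified 
stand-ins I made up for illustration, and only the create_dagrun name comes 
from the PR itself.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TaskInstance:
    # Simplified stand-in, not Airflow's real TaskInstance model.
    dag_id: str
    task_id: str
    execution_date: datetime
    state: Optional[str] = None  # None means "exists, but not yet scheduled"


@dataclass
class DagRun:
    # Simplified stand-in, not Airflow's real DagRun model.
    dag_id: str
    execution_date: datetime
    task_instances: List[TaskInstance] = field(default_factory=list)


@dataclass
class Dag:
    # Hypothetical minimal DAG: just an id and a list of task ids.
    dag_id: str
    task_ids: List[str]

    def create_dagrun(self, execution_date: datetime) -> DagRun:
        """Create the run and, in the same step, one TaskInstance per task."""
        run = DagRun(dag_id=self.dag_id, execution_date=execution_date)
        run.task_instances = [
            TaskInstance(self.dag_id, task_id, execution_date)
            for task_id in self.task_ids
        ]
        return run


dag = Dag("example", ["extract", "load"])
run = dag.create_dagrun(datetime(2016, 5, 1))
```

In this sketch every task already has a TaskInstance with no state as soon as 
the run exists, which is why such tasks show up (black) in the tree view 
before anything has been scheduled.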

Tests in core.py were adjusted because they were supposedly creating a dagrun 
with tasks, while they were actually creating dagruns and orphaned 
TaskInstances (i.e. the dag_id did not match the dag_id of the dagrun). This 
was discussed with Arthur, who said these were remnants from the past and 
should not matter anymore. There might be a small issue here, because 
BaseOperator.add_task contained a small bug when the task was added from 
DAG.add_task: the dag was never connected to the TaskInstance, so the 
TaskInstance was created orphaned. This has been fixed, and I don’t think that 
newly created DagRuns will expose an issue with existing orphaned tasks, but 
please have a look at it.
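
For anyone reviewing this part, "orphaned" here simply means that the dag_id on 
the TaskInstance does not match the dag_id of the dagrun it was meant to belong 
to. A minimal sketch of that check, using a hypothetical TI tuple and a 
hypothetical find_orphans helper (neither exists in Airflow under these names):

```python
from collections import namedtuple

# Hypothetical, stripped-down record; the real TaskInstance has many more fields.
TI = namedtuple("TI", ["dag_id", "task_id"])


def find_orphans(dagrun_dag_id, task_instances):
    """Return the task instances whose dag_id differs from the dagrun's dag_id."""
    return [ti for ti in task_instances if ti.dag_id != dagrun_dag_id]


task_instances = [TI("my_dag", "t1"), TI("other_dag", "t2")]
orphans = find_orphans("my_dag", task_instances)
```

In this example only the second entry is reported, because its dag_id points at 
a different dag than the dagrun.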

I would like to stress that this change is fundamental to the thinking over the 
last couple of months on how to improve the integrity and robustness of the 
scheduler. The next steps I foresee now are:

1. Add notion of previous to DagRuns 
2. Align start date automatically
3. Make backfills create dagruns
4. Consider backfills in the scheduler
5. Add dag_run_id to taskinstances
6. -> Jeremiah's refactoring

1-5 have already been implemented in 
https://github.com/apache/incubator-airflow/compare/master...bolkedebruin:AIRFLOW_SCHEDULER.
The work I am doing now is splitting it up into digestible chunks.

Thanks
Bolke
