All,

This is a heads-up and a request for a sincere review of PR https://github.com/apache/incubator-airflow/pull/1506.
In PR-1506 I implement one of the fundamental cornerstones of the scheduler roadmap (https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). It implements the create_dagrun functionality, which includes creating the TaskInstances when the DagRun is instantiated. Because the TaskInstances are created at DagRun instantiation time, the deadlocks that were tested for will no longer take place. For now, the visible consequence of these TaskInstances already existing is that they show up black in the tree view.

The tests in core.py were adjusted because they were supposedly creating a DagRun with tasks, while they were actually creating DagRuns and orphaned TaskInstances (i.e. the dag_id did not match the dag_id of the DagRun). I discussed this with Arthur, who said these were remnants from the past and should not matter anymore.

There might be a small issue here: BaseOperator.add_task contained a small bug when the task was added from DAG.add_task. The dag was never connected to the TaskInstance, so the TaskInstance was created orphaned. This has been fixed, and I don't think that newly created DagRuns will expose an issue with existing orphaned tasks, but please have a look at it.

I would like to stress that this change is fundamental to the thinking over the last couple of months on how to improve the integrity and robustness of the scheduler. The next steps I foresee are:

1. Add notion of previous to DagRuns
2. Align start date automatically
3. Make backfills create DagRuns
4. Consider backfills in the scheduler
5. Add dag_run_id to TaskInstances
6. -> Jeremiah's refactoring

Steps 1-5 have already been implemented in https://github.com/apache/incubator-airflow/compare/master...bolkedebruin:AIRFLOW_SCHEDULER. The work I am doing now is splitting it up into digestible chunks.

Thanks,
Bolke
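
P.S. For anyone who wants a picture of the idea before opening the PR, here is a minimal standalone sketch of "create the TaskInstances when the DagRun is instantiated". It is not the code from PR-1506 and does not use Airflow's models: the dataclasses, the create_dagrun signature, and the example ids are all hypothetical and only illustrate that every TaskInstance exists from the start, carries the DagRun's dag_id, and has no state yet.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class TaskInstance:
    # Hypothetical stand-in, not Airflow's TaskInstance model.
    task_id: str
    dag_id: str
    execution_date: datetime
    state: Optional[str] = None  # None means "nothing has happened yet"


@dataclass
class DagRun:
    # Hypothetical stand-in, not Airflow's DagRun model.
    dag_id: str
    execution_date: datetime
    task_instances: Dict[str, TaskInstance] = field(default_factory=dict)


def create_dagrun(dag_id: str, task_ids: List[str], execution_date: datetime) -> DagRun:
    """Create a run and, in the same step, one TaskInstance per task.

    Assumed signature for illustration only.
    """
    run = DagRun(dag_id=dag_id, execution_date=execution_date)
    for task_id in task_ids:
        # Each TaskInstance gets the run's dag_id, so none is orphaned.
        run.task_instances[task_id] = TaskInstance(task_id, dag_id, execution_date)
    return run


run = create_dagrun("example_dag", ["extract", "load"], datetime(2016, 5, 1))
```

In this sketch the stateless TaskInstances correspond to the ones that would currently show up black in the tree view.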