A very common source of confusion for our users: they specify start_date
in default_args but not in their DAG arguments, and later change that
start_date to move the execution of their DAG forward (e.g. from 2015 to
2016). This doesn't work because the logic used to calculate the "initial"
start date of a DAG differs from the logic used to calculate subsequent
dagrun start dates: once a dagrun exists, the start dates are no longer
consulted.

Current Airflow Logic:
DS to schedule initial dagrun: dag.start_date if it exists, else
    min(start_date of the DAG's tasks)
DS to schedule subsequent dagruns: last_dagrun + schedule_interval
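
To make the mismatch concrete, here is a minimal sketch of the two rules in plain Python. It is a hypothetical model, not Airflow's scheduler code; the function name next_dagrun_date and its parameters are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def next_dagrun_date(
    dag_start_date: Optional[datetime],
    task_start_dates: Sequence[datetime],
    last_dagrun: Optional[datetime],
    schedule_interval: timedelta,
) -> datetime:
    """Hypothetical model of the two rules above; not Airflow's actual code."""
    if last_dagrun is None:
        # Initial dagrun: the DAG-level start date wins if it is set,
        # otherwise fall back to the earliest task start date.
        if dag_start_date is not None:
            return dag_start_date
        return min(task_start_dates)
    # Subsequent dagruns: start dates are not consulted at all.
    return last_dagrun + schedule_interval


daily = timedelta(days=1)
last_run = datetime(2015, 6, 1)

# start_date is set only through default_args, i.e. only on the tasks.
before_edit = next_dagrun_date(None, [datetime(2015, 1, 1)], last_run, daily)
after_edit = next_dagrun_date(None, [datetime(2016, 1, 1)], last_run, daily)

# Moving the task start date to 2016 does not change the next run.
print(before_edit == after_edit)
```

Because a previous dagrun already exists, the edited start date never enters the calculation, which is the behavior users trip over.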

There are a couple of ways of addressing this:
1. Change the definition of start date for subsequent dagruns to match the
"initial" dagrun start date (calculated from the minimum of task start
dates)
2. Force explicit DAG start dates
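
As a rough illustration, one possible reading of option 1 is sketched below in plain Python. The exact rule is an assumption on my part (never schedule a run earlier than the current effective start date), and the function names are hypothetical, not Airflow APIs.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def effective_start_date(
    dag_start_date: Optional[datetime],
    task_start_dates: Sequence[datetime],
) -> datetime:
    # Same rule as the "initial" dagrun: DAG start date if set,
    # else the minimum of the task start dates.
    if dag_start_date is not None:
        return dag_start_date
    return min(task_start_dates)


def next_dagrun_date_option_1(
    dag_start_date: Optional[datetime],
    task_start_dates: Sequence[datetime],
    last_dagrun: Optional[datetime],
    schedule_interval: timedelta,
) -> datetime:
    """Hypothetical sketch of option 1; the max() rule is an assumption."""
    start = effective_start_date(dag_start_date, task_start_dates)
    if last_dagrun is None:
        return start
    # Assumption: a subsequent run is never scheduled before the
    # current effective start date, so moving it forward takes effect.
    return max(last_dagrun + schedule_interval, start)
```

Under this reading, a DAG whose last dagrun was in mid-2015 and whose task start date is edited to 2016-01-01 would next be scheduled for 2016-01-01 rather than the day after the last run.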

I personally like 1.

I also propose that we raise an error for any DAG in which a task depends
on another task whose start date is later than its own (otherwise there
could be deadlocks, with the downstream task waiting on an upstream task
that has not started yet).
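
The check could look roughly like the sketch below. It works on plain dictionaries rather than real DAG and task objects; check_start_dates and its arguments are hypothetical names used only for illustration.

```python
from datetime import datetime
from typing import Dict, List


def check_start_dates(
    start_dates: Dict[str, datetime],
    upstream: Dict[str, List[str]],
) -> None:
    """Raise ValueError if a task depends on a task that starts later.

    start_dates maps task id -> start date; upstream maps task id -> ids of
    the tasks it depends on. Both structures are assumptions for this sketch.
    """
    for task_id, parent_ids in upstream.items():
        for parent_id in parent_ids:
            if start_dates[parent_id] > start_dates[task_id]:
                raise ValueError(
                    f"task {task_id!r} (start {start_dates[task_id]:%Y-%m-%d}) "
                    f"depends on {parent_id!r}, which starts later "
                    f"({start_dates[parent_id]:%Y-%m-%d})"
                )


# Example: 'load' depends on 'extract', but 'extract' starts a year later.
example_dates = {"extract": datetime(2016, 1, 1), "load": datetime(2015, 1, 1)}
example_upstream = {"load": ["extract"]}

try:
    check_start_dates(example_dates, example_upstream)
except ValueError as err:
    print(err)
```

Only direct dependencies are compared here; since every edge is checked, a later start date anywhere along a dependency chain is still caught at the edge where it occurs.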

What do people think?
