That does sound like a bug, and I would have expected, as you did, that not 
specifying an end_date on some tasks means those tasks should run for ever.

The change that probably needs making is that an end_date of None on a task 
should be treated as "greater" than any other task end_date in/around the lines you linked to.
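
A minimal sketch of that idea, not the actual code in jobs.py: the helper name latest_schedulable_date is hypothetical, and it assumes the cutoff should be the latest task end_date, with None meaning "no limit".

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_schedulable_date(
    task_end_dates: Iterable[Optional[datetime]],
) -> Optional[datetime]:
    """Return the last date worth creating a DagRun for, or None for no limit.

    Hypothetical helper: an end_date of None means "runs forever", so it
    compares as greater than any concrete end_date.
    """
    end_dates = list(task_end_dates)
    # No tasks, or at least one task without an end_date: never stop scheduling.
    if not end_dates or any(d is None for d in end_dates):
        return None
    # Every task has an end_date: nothing can run after the latest one.
    return max(end_dates)
```

With the two tasks from your example (one with no end_date, one ending 2018-02-02) this returns None, so DagRuns would keep being created.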

Do we need to add a TIDep 
https://github.com/apache/incubator-airflow/tree/master/airflow/ti_deps/deps 
to ensure the exec date is less than the task end date?
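
Roughly, the check such a dep would perform could look like the sketch below. This is a standalone function, not the real TIDep interface; the name within_task_end_date is made up, and the inclusive comparison is an assumption based on a DagRun being created for the end_date itself in your example.

```python
from datetime import datetime
from typing import Optional


def within_task_end_date(
    execution_date: datetime, task_end_date: Optional[datetime]
) -> bool:
    """Return True if the task may still run for this execution date.

    Hypothetical check: a task with no end_date is never cut off.
    """
    if task_end_date is None:
        return True
    # Assumption: the end_date itself is still a valid execution date.
    return execution_date <= task_end_date
```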

-ash

> On 21 Feb 2018, at 20:58, Chris Palmer <ch...@crpalmer.com> wrote:
> 
> I was very surprised to find that if you set an end_date on any of the
> tasks in a DAG, the scheduler won't create DagRuns after the earliest
> of those end_dates. The code that does this is the 6 or so lines starting
> here -
> https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L867
> .
> 
> So if for example I have:
> 
>   - a DAG with a start_date of 2018-02-01, no specific end_date and a
>   daily schedule
>   - One task in that DAG with no specified end_date
>   - A second task in that DAG with an end_date of 2018-02-02
> 
> The scheduler will create DagRuns for 2018-02-01 and 2018-02-02 but will
> not create a DagRun for 2018-02-03 or later.
> 
> That seems completely counterintuitive to me. I would expect the scheduler
> to keep creating DagRuns so that the first task can keep running.
> 
> 
> Interestingly, if I manually create a DagRun for 2018-02-03, the
> scheduler schedules only the first task for that execution_date
> and actually respects the end_date of the second task.
> 
> The only alternative to adding an end_date to a task is to edit the DAG and
> remove those tasks from the DAG entirely. However, that means the webserver
> is no longer aware of those tasks and I can't look at the historical
> behavior in the UI.
> 
> 
> Does anyone have an explanation for why this logic is there? Is there some
> necessary use case for that restriction that I'm not thinking about?
> 
> 
> I could see a similar piece of code that checks to see if all tasks in the
> DAG have specified end_dates and prevents the scheduler from creating
> DagRuns past the MAX of those dates. There is no point in creating
> DagRuns if none of the tasks are going to be run, but as long as at least
> one task can run for that execution_date I think the scheduler should
> create it.
> 
> Thanks
> Chris
