[ 
https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146348#comment-16146348
 ] 

Jong Kim edited comment on AIRFLOW-593 at 8/29/17 11:48 PM:
------------------------------------------------------------

Any update on this? I consider this a pretty serious bug. The public gist of the
DAG above can easily be run to verify my claim.

The above doesn't work with backfills either:

airflow backfill playground_scheduler -s 2016-10-20T11:00:00 -e 2016-10-21T11:00:00 -I

My current workaround is to backfill each DAG run manually, one at a time,
which undermines the point of the backfill command.
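A minimal sketch of automating that workaround: build one `airflow backfill` command per execution date, using the same value for `-s` and `-e` so each command covers a single DAG run, and run them strictly one after another. `per_run_backfill_commands` and `run_sequentially` are hypothetical helpers, and the fixed one-day interval is an assumption standing in for the DAG's daily 11:00 UTC cron schedule. By default the script only prints the commands.

```python
import subprocess
from datetime import datetime, timedelta


def per_run_backfill_commands(dag_id, start, end, interval, extra_args=()):
    """Build one `airflow backfill` argv per execution date, start..end inclusive."""
    commands = []
    execution_date = start
    while execution_date <= end:
        stamp = execution_date.strftime("%Y-%m-%dT%H:%M:%S")
        # Same date for -s and -e, so each command covers a single DAG run.
        commands.append(["airflow", "backfill", dag_id,
                         "-s", stamp, "-e", stamp, *extra_args])
        execution_date += interval
    return commands


def run_sequentially(commands):
    """Run the backfills one after another; check=True stops at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmds = per_run_backfill_commands(
        "playground_scheduler",
        datetime(2016, 10, 20, 11, 0, 0),
        datetime(2016, 10, 21, 11, 0, 0),
        timedelta(days=1),   # assumption: daily schedule
        extra_args=("-I",),  # mirrors the -I in the command above
    )
    for argv in cmds:
        print(" ".join(argv))  # dry run: print instead of executing
```

Passing the list to `run_sequentially` would execute the backfills in date order, which is the manual workaround without the typing.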


was (Author: jongyonkim):
Any update on this? I consider this a pretty serious bug. The public gist of the
DAG above can easily be run to verify my claim.

My current workaround is to backfill each DAG run manually, one at a time,
which undermines the point of the backfill command.

> Tasks do not get backfilled sequentially
> ----------------------------------------
>
>                 Key: AIRFLOW-593
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-593
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun, scheduler
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Jong Kim
>            Priority: Minor
>
> I need the tasks within a DAG to complete in order when running backfills.
> I am running locally on my Mac using the SequentialExecutor.
> Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with a
> start_date of datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks
> that must complete in order: task0 -> task1 -> task2. This dependency is
> set using .set_downstream().
> Today (2016/10/22) I reset the database, turn on the DAG using the on/off
> toggle in the webserver, and run "airflow scheduler", which automatically
> backfills starting from start_date.
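Which execution dates the scheduler fills in follows from Airflow's scheduling convention: a DAG run stamped with execution date X covers the period from X to X plus the schedule interval, and the scheduler only creates it once that period has ended. The sketch models that rule with a hypothetical `catch_up_dates` helper. The fixed one-day `timedelta` is an assumption standing in for the cron expression, and the "now" of noon UTC on 2016/10/22 is also assumed, since the report gives only the date.

```python
from datetime import datetime, timedelta

# Assumption: "0 11 * * *" with a start_date at 11:00 UTC is modeled as a
# fixed one-day interval; real Airflow evaluates the cron expression.
START_DATE = datetime(2016, 10, 20, 11, 0, 0)
INTERVAL = timedelta(days=1)


def catch_up_dates(start, interval, now):
    """Execution dates whose schedule period has fully elapsed by `now`.

    A run stamped with execution date X covers [X, X + interval) and is
    only triggered once X + interval <= now.
    """
    dates = []
    execution_date = start
    while execution_date + interval <= now:
        dates.append(execution_date)
        execution_date += interval
    return dates


# Hypothetical "now": 2016/10/22 at noon UTC.
for d in catch_up_dates(START_DATE, INTERVAL, datetime(2016, 10, 22, 12, 0, 0)):
    print(d)
```

With a "now" before 11:00 UTC on 2016/10/22, the same function returns only the 2016/10/20 date, because the 2016/10/21 period has not closed yet.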
> It backfills 2016/10/20 and 2016/10/21. I expect the backfill to run
> sequentially, like this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': False, I see Airflow running the tasks grouped by
> task rather than by DAG run, something like this, which is not what I want:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task2
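The two orderings above differ only in which loop is outermost. Assuming the only dependencies are the within-run chain task0 -> task1 -> task2 (there is no cross-run dependency when depends_on_past is False), both sequences respect every dependency, so either one is a legal schedule. A minimal sketch with hypothetical helper names:

```python
from datetime import datetime

EXECUTION_DATES = [datetime(2016, 10, 20, 11, 0, 0), datetime(2016, 10, 21, 11, 0, 0)]
TASKS = ["task0", "task1", "task2"]  # chain: task0 -> task1 -> task2


def run_major_order(dates, tasks):
    """Desired: every task of one DAG run completes before the next run starts."""
    return [(date, task) for date in dates for task in tasks]


def task_major_order(dates, tasks):
    """Reported with depends_on_past=False: one task across all runs, then the next."""
    return [(date, task) for task in tasks for date in dates]


for label, order in [("expected", run_major_order(EXECUTION_DATES, TASKS)),
                     ("observed", task_major_order(EXECUTION_DATES, TASKS))]:
    print(label)
    for date, task in order:
        print(" ", date, task)
```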
> With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to
> run the way I need, but instead it runs some tasks out of order, like
> this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task0   <- out of order!
> datetime(2016, 10, 20, 11, 0, 0) task2   <- out of order!
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
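A small model of the two flags, following the Airflow documentation's description: depends_on_past makes an instance wait for the same task in the previous run (the first run is exempt), and wait_for_downstream additionally makes it wait for the tasks immediately downstream of that previous instance. The `violations` helper and the one-at-a-time, always-succeeds execution model are assumptions for illustration, not Airflow code.

```python
from datetime import datetime, timedelta

# Linear chain task0 -> task1 -> task2, as in the issue.
UPSTREAM = {"task0": [], "task1": ["task0"], "task2": ["task1"]}
DOWNSTREAM = {"task0": ["task1"], "task1": ["task2"], "task2": []}
INTERVAL = timedelta(days=1)  # assumption: daily schedule
START_DATE = datetime(2016, 10, 20, 11, 0, 0)
D1, D2 = START_DATE, START_DATE + INTERVAL

# Order reported with depends_on_past=True and wait_for_downstream=True.
OBSERVED = [(D1, "task0"), (D1, "task1"), (D2, "task0"),
            (D1, "task2"), (D2, "task1"), (D2, "task2")]


def violations(order, start_date, depends_on_past, wait_for_downstream):
    """List (date, task, prerequisite) for each prerequisite that runs too late.

    Simplified model: one task instance runs at a time and every instance
    succeeds, so a prerequisite is satisfied exactly when it appears earlier
    in `order`.
    """
    position = {instance: i for i, instance in enumerate(order)}
    problems = []
    for date, task in order:
        # Upstream tasks in the same DAG run must finish first.
        prereqs = [(date, up) for up in UPSTREAM[task]]
        previous = date - INTERVAL
        if previous >= start_date:  # the first run has no past instance
            if depends_on_past:
                prereqs.append((previous, task))
            if wait_for_downstream:
                # Only tasks *immediately* downstream of the previous instance.
                prereqs.extend((previous, down) for down in DOWNSTREAM[task])
        for pre in prereqs:
            if position.get(pre, len(order)) > position[(date, task)]:
                problems.append((date, task, pre))
    return problems


print(violations(OBSERVED, START_DATE, depends_on_past=True, wait_for_downstream=True))
# → []
```

Under this model the reported order breaks no rule: task0 on 2016-10-21 only has to wait for task0 and task1 of 2016-10-20, because task2 is not immediately downstream of task0. The task-major order from the depends_on_past=False case, by contrast, would break the wait_for_downstream rule twice under these flags.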
> Is this a bug? If not, am I understanding 'depends_on_past' and 
> 'wait_for_downstream' correctly? What do I need to do?
> The only remedy I can think of is to backfill each date manually.
> Public gist of DAG: 
> https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
