[ https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070484#comment-17070484 ]
Daniel Imberman commented on AIRFLOW-593:
-----------------------------------------

This issue has been moved to https://github.com/apache/airflow/issues/7989

> Tasks do not get backfilled sequentially
> ----------------------------------------
>
>                 Key: AIRFLOW-593
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-593
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun, scheduler
>    Affects Versions: 1.7.1.3
>            Reporter: Jong Kim
>            Priority: Minor
>         Attachments: Screen Shot 2018-07-20 at 10.04.24 AM.png
>
>
> I need to have the tasks within a DAG complete in order when running
> backfills. I am running on my mac locally using SequentialExecutor.
> Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with a
> start_date: datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks,
> which must complete in order. task0 -> task1 -> task2. This dependency is set
> using .set_downstream().
> Today (2016/10/22) I reset the database, turn-on the DAGrun using the on/off
> toggle in the webserver, and issue "airflow scheduler", which will
> automatically backfill starting from start_date.
> It will backfill for 2016/10/20 and 2016/10/21. I expect backfill to run
> like the following sequentially:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': False, I see Airflow running tasks grouped by
> sequence number something like this, which is not what I want:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to
> run like what I need to, but instead it runs some tasks out of order like
> this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task0 <- out of order!
> datetime(2016, 10, 20, 11, 0, 0) task2 <- out of order!
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> Is this a bug? If not, am I understanding 'depends_on_past' and
> 'wait_for_downstream' correctly? What do I need to do?
> The only remedy I can think of is to backfill each date manually.
> Public gist of DAG:
> https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)