[ 
https://issues.apache.org/jira/browse/AIRFLOW-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Heuermann updated AIRFLOW-1056:
--------------------------------------
    Description: 
With catchup=False, a single DAG run is still triggered when un-pausing a DAG 
that has missed run windows. 

In airflow/jobs.py:create_dag_run(): when catchup is disabled, it updates 
dag.start_date here to prevent the backfill: 
https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770.
But it looks like the function schedules DAG runs based on a window (using 
sequential run times as lower and upper bounds), so it will always schedule a 
single DAG run if there is a missed run between the last run and the time at 
which the DAG was un-paused, even if it was un-paused AFTER those missed runs.
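
To make the window behaviour concrete, here is a minimal sketch of the logic as 
I understand it. It is a simplified model, not the actual create_dag_run() 
code: the function name next_run_to_schedule and its arguments are made up for 
illustration, and it assumes a fixed timedelta schedule.

```python
from datetime import datetime, timedelta


def next_run_to_schedule(last_run, interval, now):
    """Simplified model of window-based scheduling with catchup=False.

    Hypothetical helper, not Airflow code. Returns the execution date that
    would be scheduled at `now`, or None if nothing is due.
    """
    # Most recent schedule boundary at or before `now`.
    latest_boundary = last_run + ((now - last_run) // interval) * interval
    # catchup=False: move the effective start forward to the beginning of
    # the most recent complete interval, so older intervals are skipped.
    effective_start = latest_boundary - interval
    # The next candidate is the run after the last one, but never earlier
    # than the effective start.
    candidate = max(last_run + interval, effective_start)
    # A run is created once its interval [candidate, candidate + interval)
    # has fully elapsed.
    if candidate + interval <= now:
        return candidate
    return None


if __name__ == "__main__":
    day = timedelta(days=1)
    last_run = datetime(2017, 3, 1)
    # The DAG was paused after the 2017-03-01 run and un-paused at this time.
    unpaused_at = datetime(2017, 3, 10, 12, 0)
    print(next_run_to_schedule(last_run, day, unpaused_at))
```

In this model the older missed intervals are skipped, but the 2017-03-09 run is 
still returned, even though its interval closed while the DAG was paused. 
Nothing in the window calculation knows when the DAG was un-paused.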

Some ideas on solutions:
* Pass in the time when the scheduler last ran and use that as the lower bound 
of the window, though I'm not sure how easy that is to get to. 
* Update the start_date when a DAG with catchup=False is un-paused, or add a new 
"unpaused_date" field that would serve the same purpose.
* While the DAG is paused, have the scheduler insert a skipped Job record each 
time the job would have run.
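
As a rough sketch of the second idea, the check below treats a hypothetical 
"unpaused_date" as an extra lower bound on the window. The function 
should_create_run and its arguments are assumptions for illustration only; 
no such field or function exists in Airflow today.

```python
from datetime import datetime, timedelta


def should_create_run(execution_date, interval, now, unpaused_date):
    """Decide whether a candidate run should be created (hypothetical).

    `unpaused_date` is the assumed new field recording when the DAG was
    last un-paused.
    """
    period_end = execution_date + interval
    if period_end > now:
        # The interval has not finished yet, so nothing is due.
        return False
    if period_end <= unpaused_date:
        # The interval closed while the DAG was still paused: treat it as
        # a missed run and do not schedule it.
        return False
    return True


if __name__ == "__main__":
    day = timedelta(days=1)
    now = datetime(2017, 3, 10, 12, 0)
    # Interval for 2017-03-09 closed at 2017-03-10 00:00, before un-pausing.
    print(should_create_run(datetime(2017, 3, 9), day, now,
                            unpaused_date=datetime(2017, 3, 10, 11, 0)))
    # Same interval, but the DAG was already active when it closed.
    print(should_create_run(datetime(2017, 3, 9), day, now,
                            unpaused_date=datetime(2017, 3, 9, 12, 0)))
```

The first call is rejected because the interval ended before the DAG was 
un-paused; the second is accepted because the DAG was already active when the 
interval closed.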

There might be a simpler solution I'm missing.

  was:
When "catchup=False" a single job run is still triggered when un-pausing a dag 
when there are missed run windows. 

In airflow/jobs.py:create_dag_run(): When catchup is disabled it updates the 
dag.start_date here to prevent the backfill: 
https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770.
But it looks like the function schedules dags based on a window (using 
sequential run times as lower and upper bounds) so it will always schedule a 
single dag run if there is a missed run between the last run and the time which 
it was unpaused. Even if it was un-paused AFTER those missed runs.

Some ideas on solutions:
* Pass in the time when the scheduler last ran and use that as the lower bound 
of the window, but not sure how easy that is to get to. 
* Do something when a dag with catchup=False is unpaused like update the 
start_date or update missed runs as skipped (the latter may be expensive)

There might be a simpler solution I'm missing.


> Single dag run triggered when un-pausing job with catchup=False
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-1056
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1056
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Andrew Heuermann
>
> With catchup=False, a single DAG run is still triggered when un-pausing a 
> DAG that has missed run windows. 
> In airflow/jobs.py:create_dag_run(): when catchup is disabled, it updates 
> dag.start_date here to prevent the backfill: 
> https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770.
> But it looks like the function schedules DAG runs based on a window (using 
> sequential run times as lower and upper bounds), so it will always schedule a 
> single DAG run if there is a missed run between the last run and the time at 
> which the DAG was un-paused, even if it was un-paused AFTER those missed runs.
> Some ideas on solutions:
> * Pass in the time when the scheduler last ran and use that as the lower 
> bound of the window, though I'm not sure how easy that is to get to. 
> * Update the start_date when a DAG with catchup=False is un-paused, or add a 
> new "unpaused_date" field that would serve the same purpose.
> * While the DAG is paused, have the scheduler insert a skipped Job record 
> each time the job would have run.
> There might be a simpler solution I'm missing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
