[ https://issues.apache.org/jira/browse/AIRFLOW-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Heuermann updated AIRFLOW-1056:
--------------------------------------
    Description:
When "catchup=False" a single job run is still triggered when un-pausing a dag when there are missed run windows.

In airflow/jobs.py:create_dag_run(): When catchup is disabled it updates the dag.start_date here to prevent the backfill: https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770.

But it looks like the function schedules dags based on a window (using sequential run times as lower and upper bounds) so it will always schedule a single dag run if there is a missed run between the last run and the time which it was unpaused. Even if it was un-paused AFTER those missed runs.

Some ideas on solutions:
* Pass in the time when the scheduler last ran and use that as the lower bound of the window, but not sure how easy that is to get to.
* Update the start_date when a dag with catchup=False is unpaused. Or add a new "unpaused_date" field that would serve the same purpose.
* If paused have the scheduler insert a skipped Job record when the job would have run.

There might be a simpler solution I'm missing.

  was:
When "catchup=False" a single job run is still triggered when un-pausing a dag when there are missed run windows.

In airflow/jobs.py:create_dag_run(): When catchup is disabled it updates the dag.start_date here to prevent the backfill: https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770.

But it looks like the function schedules dags based on a window (using sequential run times as lower and upper bounds) so it will always schedule a single dag run if there is a missed run between the last run and the time which it was unpaused. Even if it was un-paused AFTER those missed runs.

Some ideas on solutions:
* Pass in the time when the scheduler last ran and use that as the lower bound of the window, but not sure how easy that is to get to.
* Do something when a dag with catchup=False is unpaused like update the start_date or update missed runs as skipped (the latter may be expensive)

There might be a simpler solution I'm missing.

> Single dag run triggered when un-pausing job with catchup=False
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-1056
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1056
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Andrew Heuermann
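
To make the window behaviour described above concrete, here is a minimal sketch in plain Python. It is a simplified model and not the code in airflow/jobs.py: it assumes a fixed timedelta interval instead of a cron schedule, and the names latest_complete_window_start, next_run and unpaused_at are hypothetical. The optional unpaused_at argument illustrates the "unpaused_date" idea from the list of possible solutions; it is one possible reading of that idea, not an agreed fix.

```python
from datetime import datetime, timedelta
from typing import Optional


def latest_complete_window_start(now: datetime, anchor: datetime,
                                 interval: timedelta) -> datetime:
    """Start of the most recent schedule window that has fully elapsed by `now`.

    Windows are [anchor + i*interval, anchor + (i+1)*interval).
    """
    periods_elapsed = (now - anchor) // interval  # whole intervals since anchor
    return anchor + (periods_elapsed - 1) * interval


def next_run(last_run: Optional[datetime], now: datetime, anchor: datetime,
             interval: timedelta,
             unpaused_at: Optional[datetime] = None) -> Optional[datetime]:
    """Return the execution date of the run to create, or None.

    Simplified model of scheduling with catchup disabled (an assumption,
    not Airflow's implementation).
    """
    # With catchup disabled, the lower bound is moved up to the latest
    # complete window, so older missed windows are not backfilled.
    lower_bound = latest_complete_window_start(now, anchor, interval)
    if last_run is None:
        candidate = lower_bound
    else:
        candidate = max(last_run + interval, lower_bound)

    window_end = candidate + interval
    if window_end > now:
        return None  # the window is still open, nothing to schedule yet

    # Hypothetical fix: ignore windows that closed before the dag was unpaused.
    if unpaused_at is not None and window_end <= unpaused_at:
        return None
    return candidate


anchor = datetime(2017, 3, 1)             # first schedule boundary
daily = timedelta(days=1)
last_run = datetime(2017, 3, 1)           # last run before the dag was paused
unpaused = datetime(2017, 3, 5, 9, 0)     # dag un-paused after several missed windows

# Behaviour reported in the issue: one run for the latest missed window.
print(next_run(last_run, unpaused, anchor, daily))
# With the hypothetical unpaused_at bound, that window is skipped.
print(next_run(last_run, unpaused, anchor, daily, unpaused_at=unpaused))
```

In this model the first call returns the 2017-03-04 window even though it closed while the dag was paused, which matches the reported symptom. The second call returns None, and the next run is only created once a window closes after the un-pause time.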