uranusjr commented on a change in pull request #17552: URL: https://github.com/apache/airflow/pull/17552#discussion_r709121296
##########
File path: docs/apache-airflow/dag-run.rst
##########
@@ -54,17 +54,36 @@ Cron Presets
 Your DAG will be instantiated for each schedule along with a corresponding DAG Run entry in the database backend.
-.. note::
-    If you run a DAG on a schedule_interval of one day, the run stamped 2020-01-01
-    will be triggered soon after 2020-01-01T23:59. In other words, the job instance is
-    started once the period it covers has ended. The ``execution_date`` available in the context
-    will also be 2020-01-01.
+.. _data-interval:

-    The first DAG Run is created based on the minimum ``start_date`` for the tasks in your DAG.
-    Subsequent DAG Runs are created by the scheduler process, based on your DAG’s ``schedule_interval``,
-    sequentially. If your start_date is 2020-01-01 and schedule_interval is @daily, the first run
-    will be created on 2020-01-02 i.e., after your start date has passed.
+Data Interval
+-------------
+
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data interval would start at midnight of each day and end at midnight of the
+next day.
+
+A DAG run is usually scheduled *after* its associated data interval has ended,
+to ensure the run is able to collect all the data within the time period. In
+other words, a run covering the data period of 2020-01-01 generally does not
+start to run until 2020-01-01 has ended, i.e. after 2020-01-02 00:00:00.
+
+All dates in Airflow are tied to the data interval concept in some way. The
+"logical date" (also called ``execution_date`` in Airflow versions prior to 2.2)
+of a DAG run, for example, denotes the start of the data interval, not when the

Review comment:
Right now the logical date is hard-wired to the start of the data interval in the data structure. Theoretically a custom timetable could set it to something else, but that is not exposed as part of the API (you must subclass `DataInterval`), nor is it explained anywhere.
I think it’s best to do this as long as `execution_date` still has a UNIQUE constraint, to avoid people doing silly things (intentionally or not).
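
For illustration, here is a minimal sketch of the relationship discussed in this comment, using only the Python standard library. `Interval`, `daily_interval`, `logical_date`, and `earliest_run_time` are hypothetical names made up for this example, not Airflow's classes or API. The sketch assumes a `@daily` schedule in UTC and shows the logical date being derived from the interval start, with the run becoming eligible only once the interval has ended.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for a run's data interval (not Airflow's class)."""

    start: datetime
    end: datetime


def daily_interval(moment: datetime) -> Interval:
    """Return the midnight-to-midnight interval that contains `moment`."""
    start = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    return Interval(start=start, end=start + timedelta(days=1))


def logical_date(interval: Interval) -> datetime:
    # Assumption mirroring the comment: logical date is always the interval start.
    return interval.start


def earliest_run_time(interval: Interval) -> datetime:
    # A run is scheduled only after the interval it covers has ended.
    return interval.end


interval = daily_interval(datetime(2020, 1, 1, 15, 30, tzinfo=timezone.utc))
print(logical_date(interval))       # 2020-01-01 00:00:00+00:00
print(earliest_run_time(interval))  # 2020-01-02 00:00:00+00:00
```

Because the logical date here is a pure function of the interval start, two runs of the same DAG with different data intervals cannot collide on it, which is the property a uniqueness constraint on that column relies on.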