[GitHub] [airflow] kaxil commented on a change in pull request #17552: AIP 39: Documentation

2021-09-14 Thread GitBox


kaxil commented on a change in pull request #17552:
URL: https://github.com/apache/airflow/pull/17552#discussion_r708734047



##
File path: docs/apache-airflow/faq.rst
##
@@ -216,20 +216,35 @@ actually start. If this were not the case, the backfill just would not start.
 What does ``execution_date`` mean?
 ----------------------------------
 
-Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if you want to
-summarize data for 2016-02-19, You would do it at 2016-02-20 midnight UTC, which would be right after all data for
-2016-02-19 becomes available.
-
-This datetime value is available to you as :ref:`Template variables` with various formats in Jinja templated
-fields. They are also included in the context dictionary given to an Operator's execute function.
+*Execution date* or ``execution_date`` is a historical name for what is called a
+*logical date*, and also usually the start of the data interval represented by a
+DAG run.
+
+Airflow was developed as a solution for ETL needs. In the ETL world, you
+typically summarize data. So, if you want to summarize data for 2016-02-19, You
+would do it at 2016-02-20 midnight UTC, which would be right after all data for
+2016-02-19 becomes available. This interval between midnights of 2016-02-19 and
+2016-02-20 is called the *data interval*, and since the it represents data in
+the date of 2016-02-19, this date is thus called the run's *logical date*, or
+the date that this DAG run is executed for, thus *execution date*.

Review comment:
   ```suggestion
   typically summarize data. So, if you want to summarize data for ``2016-02-19``, You
   would do it at ``2016-02-20`` midnight ``UTC``, which would be right after all data for
   ``2016-02-19`` becomes available. This interval between midnights of ``2016-02-19`` and
   ``2016-02-20`` is called the *data interval*, and since it represents data in
   the date of ``2016-02-19``, this date is thus called the run's *logical date*, or
   the date that this DAG run is executed for, thus *execution date*.
   ```
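The relationship described in the hunk above (the run for 2016-02-19 covering the interval up to 2016-02-20 midnight) can be sketched in plain Python. This is an illustrative helper under stated assumptions, not Airflow's internal API; the name `daily_data_interval` is hypothetical:

```python
from datetime import datetime, timedelta

def daily_data_interval(logical_date: datetime) -> tuple[datetime, datetime]:
    """For an ``@daily`` schedule, the data interval starts at the logical
    date (midnight) and ends at midnight of the next day."""
    return logical_date, logical_date + timedelta(days=1)

# The run whose logical date (execution_date) is 2016-02-19 covers
# [2016-02-19 00:00, 2016-02-20 00:00) and is scheduled after that
# interval ends, i.e. at or after 2016-02-20 midnight UTC.
interval_start, interval_end = daily_data_interval(datetime(2016, 2, 19))
print(interval_start.isoformat())  # 2016-02-19T00:00:00
print(interval_end.isoformat())    # 2016-02-20T00:00:00
```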




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kaxil commented on a change in pull request #17552: AIP 39: Documentation

2021-09-14 Thread GitBox


kaxil commented on a change in pull request #17552:
URL: https://github.com/apache/airflow/pull/17552#discussion_r708733553



##
File path: docs/apache-airflow/dag-run.rst
##
@@ -54,17 +54,36 @@ Cron Presets
 Your DAG will be instantiated for each schedule along with a corresponding
 DAG Run entry in the database backend.
 
-.. note::
 
-    If you run a DAG on a schedule_interval of one day, the run stamped 2020-01-01
-    will be triggered soon after 2020-01-01T23:59. In other words, the job instance is
-    started once the period it covers has ended.  The ``execution_date`` available in the context
-    will also be 2020-01-01.
+.. _data-interval:
 
-    The first DAG Run is created based on the minimum ``start_date`` for the tasks in your DAG.
-    Subsequent DAG Runs are created by the scheduler process, based on your DAG’s ``schedule_interval``,
-    sequentially. If your start_date is 2020-01-01 and schedule_interval is @daily, the first run
-    will be created on 2020-01-02 i.e., after your start date has passed.
+Data Interval
+-------------
+
+Each DAG run in Airflow has an assigned "data interval" that represents the time
+range it operates in. For a DAG scheduled with ``@daily``, for example, each of
+its data intervals would start at midnight of each day and end at midnight of the
+next day.
+
+A DAG run is usually scheduled *after* its associated data interval has ended,
+to ensure the run is able to collect all the data within the time period. In
+other words, a run covering the data period of 2020-01-01 generally does not
+start to run until 2020-01-01 has ended, i.e. after 2020-01-02 00:00:00.
+
+All dates in Airflow are tied to the data interval concept in some way. The
+"logical date" (also called ``execution_date`` in Airflow versions prior to 2.2)
+of a DAG run, for example, denotes the start of the data interval, not when the
+DAG is actually executed.
+
+Similarly, since the ``start_date`` argument for the DAG and its tasks points to
+the same logical date, it marks the start of *the DAG's first data interval*, not
+when tasks in the DAG will start running. In other words, a DAG run will only be
+scheduled one interval after ``start_date``.
+
+.. tip::
+
+    If ``schedule_interval`` is not enough to express your DAG's schedule,
+    logical date, or data interval, see :doc:`Customizing imetables `.

Review comment:
   ```suggestion
       logical date, or data interval, see :doc:`Customizing timetables `.
   ```
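The scheduling rule in the hunk above (the first run is only created one interval after ``start_date``) can be sketched in plain Python. This is a simplified illustration, not the Airflow scheduler's actual code; the name `first_run` is hypothetical:

```python
from datetime import datetime, timedelta

def first_run(start_date: datetime,
              interval: timedelta = timedelta(days=1)) -> tuple[datetime, datetime]:
    """Return the first run's logical date and the earliest moment the
    scheduler would create that run (after its data interval has ended)."""
    logical_date = start_date                 # start of the first data interval
    earliest_create = start_date + interval   # run only after the interval ends
    return logical_date, earliest_create

# start_date 2020-01-01 with @daily: logical date is 2020-01-01, but the
# run is only created on 2020-01-02, after the start date has passed.
logical_date, earliest_create = first_run(datetime(2020, 1, 1))
print(logical_date.date())     # 2020-01-01
print(earliest_create.date())  # 2020-01-02
```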



