baolsen commented on a change in pull request #6999: [AIRFLOW-XXXX] Clarify wait_for_downstream and execution_date
URL: https://github.com/apache/airflow/pull/6999#discussion_r362747213
 
 

 ##########
 File path: docs/concepts.rst
 ##########
 @@ -113,13 +116,138 @@ DAGs can be used as context managers to automatically assign new operators to th
 
     op.dag is dag # True
 
-.. _concepts-operators:
+.. _concepts:dagruns:
+
+DAG Runs
+========
+
+A DAG run is a physical instance of a DAG, containing task instances that run for a specific ``execution_date``.
+
+A DAG run is usually created by the Airflow scheduler, but can also be created by an external trigger.
+Multiple DAG runs may be running at once for a particular DAG, each of them having a different ``execution_date``.
+For example, we might currently have two DAG runs that are in progress for 2016-01-01 and 2016-01-02 respectively.
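+
+As a rough sketch (module paths as in Airflow 1.10; the DAG name, operator, and daily schedule are
+illustrative assumptions), a DAG scheduled daily gets one DAG run per day, each with its own
+``execution_date``:
+
+.. code:: python
+
+    from datetime import datetime
+    from airflow import DAG
+    from airflow.operators.dummy_operator import DummyOperator
+
+    # The scheduler creates one DAG run per day from start_date onwards,
+    # e.g. runs with execution_date 2016-01-01, 2016-01-02, and so on.
+    with DAG('my_dag', schedule_interval='@daily', start_date=datetime(2016, 1, 1)) as dag:
+        task_1 = DummyOperator(task_id='task_1')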
+
+.. _concepts:execution_date:
+
+execution_date
+--------------
+
+The ``execution_date`` is the *logical* date and time for which the DAG Run, and its task instances, are running.
+
+This allows task instances to process data for the desired *logical* date & time.
+While a task instance or DAG run might have a *physical* start date of now,
+their *logical* date might be 3 months ago, because we are backfilling (reloading) old data.
+
+In the prior example, the ``execution_date`` was 2016-01-01 for the first DAG Run and 2016-01-02 for the second.
+
+A DAG run and all task instances created within it share the same ``execution_date``, so
+logically you can think of a DAG run as simulating the DAG running all of its tasks at some
+previous date & time specified by the ``execution_date``.
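+
+As a minimal sketch (the DAG and command are illustrative; ``{{ ds }}`` is the built-in template
+variable that renders the ``execution_date`` as ``YYYY-MM-DD``), a task can use the logical date
+to decide which data to process:
+
+.. code:: python
+
+    from datetime import datetime
+    from airflow import DAG
+    from airflow.operators.bash_operator import BashOperator
+
+    with DAG('my_dag', schedule_interval='@daily', start_date=datetime(2016, 1, 1)) as dag:
+        # {{ ds }} renders to each DAG run's execution_date, e.g. "2016-01-01",
+        # regardless of the physical date on which the task instance actually runs.
+        process = BashOperator(
+            task_id='process',
+            bash_command='echo "processing data for {{ ds }}"',
+        )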
+
+.. _concepts:tasks:
+
+Tasks
+=====
+
+A Task defines a unit of work within a DAG; it is represented as a node in the DAG graph, and it is written in Python.
+
+Each task is an implementation of an Operator, for example a ``PythonOperator`` to execute
+some Python code, or a ``BashOperator`` to run a Bash command.
+
+The task implements an operator by defining specific values for that operator,
+such as a Python callable in the case of ``PythonOperator`` or a Bash command in the case of ``BashOperator``.
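+
+For example (a rough sketch; the callable and the command are illustrative, module paths as in
+Airflow 1.10):
+
+.. code:: python
+
+    from datetime import datetime
+    from airflow import DAG
+    from airflow.operators.bash_operator import BashOperator
+    from airflow.operators.python_operator import PythonOperator
+
+    def greet():
+        print("hello")
+
+    with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
+        # A PythonOperator task is defined by the callable it should run ...
+        python_task = PythonOperator(task_id='python_task', python_callable=greet)
+        # ... and a BashOperator task by the command it should run.
+        bash_task = BashOperator(task_id='bash_task', bash_command='echo hello')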
+
+Relations between Tasks
+-----------------------
+
+Consider the following DAG with two tasks.
+Each task is a node in our DAG, and there is a dependency from task_1 to task_2:
+
+.. code:: python
+
+    with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
+        task_1 = DummyOperator(task_id='task_1')
+        task_2 = DummyOperator(task_id='task_2')
+        task_1 >> task_2  # Define dependencies
+
+We can say that task_1 is *upstream* of task_2, and conversely task_2 is *downstream* of task_1.
+When a DAG Run is created, task_1 starts running, and task_2 waits for task_1 to complete successfully before it can start.
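+
+The bitshift operator above is one of several equivalent ways to declare this dependency
+(a small sketch; each line below sets the same task_1 -> task_2 relationship):
+
+.. code:: python
+
+    task_1 >> task_2               # task_2 is downstream of task_1
+    task_2 << task_1               # the same relationship, written the other way around
+    task_1.set_downstream(task_2)
+    task_2.set_upstream(task_1)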
+
+Task Instances
+==============
+
+A task instance represents a specific run of a task and is characterized as the combination
+of a DAG, a task, and a point in time (``execution_date``). Task instances also have an
+indicative state, which could be "running", "success", "failed", "skipped", "up for retry", etc.
+
+Tasks are defined in DAGs, and both are written in Python code to define what you want to do.
+Task Instances belong to DAG Runs, have an associated ``execution_date``, and are the physical,
+runnable entities that carry out those tasks.
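+
+As a rough sketch (assuming the Airflow 1.10 ``airflow.models`` API, and ``my_dag`` / ``task_1``
+defined as above), a task instance is identified by its DAG, task, and ``execution_date``:
+
+.. code:: python
+
+    from datetime import datetime
+    from airflow.models import DagBag, TaskInstance
+
+    # The (dag_id, task_id, execution_date) triple identifies one task instance.
+    dag = DagBag().get_dag('my_dag')
+    task = dag.get_task('task_1')
+    ti = TaskInstance(task, execution_date=datetime(2016, 1, 1))
+    print(ti.current_state())  # e.g. "success", "failed", or None if it has never run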
+
+Relations between Task Instances
+--------------------------------
+
+Again consider the following tasks, defined for some DAG:
+
+.. code:: python
+
+    with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
+        task_1 = DummyOperator(task_id='task_1')
+        task_2 = DummyOperator(task_id='task_2')
+        task_1 >> task_2  # Define dependencies
+
+When we enable this DAG, the scheduler creates several DAG Runs: one with ``execution_date`` of 2016-01-01,
+one with ``execution_date`` of 2016-01-02, and so on up to the current date.
+
+Each DAG Run will contain a task_1 Task Instance and a task_2 Task Instance. Both Task Instances will
+have ``execution_date`` equal to the DAG Run's ``execution_date``, and each task_2 will be *downstream* of
+(i.e. depends on) its task_1.
+
+We can also say that task_1 for 2016-01-01 is the *previous* task instance of the task_1 for 2016-01-02,
+and that the DAG Run for 2016-01-01 is the *previous* DAG Run to the DAG Run for 2016-01-02.
+Here, *previous* refers to the task instance (or DAG Run) for the prior ``execution_date``, which belongs
+to a separate DAG Run that runs independently, while *upstream* refers to a dependency within the same
+DAG Run, between task instances that share the same ``execution_date``.
+
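+This distinction is what ``depends_on_past`` and ``wait_for_downstream`` build on (a minimal
+sketch reusing the DAG and operators from the snippet above; both are standard ``BaseOperator``
+arguments): ``depends_on_past=True`` makes a task instance wait for the *previous* instance of
+the same task to succeed, while ``wait_for_downstream=True`` additionally waits for the tasks
+immediately *downstream* of that previous instance:
+
+.. code:: python
+
+    with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
+        # The 2016-01-02 task_1 will not start until the 2016-01-01 task_1 has succeeded
+        # and, because of wait_for_downstream, until the 2016-01-01 task_2 has also finished.
+        task_1 = DummyOperator(task_id='task_1',
+                               depends_on_past=True,
+                               wait_for_downstream=True)
+        task_2 = DummyOperator(task_id='task_2')
+        task_1 >> task_2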
 
 Review comment:
  Not sure how you feel about this note - but I think differentiating between *previous* and *upstream* is important, especially for a new user. The concepts of upstream / downstream tasks didn't sink in for me at first, and explicitly calling them out as distinct from *previous* would have helped me understand immediately.
