dstandish commented on code in PR #32169:
URL: https://github.com/apache/airflow/pull/32169#discussion_r1258622540
##########
docs/apache-airflow/core-concepts/dags.rst:
##########

@@ -484,6 +484,125 @@ You can also combine this with the :ref:`concepts:depends-on-past` functionality

.. image:: /img/branch_with_trigger.png

.. _concepts:setup-and-teardown:

Setup and Teardown
~~~~~~~~~~~~~~~~~~

In data workflows it's common to create a resource (such as a compute resource), use it to do some work, and then tear it down. Airflow provides setup and teardown tasks to support this need.

Key features of setup and teardown tasks:

* If you clear a task, its setups and teardowns will also be cleared.
* By default, teardown tasks are ignored for the purpose of evaluating dag run state.
* A teardown task will run if its setup was successful, even if its work tasks failed.
* Teardown tasks are ignored when setting dependencies against task groups.
* A setup task must always have a teardown and vice versa. You may use ``EmptyOperator`` as a setup or teardown.

Basic usage
"""""""""""

Suppose you have a dag that creates a cluster, runs a query, and deletes the cluster. Without setup and teardown tasks, you might set these relationships:

.. code-block:: python

    create_cluster >> run_query >> delete_cluster

We can use the ``as_teardown`` method to let Airflow know that ``create_cluster`` is a setup task and ``delete_cluster`` is its teardown:

.. code-block:: python

    create_cluster >> run_query >> delete_cluster.as_teardown(setups=create_cluster)

Observations:

* If you clear ``run_query`` to run it again, then both ``create_cluster`` and ``delete_cluster`` will be cleared.
* If ``run_query`` fails, then ``delete_cluster`` will still run.
* The success of the dag run will depend on the success of ``run_query``.

Setup "scope"
"""""""""""""

We require that a setup always have a teardown in order to have a well-defined scope.
If you wish to have only a teardown task or only a setup task, you may use ``EmptyOperator`` as your "empty setup" or "empty teardown".

The "scope" of a setup is determined by where its teardown is. Tasks between a setup and its teardown are in the "scope" of the setup / teardown pair. Example:

.. code-block:: python

    s1 >> w1 >> w2 >> t1.as_teardown(setups=s1) >> w3
    w2 >> w4

In the above example, w1 and w2 are "between" s1 and t1 and therefore are assumed to require s1. Thus if w1 or w2 is cleared, so too will be s1 and t1. But if w3 or w4 is cleared, neither s1 nor t1 will be cleared.

Controlling dag run state
"""""""""""""""""""""""""

Another feature of setup / teardown tasks is that you can choose whether or not the teardown task should have an impact on dag run state. Perhaps you don't care if the "cleanup" work performed by your teardown task fails, and you only consider the dag run a failure if the "work" tasks fail. By default, teardown tasks are not considered for dag run state.

Continuing with the example above, if you want the run's success to depend on ``delete_cluster``, set ``on_failure_fail_dagrun=True`` when marking ``delete_cluster`` as a teardown:

.. code-block:: python

    create_cluster >> run_query >> delete_cluster.as_teardown(setups=create_cluster, on_failure_fail_dagrun=True)

Authoring with task groups
""""""""""""""""""""""""""

When arrowing from task group to task group, or from task group to task, we ignore teardowns. This allows teardowns to run in parallel, and allows dag execution to proceed even if teardown tasks fail.

Consider this example:

.. code-block:: python

    with TaskGroup("my_group") as tg:
        s1 = my_setup()
        w1 = my_work()
        t1 = my_teardown()
        s1 >> w1 >> t1.as_teardown(setups=s1)
    w2 = other_work()
    tg >> w2

If ``t1`` were not a teardown task, then this dag would effectively be ``s1 >> w1 >> t1 >> w2``.
But since we have marked ``t1`` as a teardown, it is ignored in ``tg >> w2``. So the dag is equivalent to the following:

.. code-block:: python

    s1 >> w1 >> [t1.as_teardown(setups=s1), w2]

Now let's consider an example with nesting:

.. code-block:: python

    with TaskGroup("my_group") as tg:
        s1 = my_setup()
        w1 = my_work()
        t1 = my_teardown()
        s1 >> w1 >> t1.as_teardown(setups=s1)
    w2 = other_work()
    tg >> w2
    dag_s1 = dag_setup1()
    dag_t1 = dag_teardown1()
    dag_s1 >> [tg, w2] >> dag_t1.as_teardown(setups=dag_s1)

In this example s1 is downstream of dag_s1, so it must wait for dag_s1 to complete successfully. But t1 and dag_t1 can run concurrently, because t1 is ignored in the expression ``[tg, w2] >> dag_t1``. If you clear w2, it will clear dag_s1 and dag_t1, but nothing in the task group.

Setup / teardown context manager

Review Comment:
   ok sure
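As a supplement, the clearing rule described in the quoted docs ("tasks between a setup and its teardown are in the pair's scope; clearing one of them also clears the setup and teardown") can be sketched in plain Python. This is a hypothetical illustration only, not Airflow's implementation: the ``tasks_to_clear`` helper and the hand-written scope triples are invented names, and the scope membership for the ``s1 >> w1 >> w2 >> t1`` example is written out by hand rather than derived from the dependency graph.

```python
# Hypothetical sketch (NOT Airflow code): model the clearing semantics of
# setup/teardown "scope". Each scope is a (setup, teardown, work_tasks) triple,
# where work_tasks are the tasks that sit between the setup and its teardown.

def tasks_to_clear(task, scopes):
    """Return the set of task ids cleared when `task` is cleared.

    Clearing a task inside a scope also clears that scope's setup and
    teardown; tasks outside every scope clear only themselves.
    """
    cleared = {task}
    for setup, teardown, work_tasks in scopes:
        if task in work_tasks:
            cleared |= {setup, teardown}
    return cleared


# Scope from the docs example: s1 >> w1 >> w2 >> t1.as_teardown(setups=s1) >> w3
# with w2 >> w4; only w1 and w2 are between s1 and t1.
scopes = [("s1", "t1", {"w1", "w2"})]

print(sorted(tasks_to_clear("w1", scopes)))  # ['s1', 't1', 'w1']
print(sorted(tasks_to_clear("w3", scopes)))  # ['w3']
```

Clearing w1 or w2 drags the s1/t1 pair along with it, while w3 and w4, though downstream, are outside the scope and clear alone, matching the behavior described in the "Setup scope" section above.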