This is an automated email from the ASF dual-hosted git repository. jedcunningham pushed a commit to branch v2-2-test in repository https://gitbox.apache.org/repos/asf/airflow.git
commit fbb7fbd30895eb6fac32a9a7dacbd904a9d348c9 Author: Daniel Standish <15932138+dstand...@users.noreply.github.com> AuthorDate: Wed Nov 3 05:48:07 2021 -0700 Improve Kubernetes Executor docs (#19339) (cherry picked from commit eace4102b68e4964b47f2d8c555f65ceaf0a3690) --- docs/apache-airflow/executor/kubernetes.rst | 166 +++++++++++++++++----------- docs/spelling_wordlist.txt | 1 + 2 files changed, 102 insertions(+), 65 deletions(-) diff --git a/docs/apache-airflow/executor/kubernetes.rst b/docs/apache-airflow/executor/kubernetes.rst index 00a4b5a..923cf22 100644 --- a/docs/apache-airflow/executor/kubernetes.rst +++ b/docs/apache-airflow/executor/kubernetes.rst @@ -21,75 +21,91 @@ Kubernetes Executor =================== -The kubernetes executor is introduced in Apache Airflow 1.10.0. The Kubernetes executor will create a new pod for every task instance. +The Kubernetes executor runs each task instance in its own pod on a Kubernetes cluster. -Example kubernetes files are available at ``scripts/in_container/kubernetes/app/{secrets,volumes,postgres}.yaml`` in the source distribution (please note that these examples are not ideal for production environments). -The volumes are optional and depend on your configuration. There are two volumes available: +KubernetesExecutor runs as a process in the Airflow Scheduler. The scheduler itself does +not necessarily need to be running on Kubernetes, but does need access to a Kubernetes cluster. -- **Dags**: +KubernetesExecutor requires a non-sqlite database in the backend. + +When a DAG submits a task, the KubernetesExecutor requests a worker pod from the Kubernetes API. The worker pod then runs the task, reports the result, and terminates. - - By storing dags onto persistent disk, it will be made available to all workers +.. image:: ../img/arch-diag-kubernetes.png - - Another option is to use ``git-sync``. 
Before starting the container, a git pull of the dags repository will be performed and used throughout the lifecycle of the pod -- **Logs**: +One example of an Airflow deployment running on a distributed set of five nodes in a Kubernetes cluster is shown below. - - By storing logs onto a persistent disk, the files are accessible by workers and the webserver. If you don't configure this, the logs will be lost after the worker pods shuts down +.. image:: ../img/arch-diag-kubernetes2.png - - Another option is to use S3/GCS/etc to store logs +Consistent with the regular Airflow architecture, the Workers need access to the DAG files to execute the tasks within those DAGs and interact with the Metadata repository. Also, configuration information specific to the Kubernetes Executor, such as the worker namespace and image information, needs to be specified in the Airflow Configuration file. -To troubleshoot issue with KubernetesExecutor, you can use ``airflow kubernetes generate-dag-yaml`` command. -This command generates the pods as they will be launched in Kubernetes and dumps them into yaml files for you to inspect. +Additionally, the Kubernetes Executor enables specification of additional features on a per-task basis using the Executor config. + +.. @startuml +.. Airflow_Scheduler -> Kubernetes: Request a new pod with command "airflow run..." +.. Kubernetes -> Airflow_Worker: Create Airflow worker with command "airflow run..." +.. Airflow_Worker -> Airflow_DB: Report task passing or failure to DB +.. Airflow_Worker -> Kubernetes: Pod completes with state "Succeeded" and k8s records in ETCD +.. Kubernetes -> Airflow_Scheduler: Airflow scheduler reads "Succeeded" from k8s watcher thread +.. @enduml +.. image:: ../img/k8s-happy-path.png + +Configuration +------------- .. _concepts:pod_template_file: pod_template_file -################# +~~~~~~~~~~~~~~~~~ + +To customize the pod used for k8s executor worker processes, you may create a pod template file. 
You must provide +the path to the template file in the ``pod_template_file`` option in the ``kubernetes`` section of ``airflow.cfg``. + +Airflow has two strict requirements for pod template files: base image and pod name. + +Base image +^^^^^^^^^^ + +A ``pod_template_file`` must have a container named ``base`` at the ``spec.containers[0]`` position, and +its ``image`` must be specified. -As of Airflow 1.10.12, you can now use the ``pod_template_file`` option in the ``kubernetes`` section -of the ``airflow.cfg`` file to form the basis of your KubernetesExecutor pods. This process is faster to execute -and easier to modify. +You are free to create sidecar containers after this required container, but Airflow assumes that the +airflow worker container exists at the beginning of the container array, and assumes that the +container is named ``base``. -We include multiple examples of working pod operators below, but we would also like to explain a few necessary components -if you want to customize your template files. As long as you have these components, every other element -in the template is customizable. +.. note:: -1. Airflow will overwrite the base container image and the pod name + Airflow may override the base container ``image``, e.g. through :ref:`pod_override <concepts:pod_override>` + configuration; but it must be present in the template file and must not be blank. -There are two points where Airflow potentially overwrites the base image: in the ``airflow.cfg`` -or the ``pod_override`` (discussed below) setting. This value is overwritten to ensure that users do -not need to update multiple template files every time they upgrade their docker image. The other field -that Airflow overwrites is the ``pod.metadata.name`` field. This field has to be unique across all pods, -so we generate these names dynamically before launch. +Pod name +^^^^^^^^ -It's important to note while Airflow overwrites these fields, they **can not be left blank**. 
-If these fields do not exist, kubernetes can not load the yaml into a Kubernetes V1Pod. +The pod's ``metadata.name`` must be set in the template file. This field will *always* be set dynamically at +pod launch to guarantee uniqueness across all pods. But again, it must be included in the template, and cannot +be left blank. -2. Each Airflow ``pod_template_file`` must have a container named "base" at the ``pod.spec.containers[0]`` position -Airflow uses the ``pod_template_file`` by making certain assumptions about the structure of the template. -When airflow creates the worker pod's command, it assumes that the airflow worker container part exists -at the beginning of the container array. It then assumes that the container is named ``base`` -when it merges this pod with internal configs. You are more than welcome to create -sidecar containers after this required container. +Example pod templates +~~~~~~~~~~~~~~~~~~~~~ With these requirements in mind, here are some examples of basic ``pod_template_file`` YAML files. -pod_template_file using the ``dag_in_image`` setting: +Storing DAGs in the image: .. exampleinclude:: /../../airflow/kubernetes/pod_template_file_examples/dags_in_image_template.yaml :language: yaml :start-after: [START template_with_dags_in_image] :end-before: [END template_with_dags_in_image] -``pod_template_file`` which stores DAGs in a ``persistentVolume``: +Storing DAGs in a ``persistentVolume``: .. exampleinclude:: /../../airflow/kubernetes/pod_template_file_examples/dags_in_volume_template.yaml :language: yaml :start-after: [START template_with_dags_in_volume] :end-before: [END template_with_dags_in_volume] -``pod_template_file`` which pulls DAGs from git: +Pulling DAGs from ``git``: .. exampleinclude:: /../../airflow/kubernetes/pod_template_file_examples/git_sync_template.yaml :language: yaml @@ -99,7 +115,7 @@ pod_template_file using the ``dag_in_image`` setting: .. 
_concepts:pod_override: pod_override -############ +~~~~~~~~~~~~ When using the KubernetesExecutor, Airflow offers the ability to override system defaults on a per-task basis. To utilize this functionality, create a Kubernetes V1Pod object and fill in your desired overrides. @@ -135,49 +151,70 @@ Here is an example of a task with both features: :start-after: [START task_with_template] :end-before: [END task_with_template] -KubernetesExecutor Architecture -################################ +Managing dags and logs +~~~~~~~~~~~~~~~~~~~~~~ -The KubernetesExecutor runs as a process in the Scheduler that only requires access to the Kubernetes API (it does *not* need to run inside of a Kubernetes cluster). The KubernetesExecutor requires a non-sqlite database in the backend, but there are no external brokers or persistent workers needed. -For these reasons, we recommend the KubernetesExecutor for deployments have long periods of dormancy between DAG execution. Use of persistent volumes is optional and depends on your configuration. -When a DAG submits a task, the KubernetesExecutor requests a worker pod from the Kubernetes API. The worker pod then runs the task, reports the result, and terminates. +- **Dags**: +To get the DAGs into the workers, you can: -.. image:: ../img/arch-diag-kubernetes.png + - Include dags in the image. + - Use ``git-sync`` which, before starting the worker container, will run a ``git pull`` of the dags repository. + - Store dags on a persistent volume, which can be mounted on all workers. +- **Logs**: -In contrast to the Celery Executor, the Kubernetes Executor does not require additional components such as Redis and Flower, but does require the Kubernetes infrastructure. +To get task logs out of the workers, you can: -One example of an Airflow deployment running on a distributed set of five nodes in a Kubernetes cluster is shown below. + - Use a persistent volume mounted on both the webserver and workers. -.. 
image:: ../img/arch-diag-kubernetes2.png + - Enable remote logging. -The Kubernetes Executor has an advantage over the Celery Executor in that Pods are only spun up when required for task execution compared to the Celery Executor where the workers are statically configured and are running all the time, regardless of workloads. However, this could be a disadvantage depending on the latency needs, since a task takes longer to start using the Kubernetes Executor, since it now includes the Pod startup time. +.. note:: -Consistent with the regular Airflow architecture, the Workers need access to the DAG files to execute the tasks within those DAGs and interact with the Metadata repository. Also, configuration information specific to the Kubernetes Executor, such as the worker namespace and image information, needs to be specified in the Airflow Configuration file. + If you don't enable logging persistence, and if you have not enabled remote logging, logs will be lost after the worker pods shut down. -Additionally, the Kubernetes Executor enables specification of additional features on a per-task basis using the Executor config. +Comparison with CeleryExecutor +------------------------------ +In contrast to CeleryExecutor, KubernetesExecutor does not require additional components such as Redis and Flower, but does require access to a Kubernetes cluster. -.. @startuml -.. Airflow_Scheduler -> Kubernetes: Request a new pod with command "airflow run..." -.. Kubernetes -> Airflow_Worker: Create Airflow worker with command "airflow run..." -.. Airflow_Worker -> Airflow_DB: Report task passing or failure to DB -.. Airflow_Worker -> Kubernetes: Pod completes with state "Succeeded" and k8s records in ETCD -.. Kubernetes -> Airflow_Scheduler: Airflow scheduler reads "Succeeded" from k8s watcher thread -.. @enduml -.. image:: ../img/k8s-happy-path.png +With KubernetesExecutor, each task runs in its own pod. 
The pod is created when the task is queued, and terminates when the task completes. +Historically, in scenarios such as burstable workloads, this presented a resource utilization advantage over CeleryExecutor, where you needed +a fixed number of long-running celery worker pods, whether or not there were tasks to run. + +However, the :doc:`official Apache Airflow Helm chart <helm-chart:index>` can automatically scale celery workers down to zero based on the number of tasks in the queue, +so when using the official chart, this is no longer an advantage. +With Celery workers you will tend to have less task latency because the worker pod is already up and running when the task is queued. On the +other hand, because multiple tasks are running in the same pod, with Celery you may have to be more mindful about resource utilization +in your task design, particularly memory consumption. + +One scenario where KubernetesExecutor can be helpful is if you have long-running tasks, because if you deploy while a task is running, +the task will keep running until it completes (or times out, etc). But with CeleryExecutor, provided you have set a grace period, the +task will only keep running up until the grace period has elapsed, at which time the task will be terminated. Another scenario where +KubernetesExecutor can work well is when your tasks are not very uniform with respect to resource requirements or images. + +Finally, note that it does not have to be either-or; with CeleryKubernetesExecutor, it is possible to use both CeleryExecutor and +KubernetesExecutor simultaneously on the same cluster. CeleryKubernetesExecutor will look at a task's ``queue`` to determine +whether to run on Celery or Kubernetes. By default, tasks are sent to Celery workers, but if you want a task to run using KubernetesExecutor, +you send it to the ``kubernetes`` queue and it will run in its own pod. And KubernetesPodOperator can be used +to similar effect, no matter what executor you are using. 
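The queue-based routing described above can be sketched in a few lines of Python. This is an illustrative stand-in, not Airflow's actual dispatch code; the only detail taken from the docs is that CeleryKubernetesExecutor inspects a task's ``queue`` and sends tasks on the ``kubernetes`` queue (the default value of the ``[celery_kubernetes_executor] kubernetes_queue`` option) to KubernetesExecutor:

```python
# Illustrative sketch of CeleryKubernetesExecutor's queue-based dispatch.
# Function and variable names are hypothetical; only the "kubernetes" queue
# value mirrors the default [celery_kubernetes_executor] kubernetes_queue.

KUBERNETES_QUEUE = "kubernetes"

def choose_executor(task_queue: str) -> str:
    """Route a task to Celery or Kubernetes based on its ``queue`` attribute."""
    if task_queue == KUBERNETES_QUEUE:
        return "KubernetesExecutor"  # task runs in its own pod
    return "CeleryExecutor"          # default: task goes to a celery worker

print(choose_executor("default"))     # CeleryExecutor
print(choose_executor("kubernetes"))  # KubernetesExecutor
```

In a real DAG, the equivalent would be setting ``queue="kubernetes"`` on the operator so that one task runs in its own pod while the rest of the DAG stays on Celery workers.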
-*************** Fault Tolerance -*************** +--------------- + +.. tip:: + + To troubleshoot issues with KubernetesExecutor, you can use the ``airflow kubernetes generate-dag-yaml`` command. + This command generates the pods as they will be launched in Kubernetes and dumps them into yaml files for you to inspect. + -=========================== Handling Worker Pod Crashes -=========================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~ When dealing with distributed systems, we need a system that assumes that any component can crash at any moment for reasons ranging from OOM errors to node upgrades. @@ -201,13 +238,12 @@ A Kubernetes watcher is a thread that can subscribe to every change that occurs By monitoring this stream, the KubernetesExecutor can discover that the worker crashed and correctly report the task as failed. -===================================================== But What About Cases Where the Scheduler Pod Crashes? -===================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In cases of scheduler crashes, we can completely rebuild the state of the scheduler using the watcher's ``resourceVersion``. +In cases of scheduler crashes, the scheduler will recover its state using the watcher's ``resourceVersion``. -When monitoring the Kubernetes cluster's watcher thread, each event has a monotonically rising number called a resourceVersion. -Every time the executor reads a resourceVersion, the executor stores the latest value in the backend database. +When monitoring the Kubernetes cluster's watcher thread, each event has a monotonically rising number called a ``resourceVersion``. +Every time the executor reads a ``resourceVersion``, the executor stores the latest value in the backend database. Because the resourceVersion is stored, the scheduler can restart and continue reading the watcher stream from where it left off. 
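The checkpoint-and-resume mechanism described above can be sketched as follows. This is a minimal illustration, assuming a plain dictionary in place of the metadata database; none of these class or method names are Airflow's actual internals:

```python
# Illustrative sketch (not Airflow's real code) of how persisting the
# watcher's resourceVersion lets a restarted scheduler resume the watch
# stream from where it left off instead of rebuilding state from scratch.

class WatcherCheckpoint:
    def __init__(self, db: dict):
        self._db = db  # stand-in for the Airflow metadata database

    def record(self, resource_version: int) -> None:
        # Store the latest value read from the Kubernetes watch stream.
        self._db["resource_version"] = resource_version

    def resume_from(self) -> int:
        # After a scheduler crash, restart the watch from the stored value.
        return self._db.get("resource_version", 0)

db = {}
ckpt = WatcherCheckpoint(db)
for event_rv in (101, 102, 103):  # monotonically rising resourceVersions
    ckpt.record(event_rv)

# Simulate a scheduler restart: a fresh checkpoint object backed by the
# same "database" picks up the watch at the last recorded version.
restarted = WatcherCheckpoint(db)
print(restarted.resume_from())  # 103
```

Because the tasks themselves report results directly to the database, this resume point only affects the watcher's view of pod events, not task outcomes.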
Since the tasks are run independently of the executor and report results directly to the database, scheduler failures will not lead to task failures or re-runs. diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt index 39b88c7..c0efd04 100644 --- a/docs/spelling_wordlist.txt +++ b/docs/spelling_wordlist.txt @@ -505,6 +505,7 @@ bq bugfix bugfixes buildType +burstable bytestring cacert callables