jedcunningham commented on code in PR #62544:
URL: https://github.com/apache/airflow/pull/62544#discussion_r2874684410
##########
chart/docs/manage-logs.rst:
##########
@@ -58,25 +63,28 @@ for details.
.. code-block:: bash
- helm upgrade --install airflow apache-airflow/airflow \
- --set logs.persistence.enabled=true
- # you can also override the other persistence
- # by setting the logs.persistence.* values
- # Please refer to values.yaml for details
+ helm upgrade --install airflow apache-airflow/airflow \
+ --set logs.persistence.enabled=true
+ # You can also override the other persistence
+ # by setting the logs.persistence.* values
+ # Please refer to values.yaml for details
Externally provisioned PVC
--------------------------
-In this approach, Airflow will log to an existing ``ReadWriteMany`` PVC. You
-pass in the name of the volume claim to the chart.
+In this approach, Airflow will log to an existing ``ReadWriteMany`` PVC. You
+pass in the name of the volume claim to the chart
+by using the ``logs.persistence.existingClaim`` parameter:
.. code-block:: bash
- helm upgrade --install airflow apache-airflow/airflow \
- --set logs.persistence.enabled=true \
- --set logs.persistence.existingClaim=my-volume-claim
+ helm upgrade --install airflow apache-airflow/airflow \
+ --set logs.persistence.enabled=true \
+ --set logs.persistence.existingClaim=my-volume-claim
+
+.. note::
-Note that the volume will need to be writable by the Airflow user. The easiest
-way is to ensure GID ``0`` has write permission.
-More information can be found in the :ref:`Docker image entrypoint
-documentation <docker-stack:arbitrary-docker-user>`.
+ The volume have to be writable by the Airflow user. The easiest way is to
+ ensure GID ``0`` has a write permission.
Review Comment:
```suggestion
The volume has to be writable by the Airflow user. The easiest way is to
ensure GID ``0`` has write permission.
```
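An aside on the GID ``0`` requirement discussed in this suggestion: the same fix can also be expressed from the chart side. A rough sketch of a values fragment, assuming the chart's pod-level ``securityContext`` value; the claim name comes from the hunk above, the ``fsGroup`` setting is illustrative and not part of this PR:

```yaml
# Illustrative values.yaml fragment (not from this PR): mount an
# externally provisioned ReadWriteMany claim for logs, and run pods
# with fsGroup 0 so files on mounted volumes are group-writable by GID 0.
logs:
  persistence:
    enabled: true
    existingClaim: my-volume-claim
securityContext:
  fsGroup: 0
```

Alternatively, permissions can be fixed on the underlying storage itself, e.g. ``chgrp 0`` and ``chmod g+w`` on the exported directory.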
##########
chart/docs/production-guide.rst:
##########
@@ -51,243 +52,260 @@ Values file
This is the simpler options, as the chart will create a Kubernetes Secret for
you. However, keep in mind your credentials will be in your values file.
.. code-block:: yaml
+ :caption: values.yaml
- data:
- metadataConnection:
- user: <username>
- pass: <password>
- protocol: postgresql
- host: <hostname>
- port: 5432
- db: <database name>
+ data:
+ metadataConnection:
+ user: <username>
+ pass: <password>
+ protocol: postgresql
+ host: <hostname>
+ port: 5432
+ db: <database name>
+.. warning::
+
+ Due to security concerns, it is not advised to store Airflow database user
+ credentials directly in the ``values.yaml`` file.
Kubernetes Secret
^^^^^^^^^^^^^^^^^
-You can also store the credentials in a Kubernetes Secret you create. Note that
-special characters in the username/password must be URL encoded.
+You can store the credentials in a Kubernetes Secret (it requires manual
+creation).
+
+.. note::
+
+ Any special character in the username/password must be URL encoded.
.. code-block:: bash
- kubectl create secret generic mydatabase --from-literal=connection=postgresql://user:pass@host:5432/db
+ kubectl create secret generic mydatabase --from-literal=connection=postgresql://user:pass@host:5432/db
-Finally, configure the chart to use the secret you created:
+After secret creation, configure the chart to use the secret:
.. code-block:: yaml
+ :caption: values.yaml
- data:
- metadataSecretName: mydatabase
+ data:
+ metadataSecretName: mydatabase
.. _production-guide:pgbouncer:
-Metadata DB cleanup
+Metadata DB Cleanup
^^^^^^^^^^^^^^^^^^^
-It is recommended to periodically clean up the Airflow metadata database to
-remove old records and keep the database size manageable. A Kubernetes CronJob
-can be enabled for this purpose:
+It is recommended to periodically clean up the Airflow metadata database to
+remove old records and keep the database size manageable.
+A Kubernetes CronJob can be enabled for this purpose:
.. code-block:: yaml
+ :caption: values.yaml
- databaseCleanup:
- enabled: true
- retentionDays: 90
+ databaseCleanup:
+ enabled: true
+ retentionDays: 90
-Several additional options can be configured and passed to the ``airflow db
-clean`` command:
-
-+------------------------------------------+------------------------------+
-| Helm chart value                         | ``airflow db clean`` option  |
-+==========================================+==============================+
-| ``.Values.databaseCleanup.skipArchive``  | ``--skip-archive``           |
-+------------------------------------------+------------------------------+
-| ``.Values.databaseCleanup.tables``       | ``--tables``                 |
-+------------------------------------------+------------------------------+
-| ``.Values.databaseCleanup.batchSize``    | ``--batch-size``             |
-+------------------------------------------+------------------------------+
-| ``.Values.databaseCleanup.verbose``      | ``--verbose``                |
-+------------------------------------------+------------------------------+
-
-See :ref:`db clean usage <cli-db-clean>` for more details.
+For details regarding the ``airflow db clean`` command, see :ref:`db clean
+usage <cli-db-clean>` and for additional options which
+can be configured via helm chart values, see :doc:`parameters reference
+<parameters-ref>`.
PgBouncer
---------
If you are using PostgreSQL as your database, you will likely want to enable
`PgBouncer <https://www.pgbouncer.org/>`_ as well.
-Airflow can open a lot of database connections due to its distributed nature
-and using a connection pooler can significantly
+Due to distributed nature of Airflow, it can open a lot of database
+connections. Using a connection pooler can significantly
reduce the number of open connections on the database.
Database credentials stored Values file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
+ :caption: values.yaml
- pgbouncer:
- enabled: true
+ pgbouncer:
+ enabled: true
Database credentials stored Kubernetes Secret
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The default connection string in this case will not work you need to modify
-accordingly
+The default connection string in this case will not work. You need to modify
+accordingly the Kubernetes secret:
.. code-block:: bash
- kubectl create secret generic mydatabase --from-literal=connection=postgresql://user:pass@pgbouncer_svc_name.deployment_namespace:6543/airflow-metadata
+ kubectl create secret generic mydatabase --from-literal=connection=postgresql://user:pass@pgbouncer_svc_name.deployment_namespace:6543/airflow-metadata
-Two additional Kubernetes Secret required to PgBouncer able to properly work
-in this configuration:
+Furthermore, two additional Kubernetes Secret are required for PgBouncer to be
+able to properly work:
-``airflow-pgbouncer-stats``
+1. ``airflow-pgbouncer-stats`` secret:
-.. code-block:: bash
+ .. code-block:: bash
- kubectl create secret generic airflow-pgbouncer-stats --from-literal=connection=postgresql://user:[email protected]:6543/pgbouncer?sslmode=disable
+ kubectl create secret generic airflow-pgbouncer-stats --from-literal=connection=postgresql://user:[email protected]:6543/pgbouncer?sslmode=disable
-``airflow-pgbouncer-config``
+2. ``airflow-pgbouncer-config`` secret:
-.. code-block:: yaml
+ .. code-block:: yaml
+ :caption: airflow-pgbouncer-config
- apiVersion: v1
- kind: Secret
- metadata:
- name: airflow-pgbouncer-config
- data:
- pgbouncer.ini: dmFsdWUtMg0KDQo=
- users.txt: dmFsdWUtMg0KDQo=
+ apiVersion: v1
+ kind: Secret
+ metadata:
+ name: airflow-pgbouncer-config
+ data:
+ pgbouncer.ini: dmFsdWUtMg0KDQo=
+ users.txt: dmFsdWUtMg0KDQo=
+ where:
-``pgbouncer.ini`` equal to the base64 encoded version of this text
+ 1. ``pgbouncer.ini`` value is equal to the base64 encoded version of below
+ text:
-.. code-block:: text
+ .. code-block:: text
+ :caption: pgbouncer.ini
- [databases]
- airflow-metadata = host={external_database_host} dbname={external_database_dbname} port=5432 pool_size=10
+ [databases]
+ airflow-metadata = host={external_database_host} dbname={external_database_dbname} port=5432 pool_size=10
- [pgbouncer]
- pool_mode = transaction
- listen_port = 6543
- listen_addr = *
- auth_type = scram-sha-256
- auth_file = /etc/pgbouncer/users.txt
- stats_users = postgres
- ignore_startup_parameters = extra_float_digits
- max_client_conn = 100
- verbose = 0
- log_disconnections = 0
- log_connections = 0
+ [pgbouncer]
+ pool_mode = transaction
+ listen_port = 6543
+ listen_addr = *
+ auth_type = scram-sha-256
+ auth_file = /etc/pgbouncer/users.txt
+ stats_users = postgres
+ ignore_startup_parameters = extra_float_digits
+ max_client_conn = 100
+ verbose = 0
+ log_disconnections = 0
+ log_connections = 0
- server_tls_sslmode = prefer
- server_tls_ciphers = normal
+ server_tls_sslmode = prefer
+ server_tls_ciphers = normal
-``users.txt`` equal to the base64 encoded version of this text
+ 2. ``users.txt`` value is equal to the base64 encoded version of below text:
-.. code-block:: text
+ .. code-block:: text
+ :caption: users.txt
- "{ external_database_username }" "{ external_database_pass }"
+ "{ external_database_username }" "{ external_database_pass }"
-The ``values.yaml`` should looks like this
+In the ``values.yaml`` below secret-related parameters should be adjusted like:
.. code-block:: yaml
+ :caption: values.yaml
- pgbouncer:
- enabled: true
- configSecretName: airflow-pgbouncer-config
- metricsExporterSidecar:
- statsSecretName: airflow-pgbouncer-stats
+ pgbouncer:
+ enabled: true
+ configSecretName: airflow-pgbouncer-config
+ metricsExporterSidecar:
+ statsSecretName: airflow-pgbouncer-stats
+.. note::
-Depending on the size of your Airflow instance, you may want to adjust the
-following as well (defaults are shown):
+ Depending on the size of your Airflow instance, you may want to adjust the
+ following as well (defaults are shown):
-.. code-block:: yaml
+ .. code-block:: yaml
+ :caption: values.yaml
- pgbouncer:
- # The maximum number of connections to PgBouncer
- maxClientConn: 100
- # The maximum number of server connections to the metadata database from PgBouncer
- metadataPoolSize: 10
- # The maximum number of server connections to the result backend database from PgBouncer
- resultBackendPoolSize: 5
+ pgbouncer:
+ # The maximum number of connections to PgBouncer
+ maxClientConn: 100
+ # The maximum number of server connections to the metadata database from PgBouncer
+ metadataPoolSize: 10
+ # The maximum number of server connections to the result backend database from PgBouncer
+ resultBackendPoolSize: 5
API Secret Key
----------------
+--------------
-You should set a static API secret key when deploying with this chart as it will help ensure
-your Airflow components only restart when necessary.
+You should set a static API secret key when deploying with Airflow chart as it will help ensure
+your Airflow components only restarts when necessary.
.. note::
- This section also applies to the webserver for Airflow <3 -- simply replace
- "API" with "webserver."
+
+ This section also applies to the webserver for Airflow (simply replace
+ ``api`` with ``webserver``).
Review Comment:
```suggestion
This section also applies to the webserver for Airflow 2 (simply replace
``api`` with ``webserver``).
```
Let's leave this, but 2 sounds better than <3.
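A side note on the base64-encoded Secret values shown in this hunk: each ``data`` entry must be the base64 of the raw file contents. A minimal sketch of producing such a value (the file contents here are placeholders, not the real PgBouncer config):

```shell
# Write a throwaway users.txt with placeholder credentials.
printf '"myuser" "mypass"\n' > users.txt

# -w0 disables line wrapping so the output is a single-line value
# suitable for the Secret's data: map (GNU coreutils; BSD/macOS
# base64 does not wrap short output anyway).
base64 -w0 users.txt

# Round-trip check on a known string:
printf 'user' | base64    # dXNlcg==
```

The same command applies to ``pgbouncer.ini``; decoding with ``base64 -d`` should reproduce the original file byte for byte.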
##########
chart/docs/production-guide.rst:
##########
API Secret Key
----------------
+--------------
-You should set a static API secret key when deploying with this chart as it will help ensure
-your Airflow components only restart when necessary.
+You should set a static API secret key when deploying with Airflow chart as it will help ensure
+your Airflow components only restarts when necessary.
Review Comment:
```suggestion
your Airflow components only restart when necessary.
```
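One more aside, on the "special characters in the username/password must be URL encoded" note in the production-guide hunk: the encoding can be produced with Python's standard library from the shell. A small sketch with made-up credentials (the password and host below are illustrative only):

```shell
# Percent-encode a password before embedding it in the connection URL.
# safe="" forces @, : and / to be encoded; left alone they would be
# parsed as URL structure.
password='p@ss:w/rd'
encoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$password")
echo "$encoded"    # p%40ss%3Aw%2Frd
echo "postgresql://airflow:${encoded}@host:5432/db"
```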
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]