This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new 2429d077d8 Trigger gevent monkeypatching via environment variable (#28283)
2429d077d8 is described below

commit 2429d077d8c59299487562c8867cfc63cd969b9d
Author: Jarek Potiuk <[email protected]>
AuthorDate: Wed Dec 21 20:13:58 2022 +0100

    Trigger gevent monkeypatching via environment variable (#28283)
    
    Gevent needs to monkeypatch a number of system libraries as soon
    as possible when the Python interpreter starts, so that other
    libraries do not monkeypatch them first. The patching has to happen
    before any other initialization, and it should only run on the webserver.
    
    So far this was done by monkeypatching in local_settings, but that
    has been rather brittle: some changes in Airflow made previous attempts
    stop working, because the "other" packages could be loaded by
    Airflow earlier, depending on installed providers and configuration
    (for example, when you had AWS configured as logger, boto could have
    been loaded first and could have monkeypatched networking before
    gevent had a chance to do so).
    
    This change introduces a different mechanism for triggering the
    patching: it is triggered by setting an environment variable.
    This has the benefit that we do not need to initialize anything
    (including reading settings or setting up logging) before we determine
    whether gevent patching should be performed.
    
    It also has the drawback that users have to set the environment
    variable in their deployment manually. However, this is a small price to
    pay for stable and future-proof gevent monkeypatching built into
    Airflow.
    
    Fixes: #8212
---
 airflow/__init__.py                          | 9 +++++++++
 airflow/config_templates/config.yml          | 4 +++-
 airflow/config_templates/default_airflow.cfg | 4 +++-
 newsfragments/08212.misc.rst                 | 1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/airflow/__init__.py b/airflow/__init__.py
index 38190dc8b8..1ecd10bb88 100644
--- a/airflow/__init__.py
+++ b/airflow/__init__.py
@@ -32,6 +32,14 @@ import os
 import sys
 from typing import Callable
 
+if os.environ.get("_AIRFLOW_PATCH_GEVENT"):
+    # If you are using gevents and start airflow webserver, you might want to run gevent monkeypatching
+    # as one of the first thing when Airflow is started. This allows gevent to patch networking and other
+    # system libraries to make them gevent-compatible before anything else patches them (for example boto)
+    from gevent.monkey import patch_all
+
+    patch_all()
+
 from airflow import settings
 
 __all__ = ["__version__", "login", "DAG", "PY36", "PY37", "PY38", "PY39", "PY310", "XComArg"]
@@ -41,6 +49,7 @@ __all__ = ["__version__", "login", "DAG", "PY36", "PY37", "PY38", "PY39", "PY310
 # lib.)
 __path__ = __import__("pkgutil").extend_path(__path__, __name__)  # type: ignore
 
+
 # Perform side-effects unless someone has explicitly opted out before import
 # WARNING: DO NOT USE THIS UNLESS YOU REALLY KNOW WHAT YOU'RE DOING.
 if not os.environ.get("_AIRFLOW__AS_LIBRARY", None):
diff --git a/airflow/config_templates/config.yml b/airflow/config_templates/config.yml
index c38ef0c3b4..af5f130135 100644
--- a/airflow/config_templates/config.yml
+++ b/airflow/config_templates/config.yml
@@ -1233,7 +1233,9 @@ webserver:
     worker_class:
       description: |
         The worker class gunicorn should use. Choices include
-        sync (default), eventlet, gevent
+        sync (default), eventlet, gevent. Note when using gevent you might also want to set the
+        "_AIRFLOW_PATCH_GEVENT" environment variable to "1" to make sure gevent patching is done as
+        early as possible.
       version_added: ~
       type: string
       example: ~
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 4bd2883563..8f704c378f 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -640,7 +640,9 @@ secret_key = {SECRET_KEY}
 workers = 4
 
 # The worker class gunicorn should use. Choices include
-# sync (default), eventlet, gevent
+# sync (default), eventlet, gevent. Note when using gevent you might also want to set the
+# "_AIRFLOW_PATCH_GEVENT" environment variable to "1" to make sure gevent patching is done as
+# early as possible.
 worker_class = sync
 
 # Log files for the gunicorn webserver. '-' means log to stderr.
diff --git a/newsfragments/08212.misc.rst b/newsfragments/08212.misc.rst
new file mode 100644
index 0000000000..acce074f10
--- /dev/null
+++ b/newsfragments/08212.misc.rst
@@ -0,0 +1 @@
+If you are using gevent for your webserver deployment and used local settings to monkeypatch gevent, you might want to replace local settings patching with an ``_AIRFLOW_PATCH_GEVENT`` environment variable set to 1 in your webserver. This ensures gevent patching is done as early as possible.
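
A note on the check added to airflow/__init__.py in this commit: it tests only whether `_AIRFLOW_PATCH_GEVENT` is set to a non-empty string, so any non-empty value (including "0" or "false") would enable patching. The snippet below is a minimal sketch of that behaviour and of setting the variable for a single process only. It is not Airflow code; `gevent_patch_requested` and `webserver_env` are hypothetical helper names introduced for illustration, and gevent itself is deliberately not imported.

```python
import os
from typing import Dict, Mapping, Optional

# Variable name taken from the commit; everything else here is illustrative.
PATCH_VAR = "_AIRFLOW_PATCH_GEVENT"


def gevent_patch_requested(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Mirror the truthiness test `if os.environ.get("_AIRFLOW_PATCH_GEVENT"):`.

    Any non-empty string counts as "on"; unset or empty counts as "off".
    """
    env = os.environ if environ is None else environ
    return bool(env.get(PATCH_VAR))


def webserver_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: return a copy of an environment with patching enabled.

    The input mapping is left untouched, so the variable can be limited to the
    webserver process instead of being exported for every Airflow component.
    """
    env = dict(os.environ if base is None else base)
    env[PATCH_VAR] = "1"
    return env


if __name__ == "__main__":
    for value in (None, "", "1", "0"):
        sample = {} if value is None else {PATCH_VAR: value}
        print(repr(value), gevent_patch_requested(sample))
```

Assuming the webserver is started from Python, the copied environment could be passed to something like `subprocess.Popen(["airflow", "webserver"], env=webserver_env())`, which keeps the variable out of scheduler and worker processes, in line with the commit message's point that patching should only run on the webserver.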
