Lee-W commented on code in PR #64100:
URL: https://github.com/apache/airflow/pull/64100#discussion_r2979431833


##########
airflow-core/docs/faq.rst:
##########
@@ -237,6 +237,166 @@ There are several reasons why Dags might disappear from the UI. Common causes in
 * **Time synchronization issues** - Ensure all nodes (database, schedulers, workers) use NTP with <1s clock drift.
 
 
+.. _faq:dag-version-inflation:
+
+Why does my Dag version keep increasing?
+-----------------------------------------
+
+Every time the Dag processor parses a Dag file, it serializes the Dag and compares the result with the
+version stored in the metadata database. If anything has changed, Airflow creates a new Dag version. This
+mechanism ensures that Dag runs use consistent code throughout their execution, even if the Dag file is
+updated mid-run.
+
+**Dag version inflation** occurs when the version number increases indefinitely without the Dag author
+making any intentional changes.
+
+What goes wrong
+"""""""""""""""
+
+When Dag versions increase without meaningful changes:
+
+* The metadata database accumulates unnecessary Dag version records, increasing storage and query overhead.
+* The UI shows a misleading history of Dag changes, making it harder to identify real modifications.
+* The scheduler and API server may consume more memory as they load and cache a growing number of Dag versions.
+
+Common causes
+"""""""""""""
+
+Version inflation is caused by using values that change at **parse time** — that is, every time the Dag
+processor evaluates the Dag file — as arguments to Dag or Task constructors. The most common patterns are:
+
+**1. Using ``datetime.now()`` or ``pendulum.now()`` as ``start_date``:**
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+
+    # BAD: datetime.now() produces a different value on every parse
+    with DAG(
+        dag_id="bad_example",
+        start_date=datetime.now(),
+        schedule="@daily",
+    ):
+        ...

Review Comment:
   ```suggestion
       with DAG(
           dag_id="bad_example",
           # BAD: datetime.now() produces a different value on every parse
           start_date=datetime.now(),
           schedule="@daily",
       ):
           ...
   ```
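To see why this pattern forces a new version on every parse, here is a standalone toy sketch (this is a stand-in, not Airflow's actual serializer) that hashes a fake serialized Dag built with ``datetime.now()``:

```python
import hashlib
import json
import time
from datetime import datetime


def fake_serialize() -> str:
    # Stand-in for Airflow's Dag serialization: the parse-time value of
    # datetime.now() ends up embedded in the serialized representation.
    return json.dumps(
        {"dag_id": "bad_example", "start_date": datetime.now().isoformat()}
    )


first_parse = hashlib.sha256(fake_serialize().encode()).hexdigest()
time.sleep(0.01)  # the Dag processor parses the file again a bit later
second_parse = hashlib.sha256(fake_serialize().encode()).hexdigest()

# The serialized forms differ on every parse, so a new version is recorded.
print(first_parse != second_parse)
```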



##########
airflow-core/docs/faq.rst:
##########
@@ -237,6 +237,166 @@ There are several reasons why Dags might disappear from the UI. Common causes in
 * **Time synchronization issues** - Ensure all nodes (database, schedulers, workers) use NTP with <1s clock drift.
 
 
+.. _faq:dag-version-inflation:
+
+Why does my Dag version keep increasing?
+-----------------------------------------
+
+Every time the Dag processor parses a Dag file, it serializes the Dag and compares the result with the
+version stored in the metadata database. If anything has changed, Airflow creates a new Dag version. This
+mechanism ensures that Dag runs use consistent code throughout their execution, even if the Dag file is
+updated mid-run.
+
+**Dag version inflation** occurs when the version number increases indefinitely without the Dag author
+making any intentional changes.
+
+What goes wrong
+"""""""""""""""
+
+When Dag versions increase without meaningful changes:
+
+* The metadata database accumulates unnecessary Dag version records, increasing storage and query overhead.
+* The UI shows a misleading history of Dag changes, making it harder to identify real modifications.
+* The scheduler and API server may consume more memory as they load and cache a growing number of Dag versions.
+
+Common causes
+"""""""""""""
+
+Version inflation is caused by using values that change at **parse time** — that is, every time the Dag
+processor evaluates the Dag file — as arguments to Dag or Task constructors. The most common patterns are:
+
+**1. Using ``datetime.now()`` or ``pendulum.now()`` as ``start_date``:**
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+
+    # BAD: datetime.now() produces a different value on every parse
+    with DAG(
+        dag_id="bad_example",
+        start_date=datetime.now(),
+        schedule="@daily",
+    ):
+        ...
+
+Every parse produces a different ``start_date``, so the serialized Dag is always different from the
+stored version.
+
+**2. Using random values in Dag or Task arguments:**
+
+.. code-block:: python
+
+    import random
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.python import PythonOperator
+
+    with DAG(dag_id="bad_random", start_date="2024-01-01", schedule="@daily") as dag:
+        # BAD: random value changes every parse
+        PythonOperator(
+            task_id=f"task_{random.randint(1, 1000)}",
+            python_callable=lambda: None,
+        )

Review Comment:
   ```suggestion
       with DAG(dag_id="bad_random", start_date="2024-01-01", schedule="@daily") as dag:
           PythonOperator(
               # BAD: random value changes every parse
               task_id=f"task_{random.randint(1, 1000)}",
               python_callable=lambda: None,
           )
   ```
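A possible deterministic alternative to the random ``task_id`` above is to derive ids from stable, in-file configuration, so every parse produces exactly the same ids. A sketch (``TABLES`` is illustrative, not from the PR under review):

```python
# Stable configuration defined in the Dag file itself.
TABLES = ["orders", "customers", "invoices"]


def build_task_ids() -> list[str]:
    # Deterministic: the same ids on every parse of the file.
    return [f"load_{table}" for table in TABLES]


# Two "parses" of this file agree, so the serialized Dag is stable.
assert build_task_ids() == build_task_ids()
print(build_task_ids())
```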



##########
airflow-core/docs/faq.rst:
##########
@@ -237,6 +237,166 @@ There are several reasons why Dags might disappear from the UI. Common causes in
 * **Time synchronization issues** - Ensure all nodes (database, schedulers, workers) use NTP with <1s clock drift.
 
 
+.. _faq:dag-version-inflation:
+
+Why does my Dag version keep increasing?
+-----------------------------------------
+
+Every time the Dag processor parses a Dag file, it serializes the Dag and compares the result with the
+version stored in the metadata database. If anything has changed, Airflow creates a new Dag version. This
+mechanism ensures that Dag runs use consistent code throughout their execution, even if the Dag file is
+updated mid-run.
+
+**Dag version inflation** occurs when the version number increases indefinitely without the Dag author
+making any intentional changes.
+
+What goes wrong
+"""""""""""""""
+
+When Dag versions increase without meaningful changes:
+
+* The metadata database accumulates unnecessary Dag version records, increasing storage and query overhead.
+* The UI shows a misleading history of Dag changes, making it harder to identify real modifications.
+* The scheduler and API server may consume more memory as they load and cache a growing number of Dag versions.
+
+Common causes
+"""""""""""""
+
+Version inflation is caused by using values that change at **parse time** — that is, every time the Dag
+processor evaluates the Dag file — as arguments to Dag or Task constructors. The most common patterns are:
+
+**1. Using ``datetime.now()`` or ``pendulum.now()`` as ``start_date``:**
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+
+    # BAD: datetime.now() produces a different value on every parse
+    with DAG(
+        dag_id="bad_example",
+        start_date=datetime.now(),
+        schedule="@daily",
+    ):
+        ...
+
+Every parse produces a different ``start_date``, so the serialized Dag is always different from the
+stored version.
+
+**2. Using random values in Dag or Task arguments:**
+
+.. code-block:: python
+
+    import random
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.python import PythonOperator
+
+    with DAG(dag_id="bad_random", start_date="2024-01-01", schedule="@daily") as dag:
+        # BAD: random value changes every parse
+        PythonOperator(
+            task_id=f"task_{random.randint(1, 1000)}",
+            python_callable=lambda: None,
+        )
+
+**3. Assigning runtime-varying values to variables used in constructors:**
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.python import PythonOperator
+
+    # BAD: the variable captures a parse-time value, then is passed to the DAG
+    default_args = {"start_date": datetime.now()}
+
+    with DAG(dag_id="bad_defaults", default_args=default_args, schedule="@daily") as dag:
+        PythonOperator(task_id="my_task", python_callable=lambda: None)
+
+Even though ``datetime.now()`` is not called directly inside the Dag constructor, it flows in through
+``default_args`` and still causes a different serialized Dag on every parse.
+
+**4. Using environment variables or file contents that change between parses:**
+
+.. code-block:: python
+
+    import os
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.bash import BashOperator
+
+    with DAG(dag_id="bad_env", start_date="2024-01-01", schedule="@daily") as dag:
+        # BAD if BUILD_NUMBER changes on every deployment or parse
+        BashOperator(
+            task_id="echo_build",
+            bash_command=f"echo {os.environ.get('BUILD_NUMBER', 'unknown')}",
+        )
+
+How to avoid version inflation
+""""""""""""""""""""""""""""""
+
+* **Use fixed ``start_date`` values.** Always set ``start_date`` to a static ``datetime`` literal:
+
+  .. code-block:: python
+
+      import datetime
+
+      from airflow.sdk import DAG
+
+      with DAG(
+          dag_id="good_example",
+          start_date=datetime.datetime(2024, 1, 1),
+          schedule="@daily",
+      ):
+          ...
+
+* **Keep all Dag and Task constructor arguments deterministic.** Arguments passed to Dag and Operator
+  constructors must produce the same value on every parse. Move any dynamic computation into the
+  ``execute()`` method or use Jinja templates, which are evaluated at task execution time rather than
+  parse time.
+
+* **Use Jinja templates for dynamic values:**
+
+  .. code-block:: python
+
+      from airflow.providers.standard.operators.bash import BashOperator
+
+      # GOOD: the template is resolved at execution time, not parse time
+      BashOperator(
+          task_id="echo_date",
+          bash_command="echo {{ ds }}",
+      )
+
+* **Use Airflow Variables with templates instead of top-level lookups:**
+
+  .. code-block:: python
+
+      from airflow.providers.standard.operators.bash import BashOperator
+
+      # GOOD: Variable is resolved at execution time via template
+      BashOperator(
+          task_id="echo_var",
+          bash_command="echo {{ var.value.my_variable }}",
+      )
+
+Dag version inflation detection
+""""""""""""""""""""""""""""""""
+
+Starting from Airflow 3.2, the Dag processor performs **AST-based static analysis** on every Dag file
+before parsing to detect runtime-varying values in Dag and Task constructors. When a potential issue is
+found, it is surfaced as a **Dag warning** visible in the UI.
+
+You can control this behavior with the
+:ref:`dag_version_inflation_check_level <config:dag_processor__dag_version_inflation_check_level>`
+configuration option:
+
+* ``off`` — Disables the check entirely. No errors or warnings are generated.
+* ``warning`` (default) — Dags load normally but warnings are displayed in the UI when issues are detected.
+* ``error`` — Treats detected issues as Dag import errors, preventing the Dag from loading.

Review Comment:
   Probably could mention the ruff rule https://docs.astral.sh/ruff/rules/airflow3-dag-dynamic-value/ here?
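The AST-based detection the hunk describes can be illustrated with a toy checker. This is a simplified sketch of the idea only, not Airflow's actual checker or the ruff rule's implementation: it flags any ``.now()`` call appearing inside the arguments of a ``DAG(...)`` call.

```python
import ast

# Example Dag source exhibiting the parse-time-varying pattern.
DAG_FILE = '''
from datetime import datetime
from airflow.sdk import DAG

with DAG(dag_id="bad_example", start_date=datetime.now(), schedule="@daily"):
    pass
'''


def find_dynamic_dag_args(source: str) -> list[str]:
    findings = []
    for node in ast.walk(ast.parse(source)):
        is_dag_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "DAG"
        )
        if not is_dag_call:
            continue
        # Walk the DAG(...) call's subtree looking for `.now()` calls.
        for inner in ast.walk(node):
            if (
                isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Attribute)
                and inner.func.attr == "now"
            ):
                findings.append(f"line {inner.lineno}: `.now()` call in DAG arguments")
    return findings


print(find_dynamic_dag_args(DAG_FILE))
```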



##########
airflow-core/docs/faq.rst:
##########
@@ -237,6 +237,166 @@ There are several reasons why Dags might disappear from the UI. Common causes in
 * **Time synchronization issues** - Ensure all nodes (database, schedulers, workers) use NTP with <1s clock drift.
 
 
+.. _faq:dag-version-inflation:
+
+Why does my Dag version keep increasing?
+-----------------------------------------
+
+Every time the Dag processor parses a Dag file, it serializes the Dag and compares the result with the
+version stored in the metadata database. If anything has changed, Airflow creates a new Dag version. This
+mechanism ensures that Dag runs use consistent code throughout their execution, even if the Dag file is
+updated mid-run.
+
+**Dag version inflation** occurs when the version number increases indefinitely without the Dag author
+making any intentional changes.
+
+What goes wrong
+"""""""""""""""
+
+When Dag versions increase without meaningful changes:
+
+* The metadata database accumulates unnecessary Dag version records, increasing storage and query overhead.
+* The UI shows a misleading history of Dag changes, making it harder to identify real modifications.
+* The scheduler and API server may consume more memory as they load and cache a growing number of Dag versions.
+
+Common causes
+"""""""""""""
+
+Version inflation is caused by using values that change at **parse time** — that is, every time the Dag
+processor evaluates the Dag file — as arguments to Dag or Task constructors. The most common patterns are:
+
+**1. Using ``datetime.now()`` or ``pendulum.now()`` as ``start_date``:**
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+
+    # BAD: datetime.now() produces a different value on every parse
+    with DAG(
+        dag_id="bad_example",
+        start_date=datetime.now(),
+        schedule="@daily",
+    ):
+        ...
+
+Every parse produces a different ``start_date``, so the serialized Dag is always different from the
+stored version.
+
+**2. Using random values in Dag or Task arguments:**
+
+.. code-block:: python
+
+    import random
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.python import PythonOperator
+
+    with DAG(dag_id="bad_random", start_date="2024-01-01", schedule="@daily") as dag:
+        # BAD: random value changes every parse
+        PythonOperator(
+            task_id=f"task_{random.randint(1, 1000)}",
+            python_callable=lambda: None,
+        )
+
+**3. Assigning runtime-varying values to variables used in constructors:**
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.python import PythonOperator
+
+    # BAD: the variable captures a parse-time value, then is passed to the DAG
+    default_args = {"start_date": datetime.now()}
+
+    with DAG(dag_id="bad_defaults", default_args=default_args, schedule="@daily") as dag:
+        PythonOperator(task_id="my_task", python_callable=lambda: None)
+
+Even though ``datetime.now()`` is not called directly inside the Dag constructor, it flows in through
+``default_args`` and still causes a different serialized Dag on every parse.
+
+**4. Using environment variables or file contents that change between parses:**
+
+.. code-block:: python
+
+    import os
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.bash import BashOperator
+
+    with DAG(dag_id="bad_env", start_date="2024-01-01", schedule="@daily") as dag:
+        # BAD if BUILD_NUMBER changes on every deployment or parse
+        BashOperator(
+            task_id="echo_build",
+            bash_command=f"echo {os.environ.get('BUILD_NUMBER', 'unknown')}",
+        )

Review Comment:
   ```suggestion
           BashOperator(
               task_id="echo_build",
               # BAD if BUILD_NUMBER changes on every deployment or parse
               bash_command=f"echo {os.environ.get('BUILD_NUMBER', 'unknown')}",
   ```
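The problem with the f-string in the hunk above is that it is evaluated immediately, baking the environment value into the command at parse time. A minimal standalone demonstration (simulating two consecutive parses; the variable values are illustrative):

```python
import os

# First "parse": the f-string captures the env value at evaluation time.
os.environ["BUILD_NUMBER"] = "101"
first_parse_cmd = f"echo {os.environ.get('BUILD_NUMBER', 'unknown')}"

# A new deployment changes the env var before the next parse.
os.environ["BUILD_NUMBER"] = "102"
second_parse_cmd = f"echo {os.environ.get('BUILD_NUMBER', 'unknown')}"

# The serialized task differs between parses, creating a new Dag version.
print(first_parse_cmd, "->", second_parse_cmd)
```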



##########
airflow-core/docs/faq.rst:
##########
@@ -237,6 +237,166 @@ There are several reasons why Dags might disappear from the UI. Common causes in
 * **Time synchronization issues** - Ensure all nodes (database, schedulers, workers) use NTP with <1s clock drift.
 
 
+.. _faq:dag-version-inflation:
+
+Why does my Dag version keep increasing?
+-----------------------------------------
+
+Every time the Dag processor parses a Dag file, it serializes the Dag and compares the result with the
+version stored in the metadata database. If anything has changed, Airflow creates a new Dag version. This
+mechanism ensures that Dag runs use consistent code throughout their execution, even if the Dag file is
+updated mid-run.
+
+**Dag version inflation** occurs when the version number increases indefinitely without the Dag author
+making any intentional changes.
+
+What goes wrong
+"""""""""""""""
+
+When Dag versions increase without meaningful changes:
+
+* The metadata database accumulates unnecessary Dag version records, increasing storage and query overhead.
+* The UI shows a misleading history of Dag changes, making it harder to identify real modifications.
+* The scheduler and API server may consume more memory as they load and cache a growing number of Dag versions.
+
+Common causes
+"""""""""""""
+
+Version inflation is caused by using values that change at **parse time** — that is, every time the Dag
+processor evaluates the Dag file — as arguments to Dag or Task constructors. The most common patterns are:
+
+**1. Using ``datetime.now()`` or ``pendulum.now()`` as ``start_date``:**
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+
+    # BAD: datetime.now() produces a different value on every parse
+    with DAG(
+        dag_id="bad_example",
+        start_date=datetime.now(),
+        schedule="@daily",
+    ):
+        ...
+
+Every parse produces a different ``start_date``, so the serialized Dag is always different from the
+stored version.
+
+**2. Using random values in Dag or Task arguments:**
+
+.. code-block:: python
+
+    import random
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.python import PythonOperator
+
+    with DAG(dag_id="bad_random", start_date="2024-01-01", schedule="@daily") as dag:
+        # BAD: random value changes every parse
+        PythonOperator(
+            task_id=f"task_{random.randint(1, 1000)}",
+            python_callable=lambda: None,
+        )
+
+**3. Assigning runtime-varying values to variables used in constructors:**
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.python import PythonOperator
+
+    # BAD: the variable captures a parse-time value, then is passed to the DAG
+    default_args = {"start_date": datetime.now()}
+
+    with DAG(dag_id="bad_defaults", default_args=default_args, schedule="@daily") as dag:
+        PythonOperator(task_id="my_task", python_callable=lambda: None)
+
+Even though ``datetime.now()`` is not called directly inside the Dag constructor, it flows in through
+``default_args`` and still causes a different serialized Dag on every parse.
+
+**4. Using environment variables or file contents that change between parses:**
+
+.. code-block:: python
+
+    import os
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.bash import BashOperator
+
+    with DAG(dag_id="bad_env", start_date="2024-01-01", schedule="@daily") as dag:
+        # BAD if BUILD_NUMBER changes on every deployment or parse
+        BashOperator(
+            task_id="echo_build",
+            bash_command=f"echo {os.environ.get('BUILD_NUMBER', 'unknown')}",
+        )
+
+How to avoid version inflation
+""""""""""""""""""""""""""""""
+
+* **Use fixed ``start_date`` values.** Always set ``start_date`` to a static ``datetime`` literal:
+
+  .. code-block:: python
+
+      import datetime
+
+      from airflow.sdk import DAG
+
+      with DAG(
+          dag_id="good_example",
+          start_date=datetime.datetime(2024, 1, 1),
+          schedule="@daily",
+      ):
+          ...
+
+* **Keep all Dag and Task constructor arguments deterministic.** Arguments passed to Dag and Operator
+  constructors must produce the same value on every parse. Move any dynamic computation into the
+  ``execute()`` method or use Jinja templates, which are evaluated at task execution time rather than
+  parse time.
+
+* **Use Jinja templates for dynamic values:**
+
+  .. code-block:: python
+
+      from airflow.providers.standard.operators.bash import BashOperator
+
+      # GOOD: the template is resolved at execution time, not parse time
+      BashOperator(
+          task_id="echo_date",
+          bash_command="echo {{ ds }}",
+      )
+
+* **Use Airflow Variables with templates instead of top-level lookups:**
+
+  .. code-block:: python
+
+      from airflow.providers.standard.operators.bash import BashOperator
+
+      # GOOD: Variable is resolved at execution time via template
+      BashOperator(
+          task_id="echo_var",
+          bash_command="echo {{ var.value.my_variable }}",
+      )

Review Comment:
   ```suggestion
         BashOperator(
             task_id="echo_date",
             # GOOD: the template is resolved at execution time, not parse time
             bash_command="echo {{ ds }}",
         )
   
   * **Use Airflow Variables with templates instead of top-level lookups:**
   
     .. code-block:: python
   
         from airflow.providers.standard.operators.bash import BashOperator
   
   
         BashOperator(
             task_id="echo_var",
             # GOOD: Variable is resolved at execution time via template
             bash_command="echo {{ var.value.my_variable }}",
         )
   ```
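The key property of the templated examples above is that the command string is a constant at parse time and is only filled in when the run's context (e.g. ``ds``) is known. A minimal stand-in renderer makes this concrete (Airflow uses real Jinja2; this toy substitution only illustrates the timing):

```python
import re


def render(command: str, context: dict[str, str]) -> str:
    # Toy stand-in for Jinja rendering: substitute {{ name }} markers
    # using the per-run context, the way templates resolve at execution
    # time rather than at parse time.
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lambda m: context[m.group(1)], command)


# At parse time the command is always the same literal string ...
parse_time_command = "echo {{ ds }}"

# ... and only at execution time is it resolved with the run's context.
print(render(parse_time_command, {"ds": "2024-01-01"}))
```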



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
