ephraimbuddy commented on code in PR #65310:
URL: https://github.com/apache/airflow/pull/65310#discussion_r3225771606


##########
airflow-core/tests/unit/serialization/test_dag_serialization.py:
##########
@@ -3864,6 +3869,83 @@ def test_dag_schema_defaults_optimization():
     assert dag_non_defaults_data["description"] == "Test description"
 
 
+def test_dag_config_driven_fields_always_serialized():
+    """Fields whose schema default was removed are always present on the wire,
+    regardless of whether the value matches the Airflow config default.
+
+    Regression test for: max_active_runs set to the old schema default (16)
+    was silently dropped when cfg max_active_runs_per_dag was also 16.

Review Comment:
   `nit:` docstring describes only one of the three scenarios in the test.
   
   *"max_active_runs set to the old schema default (16) was silently dropped 
when cfg max_active_runs_per_dag was also 16"* — but the **first** block in the 
test sets `cfg = 1`, not 16. The real invariant is "user value equal to schema 
default is not dropped, regardless of cfg value", which the test correctly 
exercises with both `cfg=1` and `cfg=16`. One extra clause covers both:
   
   ```
   Regression test for: max_active_runs (and the other config-driven fields)
   set to the old schema default value was silently dropped during 
serialisation,
   regardless of the airflow.cfg value, because #55849 excluded any field whose
   value matched the schema default.
   ```
   
   ---
   Drafted-by: Claude Code (Opus 4.7); reviewed by @ephraimbuddy before posting



##########
airflow-core/tests/unit/serialization/test_dag_serialization.py:
##########
@@ -3864,6 +3869,83 @@ def test_dag_schema_defaults_optimization():
     assert dag_non_defaults_data["description"] == "Test description"
 
 
+def test_dag_config_driven_fields_always_serialized():
+    """Fields whose schema default was removed are always present on the wire,
+    regardless of whether the value matches the Airflow config default.
+
+    Regression test for: max_active_runs set to the old schema default (16)
+    was silently dropped when cfg max_active_runs_per_dag was also 16.
+    """
+    from airflow.serialization.serialized_objects import LazyDeserializedDAG

Review Comment:
   `nit:` lift this to the module-level imports.
   
   `LazyDeserializedDAG` is used twice in this test and likely useful to other 
future tests in the file. The function-scoped import is only justifiable when 
there's a circular-import or test-isolation reason — neither applies here (the 
module is already heavily imported at the top).
   
   ---
   Drafted-by: Claude Code (Opus 4.7); reviewed by @ephraimbuddy before posting



##########
airflow-core/tests/unit/dag_processing/test_collection.py:
##########
@@ -1137,6 +1137,16 @@ def test_max_active_runs_explicit_value_is_used(self, 
testing_dag_bundle, sessio
         orm_dag = session.get(DagModel, "dag_max_runs")
         assert orm_dag.max_active_runs == 3
 
+    def test_max_active_runs_equal_to_schema_default_not_overridden_by_conf(
+        self, testing_dag_bundle, session, dag_maker
+    ):
+        with conf_vars({("core", "max_active_runs_per_dag"): "1"}):
+            with dag_maker("dag_max_runs_schema_default", schedule=None, 
max_active_runs=16) as dag:
+                ...
+            update_dag_parsing_results_in_db("testing", None, [dag], {}, 0.1, 
set(), session)
+            orm_dag = session.get(DagModel, "dag_max_runs_schema_default")
+            assert orm_dag.max_active_runs == 16

Review Comment:
   `nit:` parametrise (or add sibling tests) for `max_active_tasks` and 
`max_consecutive_failed_dag_runs` too.
   
   The bug applied identically to all three config-driven int fields (each 
one's schema default — `16`, `16`, `0` — could collide with a user-set value). 
The `_explicit_value_is_used` / `_defaults_from_conf_when_none` tests above 
cover all three; this `_equal_to_schema_default_not_overridden_by_conf` 
regression test only covers `max_active_runs`. Either parametrise across the 
three (`field`, `cfg_key`, `schema_default`) or add the two sibling tests, so 
the read-path coverage of the fix is symmetric.
   
   Without this, a future change that re-introduces the bug for, say, 
`max_consecutive_failed_dag_runs=0` would not be caught by this regression test.
   
   ---
   Drafted-by: Claude Code (Opus 4.7); reviewed by @ephraimbuddy before posting



##########
airflow-core/tests/unit/serialization/test_dag_serialization.py:
##########
@@ -3864,6 +3869,83 @@ def test_dag_schema_defaults_optimization():
     assert dag_non_defaults_data["description"] == "Test description"
 
 
+def test_dag_config_driven_fields_always_serialized():

Review Comment:
   `improvement:` consider parametrising — three scenarios in one body is hard 
to extend.
   
   The function has three near-identical blocks: cfg≠user, cfg==user, and the 
catchup override case. Each one builds a DAG, serialises, asserts on the wire, 
then re-asserts via `LazyDeserializedDAG`. A `@pytest.mark.parametrize` over 
`(cfg_overrides, dag_kwargs, expected)` would:
   - make the matrix explicit (right now `cfg=1/user=16` vs `cfg=16/user=16` 
reads as duplication unless you trace it carefully),
   - give each scenario its own test id in pytest output, and
   - make it trivial to add the missing combos (e.g. 
`disable_bundle_versioning=True` with cfg=False — currently not covered at all).
   
   Not blocking, but the test will keep growing as people add fields to the 
"always emit" set, and the current shape will scale poorly.
   
   ---
   Drafted-by: Claude Code (Opus 4.7); reviewed by @ephraimbuddy before posting



##########
airflow-core/tests/unit/serialization/test_dag_serialization.py:
##########
@@ -333,6 +333,11 @@ def _operator_defaults(overrides):
             },
         ],
         "params": [],
+        "catchup": False,
+        "disable_bundle_versioning": False,
+        "max_active_runs": 16,
+        "max_active_tasks": 16,
+        "max_consecutive_failed_dag_runs": 0,

Review Comment:
   `note:` worth a one-line comment here explaining why these five fields are 
unconditionally present.
   
   After this PR these are no longer "schema defaults" but are always emitted 
on the wire because their real defaults come from `airflow.cfg` at parse time. 
A future reader cleaning up the fixture might be tempted to remove them again 
as "redundant defaults"; a comment pointing at the schema.json change (or this 
PR) prevents that regression.
   
   ---
   Drafted-by: Claude Code (Opus 4.7); reviewed by @ephraimbuddy before posting



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to