alvinttang opened a new pull request, #65834:
URL: https://github.com/apache/airflow/pull/65834
## Summary

Fixes #65696.

`DBDagBag._dags` is an unbounded, process-lived dict keyed by `dag_version_id`. `SerializedDagModel.write_dag` (introduced in #45524) takes a fast path that performs an in-place `UPDATE serialized_dag SET data=…, dag_hash=…` under the same `dag_version_id` whenever the existing version has no associated task instances. After such an update, the cached UUID still resolves to the *old* `SerializedDAG`. As a result, the scheduler keeps marking newly added tasks as `removed` and keeps scheduling deleted tasks until the process is restarted.

## Fix

`airflow-core/src/airflow/models/dagbag.py`: cache `(SerializedDAG, dag_hash)` tuples instead of bare DAGs. On every cache lookup, run a cheap `SELECT dag_hash FROM serialized_dag WHERE dag_version_id = ?` and compare the result with the cached hash:

- Hash match → return the cached DAG.
- Mismatch → pop the entry and fall through to a fresh load.

The double-checked locking branch that runs after the DB load is fixed the same way. ~35 LOC of production change.

## Test

`airflow-core/tests/unit/models/test_dagbag.py::TestDBDagBag::test_get_dag_invalidates_cache_when_dag_hash_changes_in_place`: RED before the patch, GREEN after. Three pre-existing tests were updated for the new tuple cache shape.

- `pytest tests/unit/models/test_dagbag.py` → 22/22 pass.
- `ruff check` is clean on both files.

## Risk notes

- One extra single-column `SELECT dag_hash` per cache hit, on a unique-indexed column. This is cheaper than the deserialization it preserves on hits and cheaper than the existing full-row load it short-circuits on misses.
- The tuple cache value is an internal change. The three tests that introspected `_dags` were updated; other call sites use the full Mapping API.
- The triggerer uses `get_serialized_dag_model()`, a separate path that is untouched. The API server uses `cache_size` / `cache_ttl` and now also benefits from the staleness checks.

Refs #65696
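For reviewers less familiar with the failure mode, here is a minimal, framework-free sketch of the hash-validated lookup described under Fix. It is not the PR's code and does not use Airflow APIs: `HashValidatedCache`, `fetch_hash` and `load` are hypothetical stand-ins, with `fetch_hash(key)` playing the role of the single-column `SELECT dag_hash ... WHERE dag_version_id = ?` and `load(key)` playing the role of the full row load plus deserialization. The demo at the bottom simulates the in-place `UPDATE` under an unchanged key, which a plain dict cache would miss.

```python
from __future__ import annotations

import threading
from typing import Callable, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class HashValidatedCache(Generic[K, V]):
    """Hypothetical cache that stores (value, hash) pairs and revalidates every hit."""

    def __init__(
        self,
        fetch_hash: Callable[[K], Optional[str]],
        load: Callable[[K], Optional[tuple[V, str]]],
    ) -> None:
        self._fetch_hash = fetch_hash  # cheap: current hash for key, or None if row is gone
        self._load = load  # expensive: (value, hash) read together, or None
        self._entries: dict[K, tuple[V, str]] = {}
        self._lock = threading.Lock()

    def get(self, key: K) -> Optional[V]:
        current_hash = self._fetch_hash(key)  # one cheap query per lookup
        if current_hash is None:
            with self._lock:
                self._entries.pop(key, None)  # row deleted: drop any stale entry
            return None
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, cached_hash = entry
                if cached_hash == current_hash:
                    return value  # hash match: cached value is still current
                self._entries.pop(key, None)  # rewritten in place under the same key
        loaded = self._load(key)  # slow path runs outside the lock
        if loaded is None:
            return None
        value, loaded_hash = loaded  # store the hash read with the data, not the earlier one
        with self._lock:
            # Double-checked: another thread may have filled the slot meanwhile.
            entry = self._entries.get(key)
            if entry is not None and entry[1] == loaded_hash:
                return entry[0]
            self._entries[key] = (value, loaded_hash)
        return value


if __name__ == "__main__":
    db = {"v1": ("dag-A", "h1")}  # fake table: version_id -> (data, hash)
    loads = []

    def fetch_hash(k):
        row = db.get(k)
        return None if row is None else row[1]

    def load(k):
        loads.append(k)
        return db.get(k)

    cache = HashValidatedCache(fetch_hash, load)
    assert cache.get("v1") == "dag-A"  # miss: full load
    assert cache.get("v1") == "dag-A"  # hit: hash unchanged, no reload
    db["v1"] = ("dag-B", "h2")  # in-place UPDATE, same version id
    assert cache.get("v1") == "dag-B"  # stale entry detected and reloaded
    assert loads == ["v1", "v1"]
```

One design choice worth noting in this sketch: the stored hash comes from the same read as the data, so a concurrent write between the cheap hash query and the full load cannot leave a new DAG cached under an old hash.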
