anshuksi282-ksolves commented on code in PR #56476:
URL: https://github.com/apache/airflow/pull/56476#discussion_r2416696084
##########
airflow-core/src/airflow/models/serialized_dag.py:
##########
@@ -410,7 +429,12 @@ def write_dag(
return False
log.debug("Checking if DAG (%s) changed", dag.dag_id)
+
+ serialized_json = cls.serialize_dag(dag)
new_serialized_dag = cls(dag)
+ new_serialized_dag._data = serialized_json
+ new_serialized_dag.dag_hash =
hashlib.sha1(serialized_json.encode()).hexdigest()
Review Comment:
Hi @ashb and @ephraimbuddy,
Thank you both for your crucial and detailed feedback. I have implemented
all necessary changes and confirmed the fix with unit tests.
### Code Fixes Implemented
1. **Deterministic Hash Logic:**
* I have integrated `_sort_serialized_dag_dict` to recursively sort the
DAG data by keys/task_IDs. This ensures the `dag_hash` is **100% stable**,
preventing unnecessary versioning.
2. **Data Persistence Integrity:**
* I have corrected the logic in `SerializedDagModel.write_dag`. We now
use the sorted data *only* for hash calculation, but the **original (unsorted)
DAG data is always persisted** to the database, eliminating the risk of side
effects.
### Verification and Proof (Unit Test Output)
The logic has been verified by successfully running the unit tests,
confirming that the hash is stable and database writes are correctly skipped
when only internal JSON ordering changes.
```bash
root@172d77cdb349:/opt/airflow# pytest
./airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py -v
======================== test session starts =========================
...
collected 6 items
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_hash_deterministic
PASSED [ 16%]
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_write_dag_creates_new_version
PASSED [ 33%]
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_write_dag_skips_if_unchanged
PASSED [ 50%]
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_dynamic_dag_update_without_task_instances
PASSED [ 66%]
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_compressed_serialization
PASSED [ 83%]
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_actual_serialization
PASSED [100%]
=================== 6 passed, 2 warnings in 4.81s ====================
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]