anshuksi282-ksolves commented on code in PR #56476:
URL: https://github.com/apache/airflow/pull/56476#discussion_r2416696084


##########
airflow-core/src/airflow/models/serialized_dag.py:
##########
@@ -410,7 +429,12 @@ def write_dag(
                 return False
 
         log.debug("Checking if DAG (%s) changed", dag.dag_id)
+
+        serialized_json = cls.serialize_dag(dag)
         new_serialized_dag = cls(dag)
+        new_serialized_dag._data = serialized_json
+        new_serialized_dag.dag_hash = 
hashlib.sha1(serialized_json.encode()).hexdigest()

Review Comment:
   Hi @ashb and @ephraimbuddy,
   
   Thank you both for your crucial and detailed feedback. I have implemented 
all necessary changes and confirmed the fix with unit tests.
   
   ### Code Fixes Implemented
   
   1.  **Deterministic Hash Logic:**
       * I have integrated `_sort_serialized_dag_dict` to recursively sort the 
DAG data by keys/task_IDs. This ensures the `dag_hash` is **100% stable**, 
preventing unnecessary versioning.
   2.  **Data Persistence Integrity:**
       * I have corrected the logic in `SerializedDagModel.write_dag`. We now 
use the sorted data *only* for hash calculation, but the **original (unsorted) 
DAG data is always persisted** to the database, eliminating the risk of side 
effects.
   
   ### Verification and Proof (Unit Test Output)
   
   The logic has been verified by successfully running the unit tests, 
confirming that the hash is stable and database writes are correctly skipped 
when only internal JSON ordering changes.
   
   ```bash
   root@172d77cdb349:/opt/airflow# pytest 
./airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py -v
   ======================== test session starts =========================
   ...
   collected 6 items
   
   
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_hash_deterministic
 PASSED [ 16%]
   
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_write_dag_creates_new_version
 PASSED [ 33%]
   
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_write_dag_skips_if_unchanged
 PASSED [ 50%]
   
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_dynamic_dag_update_without_task_instances
 PASSED [ 66%]
   
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_compressed_serialization
 PASSED [ 83%]
   
airflow-core/tests/unit/dag_processing/test_serialized_dag_model.py::test_actual_serialization
 PASSED [100%]
   =================== 6 passed, 2 warnings in 4.81s ====================



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to