anshuksi282-ksolves opened a new pull request, #56476:
URL: https://github.com/apache/airflow/pull/56476

   # Fix non-deterministic DAG serialization in Airflow 3.1.0
   
   **Issue:** #56471
   
   ## Context and Problem
   In Airflow 3.1.0, a new serialized DAG was written to the `serialized_dag` 
table on nearly every parsing cycle, even when the underlying DAG file had no 
functional changes.  
   The root cause was non-deterministic serialization: dictionary keys and list 
elements in the JSON column were not consistently ordered between parses, so 
logically identical DAGs could produce different serialized JSON and hashes.  
   This created unnecessary new DAG versions and made version tracking unstable.
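
To illustrate the failure mode, here is a minimal standalone sketch. It does not use Airflow code: `naive_hash` and the two sample dictionaries are hypothetical, and SHA-256 is chosen only for the example. It shows that two dictionaries with equal content but different key order hash differently when the JSON text is hashed as-is.

```python
import hashlib
import json


def naive_hash(data):
    """Hash the JSON text exactly as emitted, without any key sorting."""
    return hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()


# Two "parses" of the same logical DAG; only the key insertion order differs.
parse_1 = {"dag_id": "example", "tags": ["a", "b"]}
parse_2 = {"tags": ["a", "b"], "dag_id": "example"}

same_content = parse_1 == parse_2  # dict equality ignores key order
different_hash = naive_hash(parse_1) != naive_hash(parse_2)
```

Because `json.dumps` preserves insertion order by default, the two JSON strings differ even though the dictionaries compare equal, which is the kind of mismatch described above.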
   
   ## Impact
   Every DAG parse, even for unchanged DAGs, generated a new version in the 
database.  
   This increased DB writes, made DAG version history confusing, and could 
affect webserver and scheduler performance.
   
   ## Solution
   - Added a `_sort_serialized_dag_dict` method in `SerializedDagModel` that 
recursively sorts dictionaries and lists in the serialized DAG JSON.  
   - Updated `hash` and `serialize_dag` methods to use this deterministic 
ordering before computing DAG hashes.  
   - This ensures logically identical DAGs produce the same serialized JSON and 
hash, preventing unnecessary new versions.  
   - Tested by creating a sample DAG (`test_serialized_dag`) and verifying that 
multiple parses without DAG changes do not create new database entries.  
   - Verified output in Python DB shell using 
`SerializedDagModel.get_latest_serialized_dags`, showing consistent `dag_hash` 
and `data`.
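
The recursive sorting idea can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `sort_serialized` and `stable_hash` are hypothetical names, dictionary keys are assumed to be strings (as in JSON), SHA-256 is used only for the example, and sorting every list is assumed to be safe, which holds only where list order carries no meaning.

```python
import hashlib
import json


def sort_serialized(obj):
    """Return a copy of obj with dict keys and list elements in a stable order."""
    if isinstance(obj, dict):
        # Assumes string keys, as in JSON objects.
        return {key: sort_serialized(obj[key]) for key in sorted(obj)}
    if isinstance(obj, list):
        items = [sort_serialized(value) for value in obj]
        # Sort by canonical JSON text so mixed element types stay comparable.
        return sorted(items, key=lambda value: json.dumps(value, sort_keys=True))
    return obj


def stable_hash(data):
    """Hash the canonical JSON form of data."""
    canonical = json.dumps(sort_serialized(data), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"dag_id": "x", "tags": ["b", "a"], "params": {"z": 1, "y": 2}}
second = {"params": {"y": 2, "z": 1}, "tags": ["a", "b"], "dag_id": "x"}
changed = {"dag_id": "x", "tags": ["a", "c"], "params": {"y": 2, "z": 1}}
```

With this sketch, `stable_hash(first)` equals `stable_hash(second)`, while `stable_hash(changed)` differs because its content actually changed.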
   
   ## Result
   Serialized DAGs are now deterministic. Only actual DAG changes trigger new 
versions, reducing database writes and improving system reliability.
   

