anmolxlight opened a new pull request, #68635: URL: https://github.com/apache/airflow/pull/68635
## Summary

Store a hash of `dag_version.version_data` to avoid loading and comparing the full JSON manifest on every DAG parse.

### Problem

`SerializedDagModel.write_dag`'s "serialized hash unchanged" fast path refreshes `DagVersion.bundle_version` / `version_data` in place, comparing the full stored `version_data` against the incoming value:

1. `_prefetch_dag_write_metadata` loads the **full** `DagVersion` row, including the entire `version_data` JSON, for every DAG in the bulk write.
2. The steady-state same-bundle case re-compares the full `version_data` dict each parse.

### Solution

Persist a `version_data_hash` (md5 of canonical JSON, `String(32)`, nullable) on `dag_version` and compare/prefetch that instead of the full blob:

- **`DagVersion` model**: new `version_data_hash` column + `compute_version_data_hash()` static method
- **`_prefetch_dag_write_metadata`**: uses `load_only()` to skip loading the `version_data` JSON column entirely
- **Fast path comparison**: compares `version_data_hash` instead of full dicts
- **In-place refresh**: updates `version_data_hash` when bundle metadata changes
- **New `DagVersion` rows**: computed on creation

### Verification

- All 66 `test_serialized_dag` tests pass
- All 8 `test_dag_version` tests pass
- All migrations chain correctly from latest `9ff64e1c35d3`

Closes: #68567
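
### Illustrative sketch

The hash-based comparison described under Solution could look roughly like the following. This is a minimal standalone sketch, not the code in this PR: only the name `compute_version_data_hash` and the choice of md5 over canonical JSON come from the description. The canonicalisation details (sorted keys, compact separators), the `None` handling, and the helper `version_data_changed` are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Optional


def compute_version_data_hash(version_data: Optional[dict]) -> Optional[str]:
    """Return a 32-character md5 hex digest of version_data, or None.

    Assumption: "canonical JSON" means sorted keys and compact separators,
    so that two dicts with the same content always hash identically.
    """
    if version_data is None:
        # Assumption: mirrors the nullable column by storing no hash.
        return None
    canonical = json.dumps(version_data, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def version_data_changed(stored_hash: Optional[str], incoming: Optional[dict]) -> bool:
    """Hypothetical helper: compare the stored hash with the incoming data's hash.

    Only the short hash string is needed from the database row, so the full
    JSON column never has to be loaded for this check.
    """
    return stored_hash != compute_version_data_hash(incoming)


stored = compute_version_data_hash({"commit": "abc123", "files": ["a.py", "b.py"]})

# Same content with a different key order is treated as unchanged.
print(version_data_changed(stored, {"files": ["a.py", "b.py"], "commit": "abc123"}))

# A different commit is treated as changed.
print(version_data_changed(stored, {"commit": "def456", "files": ["a.py", "b.py"]}))
```

Under these assumptions, a row whose stored hash is `NULL` (for example one written before the column existed) compares as changed against any non-empty incoming `version_data`, which would trigger one in-place refresh that also fills in the hash.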
