seanmuth opened a new issue, #68248:
URL: https://github.com/apache/airflow/issues/68248
## What happened
The scheduler crashloops on deployments with historical `task_instance`
records where `dag_version_id IS NULL`. These records exist on any deployment
that was running before the `dag_version` table was introduced (migration
`0047_3_0_0_add_dag_versioning`).
The scheduler fails when it attempts to construct a `DagRunContext` using
one of these historical TIs as `last_ti`:
```
pydantic_core._pydantic_core.ValidationError: 1 validation error for DagRunContext
last_ti.dag_version_id
  UUID input should be a string, bytes or UUID object [type=uuid_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.13/v/uuid_type
```
## Airflow Version
3.1.x (Astro Runtime 3.1-15)
## Steps to Reproduce
1. Have a deployment with historical TI records predating `dag_version`
(i.e. `task_instance.dag_version_id IS NULL`)
2. Upgrade to Airflow 3.1.x
3. Scheduler begins processing a DAG run whose `last_ti` is one of these
historical records
4. Scheduler crashloops
## Expected Behavior
The scheduler should not crash when encountering a historical TI with
`dag_version_id=None`, nor should it silently skip or ignore the associated DAG
run. A reasonable fallback would be to substitute the most recent
`dag_version_id` for the given `dag_id` when constructing `DagRunContext` —
keeping the run in-flight while avoiding the validation error. Open to other
approaches from the community.
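
The fallback described above could look roughly like the following. This is a minimal sketch, not Airflow's actual code: `TaskInstanceRow`, `resolve_dag_version_id`, and the `latest_by_dag` mapping are hypothetical names introduced for illustration, and the lookup of the latest version per `dag_id` is assumed to happen elsewhere.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass
class TaskInstanceRow:
    """Hypothetical stand-in for a task_instance record (not Airflow's model)."""

    dag_id: str
    task_id: str
    dag_version_id: uuid.UUID | None


def resolve_dag_version_id(
    ti: TaskInstanceRow, latest_by_dag: dict[str, uuid.UUID]
) -> uuid.UUID:
    """Return the TI's own dag_version_id, or the latest known one for its DAG."""
    if ti.dag_version_id is not None:
        return ti.dag_version_id
    try:
        # Assumption: latest_by_dag maps dag_id -> most recent dag_version.id.
        return latest_by_dag[ti.dag_id]
    except KeyError:
        # Fail loudly instead of passing None on to validation.
        raise LookupError(
            f"no dag_version recorded for dag_id={ti.dag_id!r}"
        ) from None


latest = {"example_dag": uuid.uuid4()}
legacy_ti = TaskInstanceRow("example_dag", "extract", None)
current_ti = TaskInstanceRow("example_dag", "load", uuid.uuid4())

print(resolve_dag_version_id(legacy_ti, latest))   # falls back to the latest version
print(resolve_dag_version_id(current_ti, latest))  # keeps its own version
```

Raising when no version exists at all keeps the failure explicit for that one DAG rather than crashing on a validation error deep inside the scheduler loop.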
## Actual Behavior
Scheduler crashloops continuously. The only workaround is to backfill all
historical TIs with a valid `dag_version_id`:
```sql
-- PostgreSQL syntax (DISTINCT ON, UPDATE ... FROM).
-- Run in batches due to volume (can be 100M+ rows on long-running deployments).
WITH latest_version AS (
    -- Pick the highest version_number per dag_id.
    SELECT DISTINCT ON (dag_id) id, dag_id
    FROM dag_version
    ORDER BY dag_id, version_number DESC
)
UPDATE task_instance ti
SET dag_version_id = lv.id
FROM latest_version lv
WHERE ti.dag_id = lv.dag_id
  AND ti.dag_version_id IS NULL;
```
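
The statement above updates every matching row in one pass. A batched variant might look like the sketch below. It runs against an in-memory SQLite database purely as a stand-in, so the two-table schema is a simplified assumption and the SQL is SQLite-flavored rather than PostgreSQL; `backfill_in_batches` is a hypothetical helper, not an Airflow API.

```python
import sqlite3
import uuid


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill NULL dag_version_id values batch by batch; return rows updated."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE task_instance
            SET dag_version_id = (
                SELECT dv.id FROM dag_version dv
                WHERE dv.dag_id = task_instance.dag_id
                ORDER BY dv.version_number DESC
                LIMIT 1
            )
            WHERE id IN (
                SELECT ti.id FROM task_instance ti
                WHERE ti.dag_version_id IS NULL
                  -- Skip DAGs with no dag_version row, otherwise the loop
                  -- would keep selecting rows it can never fill.
                  AND EXISTS (
                      SELECT 1 FROM dag_version dv WHERE dv.dag_id = ti.dag_id
                  )
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Simplified stand-in schema (assumption: real tables have many more columns).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_version (id TEXT PRIMARY KEY, dag_id TEXT, version_number INTEGER);
    CREATE TABLE task_instance (id TEXT PRIMARY KEY, dag_id TEXT, dag_version_id TEXT);
    """
)
v1, v2 = str(uuid.uuid4()), str(uuid.uuid4())
conn.executemany(
    "INSERT INTO dag_version VALUES (?, ?, ?)",
    [(v1, "example_dag", 1), (v2, "example_dag", 2)],
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, NULL)",
    [(str(uuid.uuid4()), "example_dag") for _ in range(25)],
)
# A TI whose DAG has no dag_version row at all; it stays NULL.
conn.execute(
    "INSERT INTO task_instance VALUES (?, ?, NULL)", (str(uuid.uuid4()), "orphan_dag")
)
conn.commit()

updated = backfill_in_batches(conn, batch_size=10)
print(f"backfilled {updated} task instances")
```

Committing after each batch keeps individual transactions short, which matters on tables with 100M+ rows. TIs whose DAG has no `dag_version` row are left untouched and would still need separate handling.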
## Additional Context
- The `dag_version_id` FK constraint was changed from `ON DELETE CASCADE` to
`ON DELETE RESTRICT` in migration `0072_3_1_0`. Tightening the relationship
between TIs and `dag_version` rows makes this null scenario more impactful.
- On large deployments this backfill can affect 100M+ rows; a partial index
on `(dag_id) WHERE dag_version_id IS NULL` is recommended before running it.
- Related: #66177 (FK deadlock on `db clean` with dag_version)
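
For reference, the partial index mentioned above has the following shape. The sketch uses SQLite as a stand-in so it can run anywhere; the index name `idx_ti_null_dag_version` and the three-column table are assumptions made for illustration. On PostgreSQL the same `CREATE INDEX ... WHERE` form applies, and `CREATE INDEX CONCURRENTLY` is the usual way to build it without blocking writes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real task_instance table.
conn.execute(
    "CREATE TABLE task_instance (id TEXT PRIMARY KEY, dag_id TEXT, dag_version_id TEXT)"
)

# Partial index: only rows still missing a dag_version_id are indexed, so the
# index shrinks as the backfill progresses. The index name is hypothetical.
conn.execute(
    "CREATE INDEX idx_ti_null_dag_version "
    "ON task_instance (dag_id) WHERE dag_version_id IS NULL"
)

# Inspect how SQLite plans the lookup the backfill relies on.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM task_instance WHERE dag_id = ? AND dag_version_id IS NULL",
    ("example_dag",),
).fetchall()
for row in plan:
    print(row[-1])
```

Once the backfill is complete the index covers no rows and can be dropped.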