arbazmirza-lyft commented on issue #68263:
URL: https://github.com/apache/airflow/issues/68263#issuecomment-4655840829
Claude's analysis, in case it helps with repro:
```
The chain:
1. The DAG appends a dag_tag at parse time.
2. During bulk_write_to_db, Airflow 3's collection.py does a plain INSERT INTO dag_tag without ON CONFLICT DO NOTHING.
3. The row (name='is_gdpr=False', dag_id='featuregen_dag_generator') already exists in the DB from a prior parse.
4. PostgreSQL throws UniqueViolation → the SQLAlchemy session is poisoned → PendingRollbackError on the next query → the entire DAG processor job crashes.

This is an Airflow 3 bug: bulk_write_to_db should be using an upsert for tag rows but doesn't. The symptom appears specifically when the processor tries to re-sync a DAG whose tags are already committed to the DB.

Short-term workaround: delete the stale duplicate tag row from the metadata DB:
DELETE FROM dag_tag WHERE dag_id = 'featuregen_dag_generator' AND name = 'is_gdpr=False';
This will unblock the processor, though it may recur if the Airflow bug triggers again.

Proper fix: patch the Airflow 3 tag insertion to use INSERT ... ON CONFLICT DO NOTHING (or merge/upsert logic) in airflow/dag_processing/collection.py or airflow/serialization/serialized_objects.py. Worth checking whether a newer Airflow 3 patch release has fixed this upstream, or filing/finding the upstream issue.
```
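The chain above can be reproduced in miniature. The sketch below is hypothetical: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL (SQLite 3.24+ accepts the same `ON CONFLICT ... DO NOTHING` syntax) and a simplified `dag_tag` table reduced to its `(name, dag_id)` key. It is not Airflow's real schema or the code in collection.py. It runs two simulated parse cycles against three insert strategies: a plain INSERT, an upsert, and a read-then-insert "merge".

```python
import sqlite3

# Hypothetical, simplified stand-in for Airflow's dag_tag table: only the
# (name, dag_id) key that the UniqueViolation complains about is modelled.
SCHEMA = """
CREATE TABLE dag_tag (
    name   TEXT NOT NULL,
    dag_id TEXT NOT NULL,
    PRIMARY KEY (name, dag_id)
)
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def plain_insert(conn, dag_id, tags):
    """Mimics the reported behaviour: a bare INSERT with no conflict handling."""
    conn.executemany(
        "INSERT INTO dag_tag (name, dag_id) VALUES (?, ?)",
        [(tag, dag_id) for tag in tags],
    )


def upsert_insert(conn, dag_id, tags):
    """Idempotent variant: rows that already exist are silently skipped."""
    conn.executemany(
        "INSERT INTO dag_tag (name, dag_id) VALUES (?, ?) "
        "ON CONFLICT (name, dag_id) DO NOTHING",
        [(tag, dag_id) for tag in tags],
    )


def merge_insert(conn, dag_id, tags):
    """Read-then-write variant: insert only tags not already stored."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM dag_tag WHERE dag_id = ?", (dag_id,))
    }
    # dict.fromkeys also drops duplicates within the same batch, keeping order.
    conn.executemany(
        "INSERT INTO dag_tag (name, dag_id) VALUES (?, ?)",
        [(tag, dag_id) for tag in dict.fromkeys(tags) if tag not in existing],
    )


def sync_twice(insert_fn):
    """Write the same tag in two parse cycles; return the rows or the error text."""
    conn = make_db()
    dag_id, tags = "featuregen_dag_generator", ["is_gdpr=False"]
    insert_fn(conn, dag_id, tags)  # first parse: the row is created
    conn.commit()
    try:
        insert_fn(conn, dag_id, tags)  # re-parse: the same row again
        conn.commit()
    except sqlite3.IntegrityError as exc:
        conn.rollback()  # loosely mirrors the rollback a SQLAlchemy session needs
        return f"IntegrityError: {exc}"
    return conn.execute("SELECT name, dag_id FROM dag_tag").fetchall()


if __name__ == "__main__":
    for fn in (plain_insert, upsert_insert, merge_insert):
        print(fn.__name__, sync_twice(fn))
```

The plain INSERT fails on the second cycle with an IntegrityError, SQLite's counterpart of PostgreSQL's UniqueViolation. The other two strategies leave exactly one row. sqlite3 does not reproduce SQLAlchemy's PendingRollbackError: the explicit `rollback()` only stands in for the rollback a SQLAlchemy session requires before it can be reused. Of the two fixes, ON CONFLICT DO NOTHING is the more robust choice. The read-then-insert merge can still race if two processes insert the same tag between its SELECT and its INSERT.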