arbazmirza-lyft commented on issue #68263:
URL: https://github.com/apache/airflow/issues/68263#issuecomment-4655840829

   Claude analysis in case it helps with repro.
   
   ```
   The chain:
     1. DAG  appends dag_tag at parse time
     2. During bulk_write_to_db, Airflow 3's collection.py does a plain INSERT INTO dag_tag
     without ON CONFLICT DO NOTHING
     3. The row (name='is_gdpr=False', dag_id='featuregen_dag_generator') already exists in the
     DB from a prior parse
     4. PostgreSQL throws UniqueViolation → SQLAlchemy session is poisoned → PendingRollbackError
     on the next query → the entire DAG processor job crashes
   
     This is an Airflow 3 bug: bulk_write_to_db should be using an upsert for tag rows but
     doesn't. The symptom manifests specifically when the processor tries to re-sync a DAG whose
     tags are already committed to the DB.
   
     Short-term workaround: Delete the stale duplicate tag row from the metadata DB:
     DELETE FROM dag_tag WHERE dag_id = 'featuregen_dag_generator' AND name = 'is_gdpr=False';
     This will unblock the processor, though it may recur if the Airflow bug triggers again.
   
     Proper fix: Patch the Airflow 3 tag insertion to use INSERT ... ON CONFLICT DO NOTHING (or
     merge/upsert logic) in airflow/dag_processing/collection.py or
     airflow/serialization/serialized_objects.py. Worth checking if a newer Airflow 3 patch
     release has fixed this upstream, or filing/finding the upstream issue.
   ```
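
To make the difference between a plain INSERT and INSERT ... ON CONFLICT DO NOTHING concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Airflow's code: the two-column `dag_tag` table is a simplified assumption, the helper names `make_db`, `plain_insert`, and `idempotent_insert` are hypothetical, and in-memory SQLite stands in for PostgreSQL. In particular, SQLite does not reproduce the poisoned-session / PendingRollbackError behaviour described above; it only shows the unique-constraint failure and how the conflict clause avoids it.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory DB with a simplified stand-in for the dag_tag table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag_tag ("
        " name TEXT NOT NULL,"
        " dag_id TEXT NOT NULL,"
        " PRIMARY KEY (name, dag_id))"
    )
    return conn


def plain_insert(conn: sqlite3.Connection, name: str, dag_id: str) -> None:
    """Plain INSERT: raises sqlite3.IntegrityError if the row already exists."""
    conn.execute(
        "INSERT INTO dag_tag (name, dag_id) VALUES (?, ?)", (name, dag_id)
    )


def idempotent_insert(conn: sqlite3.Connection, name: str, dag_id: str) -> None:
    """INSERT that silently skips rows already present (needs SQLite >= 3.24)."""
    conn.execute(
        "INSERT INTO dag_tag (name, dag_id) VALUES (?, ?)"
        " ON CONFLICT (name, dag_id) DO NOTHING",
        (name, dag_id),
    )


conn = make_db()
tag = ("is_gdpr=False", "featuregen_dag_generator")

# First parse: the tag row is written normally.
plain_insert(conn, *tag)

# Second parse with a plain INSERT: the unique constraint rejects the duplicate.
try:
    plain_insert(conn, *tag)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Second parse with the conflict clause: no error, and no duplicate row.
idempotent_insert(conn, *tag)
row_count = conn.execute("SELECT COUNT(*) FROM dag_tag").fetchone()[0]

print(duplicate_rejected, row_count)
```

With the conflict clause, re-syncing a DAG whose tags are already stored becomes a no-op instead of an error, which is the behaviour the proposed fix is after.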

