steveahnahn opened a new pull request, #62687:
URL: https://github.com/apache/airflow/pull/62687

   ## Summary
   Fix scheduler DagRun creation transaction poisoning after DB errors.
   When `_create_dag_runs` processes multiple DAGs in one scheduling loop, a DB 
error during one `create_dagrun()` call can leave the shared SQLAlchemy 
session's transaction in an invalid state. Unrelated DAGs later in the same 
loop can then fail because the session is in a pending-rollback state, not 
because of any problem in their own logic.
   
   Changes: 
   - Isolates each scheduled DagRun creation attempt with 
`session.begin_nested()` (savepoint), so a failure in one DAG is rolled back 
locally and does not poison the rest of the loop.
   - Captures `dag_id` early and uses that value in exception logging to avoid 
additional ORM/session access after a transaction failure.
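
The savepoint-per-attempt pattern described above can be sketched roughly as follows. This is not the scheduler code from this PR: it swaps SQLAlchemy's `session.begin_nested()` for raw `SAVEPOINT` statements on the standard-library `sqlite3` module so that it runs without dependencies, and the table layout, the `create_dag_runs` helper, and the logger name are hypothetical. SQLite also does not invalidate the outer transaction on a failed statement the way some other backends do, so the sketch only illustrates how a per-attempt rollback keeps the rest of the loop usable.

```python
import logging
import sqlite3

log = logging.getLogger("scheduler_sketch")  # hypothetical logger name


def create_dag_runs(conn, runs):
    """Try to create one run per (dag_id, run_id); return the dag_ids that failed."""
    failed = []
    conn.execute("BEGIN")  # one outer transaction shared by the whole loop
    for dag_id, run_id in runs:
        # dag_id is already a plain string here, so logging it after a
        # failure needs no further database or ORM access.
        conn.execute("SAVEPOINT create_run")
        try:
            conn.execute(
                "INSERT INTO dag_run (dag_id, run_id) VALUES (?, ?)",
                (dag_id, run_id),
            )
        except sqlite3.IntegrityError:
            # Undo only this attempt; earlier successful inserts are kept.
            conn.execute("ROLLBACK TO SAVEPOINT create_run")
            log.exception("Failed creating DagRun for %s", dag_id)
            failed.append(dag_id)
        finally:
            # Pop the savepoint whether the attempt succeeded or was rolled back.
            conn.execute("RELEASE SAVEPOINT create_run")
    conn.execute("COMMIT")
    return failed


# Autocommit mode, so the explicit BEGIN/SAVEPOINT/COMMIT above are in control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dag_run (dag_id TEXT NOT NULL, run_id TEXT NOT NULL)")

# The first attempt violates NOT NULL on run_id; the second one is valid.
failed = create_dag_runs(conn, [("dag_a", None), ("dag_b", "scheduled__1")])
created = [row[0] for row in conn.execute("SELECT dag_id FROM dag_run")]
```

In this sketch the failing attempt for `dag_a` is logged and rolled back to its savepoint, while the insert for `dag_b` in the same outer transaction still commits.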
   
   ### Test coverage
   
   Added the `test_create_dag_runs_recovers_after_db_error` regression test.
   The test injects a real DB flush error for the first DAG creation attempt 
and verifies:
   
   1. The scheduler logs the failure.
   2. The first DAG run is not created.
   3. A second DAG in the same `_create_dag_runs` call is still created 
successfully.
   
   related: #59120
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (OpenAI Codex)
   
   Generated-by: OpenAI Codex following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   

