potiuk commented on code in PR #66820:
URL: https://github.com/apache/airflow/pull/66820#discussion_r3257956577
##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -1776,6 +1776,15 @@ def _do_scheduling(self, session: Session) -> int:
self._start_queued_dagruns(session)
guard.commit()
+ # Clear DagRun objects loaded by phase 1 from the identity map so
+ # phase 2 reloads them fresh. Otherwise stale rows can be re-dirtied
+ # by flush/merge in _schedule_all_dag_runs and committed in a row-lock
+ # order that differs from what other scheduler replicas are taking
+ # for their own work, producing A-B / B-A deadlocks on dag_run and
+ # task_instance under HA scheduler deployments. See
+ # https://github.com/apache/airflow/issues/66817.
+ session.expunge_all()
Review Comment:
Follow-up: this rule is now written down in `AGENTS.md`, so agents (and
humans) can be pointed at it directly. The PR is up at #67100.
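
For readers without the issue context, the A-B / B-A hazard described in the code comment of the hunk above can be illustrated with a toy model. This is only a sketch, not Airflow code: `has_lock_order_inversion` and the row labels are hypothetical names made up for illustration, and it does not model the SQLAlchemy identity map or `session.expunge_all()` itself. It only shows why two replicas taking row locks in differing orders can deadlock.

```python
from itertools import combinations


def has_lock_order_inversion(order_a, order_b):
    """Return True if two transactions lock some pair of rows in opposite order.

    Each argument is the sequence of (unique) row labels a transaction locks,
    in the order it locks them. Hypothetical helper, for illustration only.
    """
    position_in_b = {row: index for index, row in enumerate(order_b)}
    # Rows locked by both transactions, kept in transaction A's locking order.
    shared = [row for row in order_a if row in position_in_b]
    # combinations() yields (x, y) with x locked before y by A; it is an
    # inversion if B locks y before x.
    return any(position_in_b[x] > position_in_b[y] for x, y in combinations(shared, 2))


# Replica 1 locks dag_run then task_instance; replica 2 does the reverse.
replica_1 = ["dag_run:1", "task_instance:7"]
replica_2 = ["task_instance:7", "dag_run:1"]

inverted = has_lock_order_inversion(replica_1, replica_2)                  # True
consistent = has_lock_order_inversion(sorted(replica_1), sorted(replica_2))  # False
```

When both transactions take locks in the same order, one simply waits for the other; the deadlock only becomes possible once the orders diverge.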
---
Drafted-by: Claude Opus 4.7 (1M context); reviewed by @potiuk before posting