1fanwang commented on code in PR #66820:
URL: https://github.com/apache/airflow/pull/66820#discussion_r3232722270


##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -1776,6 +1776,15 @@ def _do_scheduling(self, session: Session) -> int:
             self._start_queued_dagruns(session)
             guard.commit()
 
+            # Clear DagRun objects loaded by phase 1 from the identity map so
+            # phase 2 reloads them fresh. Otherwise stale rows can be re-dirtied
+            # by flush/merge in _schedule_all_dag_runs and committed in a row-lock
+            # order that differs from what other scheduler replicas are taking
+            # for their own work, producing A-B / B-A deadlocks on dag_run and
+            # task_instance under HA scheduler deployments. See
+            # https://github.com/apache/airflow/issues/66817.
+            session.expunge_all()

Review Comment:
   Nice to meet you, Ephraim. Fair concern to flag; I wanted to address it
   directly and then send the evidence in a follow-up below.
   
   The pattern you're seeing is real, but these are not guesses. I run a
   fairly large production Airflow deployment and we're working through the
   2 → 3 migration, so most of these issues and PRs are operational gaps we've
   hit or paths we want to harden before the cutover. The intent is to land
   the fixes upstream so the community benefits along with us.
   
   On the technical analysis itself: some of the raw internal logs and traces
   can't be copied out due to company policy, so the approach that works is to
   reproduce the issue end-to-end against the OSS code, capture before/after
   evidence, and share the result here. That's what we've done on a few other
   threads with you already (the deterministic FAILED → PASSED snippet in this
   PR's regression test is one example), and I'll do the same for the specific
   log shape you asked about, sending it in a follow-up comment below.
   
   Hope that helps ease the concern. Thanks for raising it directly; that
   makes it easier to address.



