1fanwang opened a new issue, #66794:
URL: https://github.com/apache/airflow/issues/66794

   ### Apache Airflow version
   
   main (3.x)
   
   ### What happened?
   
   In `SchedulerJobRunner.process_executor_events` (main, ~line 1240+), the 
QUEUED, FAILED, SUCCESS, RUNNING, and RESTARTING events from the executor event 
buffer are all funneled through one `with_row_locks(..., skip_locked=True)` 
batch. The lock window therefore covers every event, including the QUEUED ones 
whose only side effect is writing `external_executor_id` on a TI row.
   
   `external_executor_id` is the pod / process identifier assigned by the 
executor that dispatched the TI. Only the dispatching scheduler ever produces a 
QUEUED event for a given (TI, try_number), so there is no multi-scheduler race 
on this write and the lock is unnecessary. Under heavy load on MySQL, holding the 
row lock concurrently with workers' state updates triggers `1213 Deadlock found 
when trying to get lock`, which propagates out of the heartbeat and crashes the 
scheduler loop.
   
   ### What you think should happen instead?
   
   Split QUEUED events out of the locked batch:
   
   1. Phase 1 (QUEUED events): issue a direct `UPDATE task_instance SET 
external_executor_id = :pod_name WHERE id = :ti_id` per event, with no FOR UPDATE. 
This is safe to write in any state because the field is metadata only (the 
executor never reads it back to dispatch).
   2. Phase 2 (FAILED/SUCCESS, and optionally RUNNING/RESTARTING events): keep 
the FOR UPDATE SKIP LOCKED batch.
   
   This reduces the lock window proportionally to the QUEUED-event share of the 
batch, eliminates the multi-scheduler deadlock surface for those events, and 
keeps semantics for terminal events unchanged.
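
A minimal sketch of the split described above, not the actual scheduler code. It uses the stdlib `sqlite3` module as a stand-in for the metadata DB, and the `Event` tuple, `split_events`, and `write_external_executor_ids` names are hypothetical, introduced only for illustration. The real event buffer and `task_instance` schema differ.

```python
import sqlite3
from collections import namedtuple

# Hypothetical, simplified executor event: TI primary key, reported state,
# and the identifier the executor assigned (pod name, Celery task id, ...).
Event = namedtuple("Event", ["ti_id", "state", "external_executor_id"])


def split_events(events):
    """Partition events into QUEUED ones (phase 1) and the rest (phase 2)."""
    queued = [e for e in events if e.state == "queued"]
    needs_lock = [e for e in events if e.state != "queued"]
    return queued, needs_lock


def write_external_executor_ids(conn, queued):
    """Phase 1: plain per-row UPDATE, with no SELECT ... FOR UPDATE beforehand."""
    conn.executemany(
        "UPDATE task_instance SET external_executor_id = ? WHERE id = ?",
        [(e.external_executor_id, e.ti_id) for e in queued],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance "
        "(id TEXT PRIMARY KEY, state TEXT, external_executor_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_instance (id, state) VALUES (?, ?)",
        [("ti-1", "queued"), ("ti-2", "running")],
    )
    events = [
        Event("ti-1", "queued", "pod-abc"),
        Event("ti-2", "success", None),
    ]
    queued, needs_lock = split_events(events)
    write_external_executor_ids(conn, queued)
    # Only the events in needs_lock would go on to the locked batch (phase 2).
    print([e.ti_id for e in needs_lock])
```

Only the events returned in `needs_lock` would enter the existing FOR UPDATE SKIP LOCKED path, so the locked row set shrinks by the number of QUEUED events in the batch.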
   
   ### How to reproduce
   
   Run CeleryExecutor or KubernetesExecutor against a MySQL 8 metadata DB, with 
parallelism / max_active_tasks_per_dag set so that the scheduler regularly sees 
QUEUED and terminal events in the same heartbeat. Under sustained churn of 100+ 
TI/s, the scheduler emits `1213 Deadlock found` roughly once per minute and exits.
   
   ### Anything else?
   
   The fix is small (split + per-key UPDATE) but it changes a hot path, so a 
benchmark comparing scheduler-loop p99 before/after would help reviewers feel 
confident.
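
One way such a before/after comparison could be measured is sketched below, assuming the scheduler-loop body can be wrapped in a callable. The `percentile` and `time_iterations` helpers are hypothetical and not part of Airflow. The percentile uses the nearest-rank method.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def time_iterations(fn, iterations):
    """Call fn repeatedly and return the wall-clock duration of each call."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder workload; a real benchmark would invoke one scheduler loop
    # iteration here, once on the old code path and once on the new one.
    durations = time_iterations(lambda: sum(range(1000)), 500)
    print(f"p99 = {percentile(durations, 99):.6f}s")
```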
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's Code of Conduct

