uplsh580 commented on issue #65818:
URL: https://github.com/apache/airflow/issues/65818#issuecomment-4525328884

   We are seeing the same class of deadlock on Airflow 3.1.8 / MySQL with 2 
triggerer replicas.
   
   In our case the deadlock happens on a single-row UPDATE through the 
`Trigger.submit_event` → `handle_event_submit` path (not the bulk UPDATE paths 
in `SchedulerJobRunner.check_trigger_timeouts` / `Trigger.clean_unused` that 
#65920 / #65836 target). The exception is not retried anywhere in the call 
chain (`triggerer_job_runner.handle_events` → `Trigger.submit_event` → 
`handle_event_submit`), so it propagates up to `TriggerRunnerSupervisor.run` 
and the triggerer process exits, after which Kubernetes restarts the pod. No 
task fails, because the other triggerer picks up the deferred tasks, but the 
restart shows up as an alert.
   
   Environment
   - Airflow 3.1.8
   - MySQL (InnoDB)
   - triggerer replicas: 2
   - Workload: deferrable operators (Spark application completion event in this 
case)
   
   Failing statement
   UPDATE task_instance
   SET state='scheduled', scheduled_dttm=..., updated_at=..., trigger_id=NULL, 
next_kwargs=...
   WHERE task_instance.id = '019e511c-c507-74d8-ace0-f4c396cc8ef1'
   → `MySQLdb.OperationalError: (1213, 'Deadlock found when trying to get lock; 
try restarting transaction')`
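
For anyone who wants to tell this error apart from other `OperationalError`s before deciding to retry, here is a minimal sketch. It assumes the driver exception carries the MySQL error code as its first argument (as in the `(1213, ...)` tuple above) and that a wrapping exception exposes the driver error on an `.orig` attribute. The helper name `is_mysql_deadlock` is hypothetical and not part of Airflow.

```python
MYSQL_ER_LOCK_DEADLOCK = 1213  # the error code reported in the message above


def is_mysql_deadlock(exc: BaseException) -> bool:
    """Return True if exc, or anything it wraps, carries MySQL error 1213.

    Hypothetical helper: checks the exception itself, an ``.orig`` attribute
    (assumed to hold the driver-level error), and the chained causes.
    """
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        for candidate in (current, getattr(current, "orig", None)):
            args = getattr(candidate, "args", ())
            # Driver errors are assumed to look like (1213, "Deadlock found ...")
            if args and args[0] == MYSQL_ER_LOCK_DEADLOCK:
                return True
        # Follow explicit ``raise ... from`` links first, then implicit context
        current = current.__cause__ or current.__context__
    return False
```

Matching on the numeric code rather than the message text avoids depending on the server's error wording.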
   
   Stack
   airflow/jobs/triggerer_job_runner.py:534 run
   airflow/jobs/triggerer_job_runner.py:561 handle_events
   airflow/models/trigger.py:252 submit_event
   airflow/models/trigger.py:422 handle_event_submit
     → session.flush()
   
   Looking at `airflow-core/src/airflow/models/trigger.py` on the 3.1.8 tag, 
neither `submit_event` (L239) nor `handle_event_submit` (L394 / L426) carries 
`@retry_db_transaction` or any try/except, while several deadlock-sensitive 
paths on the scheduler side already do. Wrapping these entry points (or the 
per-event loop in `triggerer_job_runner.handle_events`) with 
`@retry_db_transaction` would also cover this single-row UPDATE case that the 
existing PRs do not address.
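
To illustrate the kind of wrapping suggested above, here is a self-contained sketch of retrying a single-row update on deadlock. It is not Airflow's `@retry_db_transaction` implementation. `TransientDBError`, `retry_on_deadlock`, and `handle_event_submit_stub` are hypothetical names used only for illustration, and the attempt count and backoff values are arbitrary assumptions.

```python
import functools
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a deadlock error such as MySQL 1213."""


def retry_on_deadlock(max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry the wrapped callable when it raises TransientDBError.

    Sketch only: a real version would also roll back the database
    session before each retry so the transaction restarts cleanly.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientDBError:
                    if attempt == max_attempts:
                        raise  # give up and let the caller see the error
                    # Exponential backoff between attempts
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on_deadlock(max_attempts=3, base_delay=0.0)
def handle_event_submit_stub():
    """Hypothetical stub that deadlocks twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientDBError(1213, "Deadlock found when trying to get lock")
    return "scheduled"
```

With a wrapper like this around the per-event update, a transient deadlock would cost a short delay instead of propagating up and terminating the triggerer process.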

