1fanwang commented on code in PR #67043:
URL: https://github.com/apache/airflow/pull/67043#discussion_r3359099494
##########
airflow-core/tests/unit/jobs/test_scheduler_job.py:
##########
@@ -914,6 +914,69 @@ def
test_process_executor_events_multiple_try_numbers_warns(
mock_task_callback.assert_not_called()
mock_stats.incr.assert_not_called()
+ def test_process_executor_events_queued_updates_without_row_lock(self,
dag_maker, session):
+ dag_id = "test_process_executor_events_queued_updates_without_row_lock"
+ executor_id = "queued_executor_id"
+
+ with dag_maker(dag_id=dag_id, fileloc="/test_path1/"):
+ task = EmptyOperator(task_id="dummy_task")
+ ti = dag_maker.create_dagrun().get_task_instance(task.task_id)
+ ti.state = State.QUEUED
+ session.merge(ti)
+ session.commit()
+
+ executor = MockExecutor(do_update=False)
+ scheduler_job = Job()
+ self.job_runner = SchedulerJobRunner(scheduler_job,
executors=[executor])
+
+ executor.event_buffer[ti.key] = State.QUEUED, executor_id
+
+ with mock.patch(
+ "airflow.jobs.scheduler_job_runner.with_row_locks",
wraps=with_row_locks
+ ) as row_locks:
+ self.job_runner._process_executor_events(executor=executor,
session=session)
+
+ ti.refresh_from_db(session=session)
+ assert ti.external_executor_id == executor_id
+ assert ti.key not in executor.event_buffer
+ row_locks.assert_not_called()
Review Comment:
Following up on the before/after I asked for — I ran it, and it argues
*against* attaching one. Two threads on a `task_instance`-shaped table; the
heartbeat updates the same rows in the opposite order:
```
BEFORE (SELECT … FOR UPDATE)
run 1-5: sched=[1213 Deadlock found] hb=[ok] 5/5 deadlock
AFTER (direct UPDATE, this PR)
run 1: sched=[1213…] hb=[ok]
run 2: sched=[ok] hb=[1213…]
run 3: sched=[ok] hb=[1213…]
run 4: sched=[1213…] hb=[ok]
run 5: sched=[ok] hb=[1213…] 5/5 deadlock
```
InnoDB takes the same exclusive row locks for a plain `UPDATE` as for
`SELECT … FOR UPDATE`, so two transactions crossing on the rows still cycle
(AFTER just alternates the victim). The safety here isn't the lock type — it's
that only the dispatching scheduler writes `external_executor_id` for a given
`(TI, try_number)`, so nothing crosses it in practice. That single-writer
invariant is the load-bearing piece worth confirming (no heartbeat/adoption
path writing `external_executor_id` on a queued row), more than a deadlock
trace.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]