1fanwang opened a new pull request, #66819:
URL: https://github.com/apache/airflow/pull/66819

   ### Problem
   
   `DagRun.update_state()` already detects the task-deadlock case — when every 
unfinished task is unrunnable — logs an error, and notifies the state-changed 
listeners with `msg="all_tasks_deadlocked"`. It does not emit a Stats counter, 
so operators who want to alert on deadlock-induced failures end up grepping 
scheduler logs or scraping state-change notifications. There's no first-class 
signal alongside `zombies.zombie_unfinished_run_failure_count` or the 
executor-event failure counters.
   
   ### Fix
   
   Add `stats.incr("dagrun.deadlocked", tags={"dag_id": self.dag_id, 
"run_type": self.run_type})` at the existing log + notify call site in 
`DagRun.update_state`, and register the new counter in the observability 
metrics template.
   
   ### Tests
   
   Added `test_dagrun_deadlock_emits_stats_counter` in 
`airflow-core/tests/unit/models/test_dagrun.py`. Mirrors the existing 
`test_dagrun_deadlock` fixture (invalid `trigger_rule` to force the deadlock 
branch), mocks `stats.incr`, and asserts the call with the expected name and 
tags.
   
   Closes #66818
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to