GitHub user karenbraganz closed a discussion: Report of increased incidence of 
tasks getting stuck in queued in Airflow 2.10.5

In issue #51301, @KarthikeyanDevendhiran reported that tasks stuck in queued 
began failing more often after they upgraded to Airflow 2.10.5. I have not 
seen any other reports of this so far. This is also unexpected, since 
2.10.5 includes [PR #43520](https://github.com/apache/airflow/pull/43520), 
which allows tasks stuck in queued to be requeued and should reduce the 
incidence of such failures. I asked them to provide scheduler logs and data 
from the log table for the failed task instances so I could look into this. 
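
For readers unfamiliar with the requeue behavior, here is a minimal sketch of the decision that the event messages in the log table further down appear to reflect. It is an illustration only, not Airflow's implementation: the function name, the `Action` enum, and the parameter names are hypothetical, and the values 900.0 and 2 are taken from the event text in this thread.

```python
from enum import Enum


class Action(Enum):
    KEEP_WAITING = "keep waiting"
    RESCHEDULE = "set back to scheduled"
    FAIL = "fail the task"


def stuck_in_queued_action(seconds_queued, prior_requeues,
                           timeout=900.0, max_requeues=2):
    """Hypothetical model of the observed behavior, not Airflow code.

    seconds_queued: how long the task has been in the queued state.
    prior_requeues: how many times it was already set back to scheduled.
    """
    if seconds_queued <= timeout:
        # Still within the allowed queued time.
        return Action.KEEP_WAITING
    if prior_requeues < max_requeues:
        # Logged as "stuck in queued reschedule".
        return Action.RESCHEDULE
    # Logged as "stuck in queued tries exceeded".
    return Action.FAIL
```

Under this model, a task that never leaves queued is rescheduled twice and then failed on the third timeout, which matches the three events recorded for each task instance below.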

These are the scheduler logs provided:
```
kubectl logs airflow-scheduler-65cfcf89c9-w8xjc -n airflow | grep 
"TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00'"
Defaulted container "scheduler" out of: scheduler, git-sync, 
scheduler-log-groomer, wait-for-airflow-migrations (init), git-sync-init (init)
[2025-06-09T10:00:00.476+0000] {scheduler_job_runner.py:692} INFO - Sending 
TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) to 
KubernetesExecutor with priority 1 and queue default
[2025-06-09T10:00:00.485+0000] {kubernetes_executor.py:352} INFO - Add task 
TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) with 
command ['airflow', 'tasks', 'run', 'dag_name', 'task_name', 
'scheduled__2025-06-09T08:00:00+00:00', '--local', '--subdir', 
'DAGS_FOLDER/folder_name.py']
[2025-06-09T10:00:05.201+0000] {scheduler_job_runner.py:776} INFO - Received 
executor event with state queued for task instance 
TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1)
[2025-06-09T10:00:12.165+0000] {kubernetes_executor_utils.py:425} INFO - 
Creating kubernetes pod for job is TaskInstanceKey(dag_id='dag_name', 
task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', 
try_number=1, map_index=-1), with pod name dag_name-task_name-8vpdausn, 
annotations: <omitted>
[2025-06-09T10:15:48.070+0000] {scheduler_job_runner.py:692} INFO - Sending 
TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) to 
KubernetesExecutor with priority 1 and queue default
[2025-06-09T10:15:48.075+0000] {kubernetes_executor.py:352} INFO - Add task 
TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) with 
command ['airflow', 'tasks', 'run', 'dag_name', 'task_name', 
'scheduled__2025-06-09T08:00:00+00:00', '--local', '--subdir', 
'DAGS_FOLDER/folder_name.py']
[2025-06-09T10:15:48.714+0000] {scheduler_job_runner.py:776} INFO - Received 
executor event with state queued for task instance 
TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1)
[2025-06-09T10:15:49.006+0000] {kubernetes_executor_utils.py:425} INFO - 
Creating kubernetes pod for job is TaskInstanceKey(dag_id='dag_name', 
task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', 
try_number=1, map_index=-1), with pod name dag_name-task_name-rkrqu4hr, 
annotations: <omitted>
[2025-06-09T10:31:50.179+0000] {scheduler_job_runner.py:692} INFO - Sending 
TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) to 
KubernetesExecutor with priority 1 and queue default
[2025-06-09T10:31:50.184+0000] {kubernetes_executor.py:352} INFO - Add task 
TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) with 
command ['airflow', 'tasks', 'run', 'dag_name', 'task_name', 
'scheduled__2025-06-09T08:00:00+00:00', '--local', '--subdir', 
'DAGS_FOLDER/folder_name.py']
[2025-06-09T10:31:50.255+0000] {kubernetes_executor_utils.py:425} INFO - 
Creating kubernetes pod for job is TaskInstanceKey(dag_id='dag_name', 
task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', 
try_number=1, map_index=-1), with pod name dag_name-task_name-rx89exlt, 
annotations: <omitted>
[2025-06-09T10:31:51.614+0000] {scheduler_job_runner.py:776} INFO - Received 
executor event with state queued for task instance 
TaskInstanceKey(dag_id='dag_name', task_id='task_name', 
run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1)
```

These are the data from the log table for two separate task instances:
```
TASK 1

airflow=> SELECT * FROM log WHERE dag_id = 'dag_name' AND task_id = 'task_name' AND run_id = 'scheduled__2025-06-09T08:00:00+00:00' LIMIT 10;
id      | dttm                          | dag_id   | task_id   | map_index | event                          | execution_date | owner | extra                                                                                             | owner_display_name | run_id                               | try_number
--------+-------------------------------+----------+-----------+-----------+--------------------------------+----------------+-------+---------------------------------------------------------------------------------------------------+--------------------+--------------------------------------+-----------
1802783 | 2025-06-09 10:15:47.409503+00 | dag_name | task_name |        -1 | stuck in queued reschedule     |                |       | Task was in queued state for longer than 900.0 seconds; task state will be set back to scheduled. |                    | scheduled__2025-06-09T08:00:00+00:00 |          1
1802824 | 2025-06-09 10:31:49.016565+00 | dag_name | task_name |        -1 | stuck in queued reschedule     |                |       | Task was in queued state for longer than 900.0 seconds; task state will be set back to scheduled. |                    | scheduled__2025-06-09T08:00:00+00:00 |          1
1802844 | 2025-06-09 10:47:50.486896+00 | dag_name | task_name |        -1 | stuck in queued tries exceeded |                |       | Task was requeued more than 2 times and will be failed.                                           |                    | scheduled__2025-06-09T08:00:00+00:00 |          1


TASK 2

id      | dttm                          | dag_id   | task_id   | map_index | event                          | execution_date | owner | extra                                                                                             | owner_display_name | run_id                               | try_number
--------+-------------------------------+----------+-----------+-----------+--------------------------------+----------------+-------+---------------------------------------------------------------------------------------------------+--------------------+--------------------------------------+-----------
1699360 | 2025-05-02 14:48:53.464319+00 | dag_name | task_name |        -1 | stuck in queued reschedule     |                |       | Task was in queued state for longer than 900.0 seconds; task state will be set back to scheduled. |                    | scheduled__2025-04-30T14:00:00+00:00 |          1
1699381 | 2025-05-02 15:04:55.373482+00 | dag_name | task_name |        -1 | stuck in queued reschedule     |                |       | Task was in queued state for longer than 900.0 seconds; task state will be set back to scheduled. |                    | scheduled__2025-04-30T14:00:00+00:00 |          1
1699400 | 2025-05-02 15:21:02.064989+00 | dag_name | task_name |        -1 | stuck in queued tries exceeded |                |       | Task was requeued more than 2 times and will be failed.                                           |                    | scheduled__2025-04-30T14:00:00

```
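
To make the timing in the Task 1 output easier to read, the following sketch computes how long each attempt sat in queued before the scheduler acted. The timestamps are copied from the scheduler logs and log table above. Treating each "Sending ... to KubernetesExecutor" line as the start of a queued period is my assumption, and `ATTEMPTS` and `queued_seconds` are names made up for this example.

```python
from datetime import datetime

# (time the scheduler sent the task to the executor,
#  time of the matching "stuck in queued" event in the log table); all UTC.
ATTEMPTS = [
    ("2025-06-09T10:00:00.476", "2025-06-09T10:15:47.409503"),
    ("2025-06-09T10:15:48.070", "2025-06-09T10:31:49.016565"),
    ("2025-06-09T10:31:50.179", "2025-06-09T10:47:50.486896"),
]


def queued_seconds(attempts):
    """Return the seconds between each send and its stuck-in-queued event."""
    return [
        (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
        for start, end in attempts
    ]


waits = queued_seconds(ATTEMPTS)

# Time from the first send to the final "tries exceeded" event.
total = (
    datetime.fromisoformat(ATTEMPTS[-1][1]) - datetime.fromisoformat(ATTEMPTS[0][0])
).total_seconds()

for number, wait in enumerate(waits, start=1):
    print(f"attempt {number}: queued for {wait:.1f} s")
print(f"first send to failure: {total / 60:.1f} min")
```

Each attempt stays in queued for roughly 16 minutes, just past the 900-second threshold named in the event text, and the task is failed a little under 48 minutes after it was first sent to the executor.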

GitHub link: https://github.com/apache/airflow/discussions/51597

----
This is an automatically sent email for commits@airflow.apache.org.
To unsubscribe, please send an email to: commits-unsubscr...@airflow.apache.org