GitHub user karenbraganz closed a discussion: Report of increased incidence of tasks getting stuck in queued in Airflow 2.10.5
In issue #51301, @KarthikeyanDevendhiran reported an increased incidence of stuck-in-queued tasks failing after they upgraded to Airflow 2.10.5. I have not seen any other reports of this so far. Additionally, this is unexpected since 2.10.5 includes [PR #43520](https://github.com/apache/airflow/pull/43520), which allows tasks stuck in queued to be requeued and should reduce the incidence of such failures.

I had asked them to provide scheduler logs and data from the log table for failed task instances so I could look into this.

These are the scheduler logs provided:

```
kubectl logs airflow-scheduler-65cfcf89c9-w8xjc -n airflow | grep "TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00'"
Defaulted container "scheduler" out of: scheduler, git-sync, scheduler-log-groomer, wait-for-airflow-migrations (init), git-sync-init (init)
[2025-06-09T10:00:00.476+0000] {scheduler_job_runner.py:692} INFO - Sending TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) to KubernetesExecutor with priority 1 and queue default
[2025-06-09T10:00:00.485+0000] {kubernetes_executor.py:352} INFO - Add task TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) with command ['airflow', 'tasks', 'run', 'dag_name', 'task_name', 'scheduled__2025-06-09T08:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/folder_name.py']
[2025-06-09T10:00:05.201+0000] {scheduler_job_runner.py:776} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1)
[2025-06-09T10:00:12.165+0000] {kubernetes_executor_utils.py:425} INFO - Creating kubernetes pod for job is TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1), with pod name dag_name-task_name-8vpdausn, annotations: <omitted>
[2025-06-09T10:15:48.070+0000] {scheduler_job_runner.py:692} INFO - Sending TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) to KubernetesExecutor with priority 1 and queue default
[2025-06-09T10:15:48.075+0000] {kubernetes_executor.py:352} INFO - Add task TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) with command ['airflow', 'tasks', 'run', 'dag_name', 'task_name', 'scheduled__2025-06-09T08:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/folder_name.py']
[2025-06-09T10:15:48.714+0000] {scheduler_job_runner.py:776} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1)
[2025-06-09T10:15:49.006+0000] {kubernetes_executor_utils.py:425} INFO - Creating kubernetes pod for job is TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1), with pod name dag_name-task_name-rkrqu4hr, annotations: <omitted>
[2025-06-09T10:31:50.179+0000] {scheduler_job_runner.py:692} INFO - Sending TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) to KubernetesExecutor with priority 1 and queue default
[2025-06-09T10:31:50.184+0000] {kubernetes_executor.py:352} INFO - Add task TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1) with command ['airflow', 'tasks', 'run', 'dag_name', 'task_name', 'scheduled__2025-06-09T08:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/folder_name.py']
[2025-06-09T10:31:50.255+0000] {kubernetes_executor_utils.py:425} INFO - Creating kubernetes pod for job is TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1), with pod name dag_name-task_name-rx89exlt, annotations: <omitted>
[2025-06-09T10:31:51.614+0000] {scheduler_job_runner.py:776} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='dag_name', task_id='task_name', run_id='scheduled__2025-06-09T08:00:00+00:00', try_number=1, map_index=-1)
```

This is the data from the log table for two separate task instances:

```
TASK 1
airflow=> SELECT * FROM log WHERE dag_id = 'dag_name' AND task_id = 'task_name' AND run_id = 'scheduled__2025-06-09T08:00:00+00:00' LIMIT 10;
 id      | dttm                          | dag_id   | task_id   | map_index | event                          | execution_date | owner | extra                                                                                             | owner_display_name | run_id                               | try_number
---------+-------------------------------+----------+-----------+-----------+--------------------------------+----------------+-------+---------------------------------------------------------------------------------------------------+--------------------+--------------------------------------+------------
 1802783 | 2025-06-09 10:15:47.409503+00 | dag_name | task_name | -1        | stuck in queued reschedule     |                |       | Task was in queued state for longer than 900.0 seconds; task state will be set back to scheduled. |                    | scheduled__2025-06-09T08:00:00+00:00 | 1
 1802824 | 2025-06-09 10:31:49.016565+00 | dag_name | task_name | -1        | stuck in queued reschedule     |                |       | Task was in queued state for longer than 900.0 seconds; task state will be set back to scheduled. |                    | scheduled__2025-06-09T08:00:00+00:00 | 1
 1802844 | 2025-06-09 10:47:50.486896+00 | dag_name | task_name | -1        | stuck in queued tries exceeded |                |       | Task was requeued more than 2 times and will be failed.                                           |                    | scheduled__2025-06-09T08:00:00+00:00 | 1

TASK 2
 id      | dttm                          | dag_id   | task_id   | map_index | event                          | execution_date | owner | extra                                                                                             | owner_display_name | run_id                               | try_number
---------+-------------------------------+----------+-----------+-----------+--------------------------------+----------------+-------+---------------------------------------------------------------------------------------------------+--------------------+--------------------------------------+------------
 1699360 | 2025-05-02 14:48:53.464319+00 | dag_name | task_name | -1        | stuck in queued reschedule     |                |       | Task was in queued state for longer than 900.0 seconds; task state will be set back to scheduled. |                    | scheduled__2025-04-30T14:00:00+00:00 | 1
 1699381 | 2025-05-02 15:04:55.373482+00 | dag_name | task_name | -1        | stuck in queued reschedule     |                |       | Task was in queued state for longer than 900.0 seconds; task state will be set back to scheduled. |                    | scheduled__2025-04-30T14:00:00+00:00 | 1
 1699400 | 2025-05-02 15:21:02.064989+00 | dag_name | task_name | -1        | stuck in queued tries exceeded |                |       | Task was requeued more than 2 times and will be failed.                                           |                    | scheduled__2025-04-30T14:00:00
```

GitHub link: https://github.com/apache/airflow/discussions/51597
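As a rough cross-check of the two excerpts above, the following Python sketch pairs each `Sending ... to KubernetesExecutor` timestamp from the scheduler logs with the next log-table event for TASK 1 and reports how long the task sat in queued each time. The function and variable names are hypothetical, the timestamps are copied by hand from the excerpts (with the `+0000` and `+00` offsets rewritten as `+00:00`), and the 900-second value is the one quoted in the `extra` column. Nothing here calls Airflow itself.

```python
from datetime import datetime

# Threshold quoted in the log table's `extra` column (not read from Airflow config).
QUEUED_TIMEOUT_S = 900.0

# (time the scheduler sent the task to the executor, next log-table event, event time)
# Values are copied from the scheduler log and the TASK 1 rows in this post.
ATTEMPTS = [
    ("2025-06-09T10:00:00.476+00:00", "stuck in queued reschedule", "2025-06-09 10:15:47.409503+00:00"),
    ("2025-06-09T10:15:48.070+00:00", "stuck in queued reschedule", "2025-06-09 10:31:49.016565+00:00"),
    ("2025-06-09T10:31:50.179+00:00", "stuck in queued tries exceeded", "2025-06-09 10:47:50.486896+00:00"),
]


def seconds_in_queued(attempts):
    """Return (event, seconds between the send and the log-table event) per attempt."""
    results = []
    for sent_at, event, event_at in attempts:
        sent = datetime.fromisoformat(sent_at)
        logged = datetime.fromisoformat(event_at)
        results.append((event, (logged - sent).total_seconds()))
    return results


durations = seconds_in_queued(ATTEMPTS)

for event, seconds in durations:
    over = seconds - QUEUED_TIMEOUT_S
    print(f"{event}: {seconds:.1f}s in queued ({over:+.1f}s relative to the 900s threshold)")
```

With these inputs, every attempt stays in queued for longer than the 900-second threshold before the reschedule or failure event is written, which matches the sequence of two reschedules followed by "tries exceeded" in the table.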