GitHub user mandar14 created a discussion: Airflow[v2.11.0] Celery worker CPU 
starvation

We have Airflow v2.11.0 set up on on-prem Linux (RHEL):
VM-1: Webserver, Scheduler, RabbitMQ, PostgreSQL
VM-2, VM-3: Celery workers (16 vCPUs each)

Note: We previously ran v2.2.5 and set up a new, separate cluster with the above 
configuration.

On the new setup (v2.11.0) we are seeing CPU starvation on the worker nodes.
The newly introduced task supervisor process holds a CPU thread for a long time, 
while the task runner finishes processing quickly.

For example:
**PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND**
1165672 ixxxxxmi+  20   0    1.0g   0.1g   0.0g R  _**99.7**_   0.1   0:43.93 
airflow task supervisor: ['airflow', 'tasks', 'run', 'xxx_ingestion_ppp', 
'staging_to_refined_xxx_PRELOGIN_xxxx_sa+
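To see how much CPU the supervisor processes hold across a whole worker, the process list can be filtered on the process title shown above. This is a minimal sketch, assuming a Linux `ps` that supports `-eo pid,pcpu,args`. The helper names `parse_ps` and `supervisor_cpu` are hypothetical. Note that `ps` reports CPU averaged over the process lifetime, so its figures may differ from the instantaneous `%CPU` column above.

```python
import subprocess

# Process title prefix taken from the output above.
SUPERVISOR_MARKER = "airflow task supervisor"


def parse_ps(ps_output):
    """Parse `ps -eo pid,pcpu,args` text into (pid, cpu_percent, command)
    tuples for supervisor processes, highest CPU first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, pcpu, then the full command line
        if len(parts) < 3 or SUPERVISOR_MARKER not in parts[2]:
            continue
        rows.append((int(parts[0]), float(parts[1]), parts[2]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def supervisor_cpu():
    """Run ps on the local worker node and return the supervisor rows."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,args"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


# Illustrative input only: one supervisor row modelled on the output above,
# plus one unrelated (made-up) process that should be filtered out.
sample_ps = (
    "    PID %CPU COMMAND\n"
    "1165672 99.7 airflow task supervisor: ['airflow', 'tasks', 'run']\n"
    "    812  0.3 /usr/bin/python3 unrelated_script.py\n"
)
for pid, cpu, command in parse_ps(sample_ps):
    print(pid, cpu, command[:40])
```

Calling `supervisor_cpu()` on each worker during a load spike and summing the second field gives a rough view of how many of the 16 vCPUs the supervisors occupy.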

With this, when load is moderately high (around 20 tasks running together), 
further tasks start losing their heartbeat for long periods. This can be seen 
in the pre-task execution logs section of the task log:

[2026-02-11, 15:41:30 IST] {local_task_job_runner.py:123} ▼ Pre task execution 
logs
[2026-02-11, 15:42:16 IST] {taskinstance.py:2631} INFO - Dependencies all met 
for dep_context=non-requeueable deps ti=<TaskInstance: 
xxx_ingestion.staging_to_refined__prod_t_user 
scheduled__2026-02-09T22:00:00+00:00 [queued]>
[2026-02-11, 15:42:16 IST] {taskinstance.py:2631} INFO - Dependencies all met 
for dep_context=requeueable deps ti=<TaskInstance: 
_ingestionet.staging_to_refined__prod_t_user 
scheduled__2026-02-09T22:00:00+00:00 [queued]>
[2026-02-11, 15:42:16 IST] {taskinstance.py:2884} INFO - Starting attempt 2 of 2
[2026-02-11, 15:42:16 IST] {taskinstance.py:2907} INFO - Executing 
<Task(SSHOperator): staging_to_refined_prod_t_user> on 2026-02-09 22:00:00+00:00
[2026-02-11, 15:42:16 IST] {standard_task_runner.py:72} INFO - Started process 
45475 to run task
[2026-02-11, 15:42:16 IST] {standard_task_runner.py:104} INFO - Running: 
['airflow', 'tasks', 'run', 'xxx_ingestion_xxx', 
'staging_to_refinebr_prod_t_user', 'scheduled__2026-02-09T22:00:00+00:00', 
'--job-id', '117940', '--raw', '--subdir', 'DAGS_FOLDER/_ingestion.py', 
'--cfg-path', '/tmp/tmpllomvg1s']
[2026-02-11, 15:42:16 IST] {standard_task_runner.py:105} INFO - Job 117940: 
Subtask staging_to_refinedt_prod_t_user
[2026-02-11, 15:46:54 IST] {task_command.py:467} INFO - Running <TaskInstance: 
_ingestion.staging_to_refined_prod_t_user scheduled__2026-02-09T22:00:00+00:00 
[running]> on host appprr09.idfcbank.com
[2026-02-11, 15:51:11 IST] {job.py:229} INFO - Heartbeat recovered after 784.55 
seconds
[2026-02-11, 15:53:45 IST] {job.py:229} INFO - Heartbeat recovered after 153.75 
seconds
[2026-02-11, 15:58:20 IST] {taskinstance.py:3157} INFO - Exporting env vars: 
AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='_ingestion_' 
AIRFLOW_CTX_TASK_ID='staging_to_refined__prod_t_user' 
AIRFLOW_CTX_EXECUTION_DATE='2026-02-09T22:00:00+00:00' 
AIRFLOW_CTX_TRY_NUMBER='2' 
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2026-02-09T22:00:00+00:00'
[2026-02-11, 15:58:20 IST] {taskinstance.py:740} ▲▲▲ Log group end
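
The stalls in the log above can be quantified by measuring the time between consecutive timestamped lines. This is a minimal sketch, assuming the timestamp format shown in this log. The function name `find_gaps` and the 60-second threshold are illustrative choices, not Airflow settings.

```python
import re
from datetime import datetime

# Matches the timestamp prefix used in the log above, e.g.
# "[2026-02-11, 15:42:16 IST]". The opening bracket is optional so that
# lines where it was lost in copy/paste are still parsed.
TS_PATTERN = re.compile(r"^\[?(\d{4}-\d{2}-\d{2}), (\d{2}:\d{2}:\d{2}) \w+\]")


def find_gaps(log_lines, threshold_seconds=60):
    """Return (previous_ts, current_ts, gap_seconds) for every pair of
    consecutive timestamped lines that are more than the threshold apart."""
    gaps = []
    previous = None
    for line in log_lines:
        match = TS_PATTERN.match(line)
        if not match:
            continue  # wrapped continuation lines carry no timestamp
        current = datetime.strptime(
            f"{match.group(1)} {match.group(2)}", "%Y-%m-%d %H:%M:%S"
        )
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > threshold_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps


# Three lines taken from the log above (truncated after the message start).
sample = [
    "[2026-02-11, 15:42:16 IST] {standard_task_runner.py:105} INFO - Job 117940:",
    "[2026-02-11, 15:46:54 IST] {task_command.py:467} INFO - Running <TaskInstance:",
    "[2026-02-11, 15:51:11 IST] {job.py:229} INFO - Heartbeat recovered after 784.55",
]
for start, end, gap in find_gaps(sample):
    print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: {gap:.0f} s")
```

Running this over the full task log of several affected tasks shows whether the stalls cluster at the same step (here, between starting the task process and the first "Running" line) or are spread across the run.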


Requesting assistance and guidance on this.
Airflow v2.2.5 used to handle more than 50 tasks with the same configuration.


GitHub link: https://github.com/apache/airflow/discussions/62065
