GitHub user mandar14 created a discussion: Airflow[v2.11.0] Celery worker CPU
starvation
We have Airflow v2.11.0 set up on on-prem Linux (RHEL).
VM-1: Webserver, Scheduler, RabbitMQ, PostgreSQL
VM-2,3: Celery workers (16 vCPUs each)
Note: Previously we were using v2.2.5; we set up a new, separate cluster with
the above configuration.
On the new setup (v2.11.0) we have been seeing CPU starvation on the worker
nodes. The newly introduced task supervisor process holds a CPU thread for a
long time, while the task runner finishes processing quickly.
e.g.:
**PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND**
1165672 ixxxxxmi+ 20 0 1.0g 0.1g 0.0g R _**99.7**_ 0.1 0:43.93
airflow task supervisor: ['airflow', 'tasks', 'run', 'xxx_ingestion_ppp',
'staging_to_refined_xxx_PRELOGIN_xxxx_sa+
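To check how many supervisors are in this state at once, something like the following could be run on a worker. It is only a minimal sketch: the helper names `read_ps_output` and `find_busy_supervisors` and the 90% threshold are made up for illustration, and it assumes `ps` reports the same "airflow task supervisor" process title that top shows above. Note that `ps` reports CPU averaged over the process lifetime, so the numbers will not match top exactly.

```python
import subprocess


def read_ps_output() -> str:
    """Return the output of `ps -eo pid,pcpu,args` (Linux procps assumed)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_busy_supervisors(ps_output: str, cpu_threshold: float = 90.0):
    """Return (pid, cpu_percent, command) for task supervisors above the threshold.

    The match on "airflow task supervisor" relies on the process title seen
    in the top output; the threshold is an arbitrary illustrative value.
    """
    busy = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, %cpu, rest of the command line
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        if "airflow task supervisor" not in command:
            continue
        try:
            cpu_value = float(cpu)
        except ValueError:
            continue  # not a numeric CPU column, ignore the row
        if cpu_value >= cpu_threshold:
            busy.append((int(pid), cpu_value, command.strip()))
    return busy
```

Calling `find_busy_supervisors(read_ps_output())` every few seconds and counting the rows would show whether the number of spinning supervisors approaches the 16 vCPUs on a worker.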
With the supervisor holding CPU like this, when the load is moderately high
(20-odd tasks running together), further tasks start losing their heartbeat for
long durations. This can be seen in the pre-task execution logs section of the
task log:
[2026-02-11, 15:41:30 IST] {local_task_job_runner.py:123} ▼ Pre task execution
logs
[2026-02-11, 15:42:16 IST] {taskinstance.py:2631} INFO - Dependencies all met
for dep_context=non-requeueable deps ti=<TaskInstance:
xxx_ingestion.staging_to_refined__prod_t_user
scheduled__2026-02-09T22:00:00+00:00 [queued]>
[2026-02-11, 15:42:16 IST] {taskinstance.py:2631} INFO - Dependencies all met
for dep_context=requeueable deps ti=<TaskInstance:
_ingestionet.staging_to_refined__prod_t_user
scheduled__2026-02-09T22:00:00+00:00 [queued]>
[2026-02-11, 15:42:16 IST] {taskinstance.py:2884} INFO - Starting attempt 2 of 2
[2026-02-11, 15:42:16 IST] {taskinstance.py:2907} INFO - Executing
<Task(SSHOperator): staging_to_refined_prod_t_user> on 2026-02-09 22:00:00+00:00
[2026-02-11, 15:42:16 IST] {standard_task_runner.py:72} INFO - Started process
45475 to run task
[2026-02-11, 15:42:16 IST] {standard_task_runner.py:104} INFO - Running:
['airflow', 'tasks', 'run', 'xxx_ingestion_xxx',
'staging_to_refinebr_prod_t_user', 'scheduled__2026-02-09T22:00:00+00:00',
'--job-id', '117940', '--raw', '--subdir', 'DAGS_FOLDER/_ingestion.py',
'--cfg-path', '/tmp/tmpllomvg1s']
[2026-02-11, 15:42:16 IST] {standard_task_runner.py:105} INFO - Job 117940:
Subtask staging_to_refinedt_prod_t_user
[2026-02-11, 15:46:54 IST] {task_command.py:467} INFO - Running <TaskInstance:
_ingestion.staging_to_refined_prod_t_user scheduled__2026-02-09T22:00:00+00:00
[running]> on host appprr09.idfcbank.com
[2026-02-11, 15:51:11 IST] {job.py:229} INFO - Heartbeat recovered after 784.55
seconds
[2026-02-11, 15:53:45 IST] {job.py:229} INFO - Heartbeat recovered after 153.75
seconds
[2026-02-11, 15:58:20 IST] {taskinstance.py:3157} INFO - Exporting env vars:
AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='_ingestion_'
AIRFLOW_CTX_TASK_ID='staging_to_refined__prod_t_user'
AIRFLOW_CTX_EXECUTION_DATE='2026-02-09T22:00:00+00:00'
AIRFLOW_CTX_TRY_NUMBER='2'
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2026-02-09T22:00:00+00:00'
[2026-02-11, 15:58:20 IST] {taskinstance.py:740} ▲▲▲ Log group end
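The stalls in the log above (for example 15:42:16 to 15:46:54, about 278 seconds with no output) can be pulled out of a task log mechanically. The following is a rough sketch, not an Airflow utility: the function names and the 60-second cutoff are illustrative assumptions, and the timestamp pattern is only inferred from the lines pasted here.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the pasted log, e.g. "[2026-02-11, 15:42:16 IST]".
# The leading "[" is optional because copy/paste sometimes drops it.
TIMESTAMP = re.compile(r"^\[?(\d{4}-\d{2}-\d{2}), (\d{2}:\d{2}:\d{2}) [A-Z]+\]")

# "\s+" tolerates the line wrap between the number and "seconds".
HEARTBEAT = re.compile(r"Heartbeat recovered after ([\d.]+)\s+seconds")


def find_log_gaps(log_text: str, min_gap_seconds: float = 60.0):
    """Return (earlier, later, seconds) for consecutive log lines far apart in time."""
    stamps = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(
                datetime.strptime(
                    f"{match.group(1)} {match.group(2)}", "%Y-%m-%d %H:%M:%S"
                )
            )
    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        delta = (later - earlier).total_seconds()
        if delta >= min_gap_seconds:
            gaps.append((earlier, later, delta))
    return gaps


def heartbeat_stalls(log_text: str):
    """Return the durations (in seconds) reported by 'Heartbeat recovered' lines."""
    return [float(value) for value in HEARTBEAT.findall(log_text)]
```

Running both functions over the logs of several affected tasks would show whether the gaps line up with the moments when many supervisors are busy on the same worker.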
Requesting assistance and guidance on this.
Airflow v2.2.5 used to handle more than 50 tasks with the same configuration.
GitHub link: https://github.com/apache/airflow/discussions/62065