GitHub user rcrchawla created a discussion: Airflow task failed but spark kube 
app is running

### Body

The Airflow task failed while the Spark kube app was still running. The 
Spark kube app is long-running, typically around 1-2 hours. Many tasks run 
concurrently at that time, and the failure usually happens between 
02:30 and 03:45 UTC.

Q) What is causing the issue?
A) The Airflow task fails while the Spark kube app is still running.

Airflow version: **3.0.4**

Setup config:

- 2 API servers
- 2 workers
- 1 dag processor
- 2 schedulers

Deployment: Helm chart deployment on Azure Kubernetes




Please check the logs below.


Worker logs : 
-------------------------------------
2026-03-10 02:33:56.191330 [info     ] Task 
execute_workload[8cbabf91-009f-44a6-86d1-bef109c70341] succeeded in 
2715.019189195242s: None [celery.app.trace]
2026-03-10 02:39:57.112078 [info     ] Task finished                  
[supervisor] duration=1723.7576029417105 exit_code=0 final_state=success
2026-03-10 02:39:57.128929 [info     ] Task 
execute_workload[9b3f27ec-09b5-424e-8d5c-412e541f51e8] succeeded in 
1723.8186896019615s: None [celery.app.trace]
2026-03-10 02:40:50.688403 [info     ] Task finished                  
[supervisor] duration=744.0669570546597 exit_code=0 final_state=success
2026-03-10 02:40:50.705538 [info     ] Task 
execute_workload[b08ac31a-2ee7-4029-b897-753157b18475] succeeded in 
744.139388079755s: None [celery.app.trace]
2026-03-10 02:42:11.649891 [info     ] Task finished                  
[supervisor] duration=756.7588595808484 exit_code=0 final_state=success
2026-03-10 02:42:11.666368 [info     ] Task 
execute_workload[0351c271-194e-4e58-87e4-a9c224351ab1] succeeded in 
756.8229349320754s: None [celery.app.trace]
2026-03-10 02:43:37.239128 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:38.119304 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:38.640468 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:39.247588 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:39.425843 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:39.618220 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:40.002999 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:40.582177 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:41.186771 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:41.510710 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 1st time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:42.658853 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:43.171303 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:43.826966 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:44.330891 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:44.874859 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:44.922591 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:45.866775 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:46.194974 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:46.482845 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:46.750792 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:48.198838 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:48.462121 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:49.749467 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:50.029438 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:50.834835 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:51.334847 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:51.431052 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:51.537615 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:52.567197 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:52.967177 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:53.615078 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:54.513959 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:56.442819 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:57.527549 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:57.765172 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:57.982839 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:58.099625 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:58.534632 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:59.007106 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:43:59.947380 [warning  ] Starting call to 
'airflow.sdk.api.client.Client.request', this is the 4th time calling it. 
[airflow.sdk.api.client]
2026-03-10 02:44:02.200313 [warning  ] Failed to send heartbeat. Will be 
retried [supervisor] failed_heartbeats=1 max_retries=3 
ti_id=UUID('019cd54c-28b0-7e18-9a7b-71ba469bf545')
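
For anyone triaging the worker log above: the excerpt shows ten retry 
warnings at each of attempts 1 through 4, followed by the first failed 
heartbeat. A small script along the following lines could tally such 
warnings in a longer log. This is only a sketch; the function name 
`count_retry_attempts` and the regular expression are illustrative 
assumptions, not part of Airflow, and the sample text is copied from the 
excerpt above.

```python
import re
from collections import Counter

# Matches the retry warning text seen in the worker log, e.g.
# "this is the 2nd time calling it" (pattern is an assumption based on the excerpt).
ATTEMPT_RE = re.compile(r"this is the (\d+)(?:st|nd|rd|th) time calling it")


def count_retry_attempts(log_text: str) -> Counter:
    """Return how many warnings were logged for each attempt number."""
    # The email wraps each log record over several lines, so collapse
    # all whitespace before matching.
    flattened = " ".join(log_text.split())
    return Counter(int(n) for n in ATTEMPT_RE.findall(flattened))


# Three records copied from the worker log excerpt, wrapping included.
sample = """
2026-03-10 02:43:37.239128 [warning  ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:42.658853 [warning  ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:43.171303 [warning  ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
"""

for attempt, count in sorted(count_retry_attempts(sample).items()):
    print(f"attempt {attempt}: {count} warning(s)")
```

Running this against the full worker log for the 02:30 to 03:45 UTC window 
would show whether the retries cluster around the same moments as the 
failed heartbeats.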


API Server 
----------------------

2026-03-10 02:45:23 [debug    ] Retrieved current task state   
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local 
current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
2026-03-10 02:45:23 [debug    ] Retrieved current task state   
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local 
current_pid=155023 state=running ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
2026-03-10 02:45:23 [debug    ] Retrieved current task state   
current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local 
current_pid=81402 state=running ti_id=019cd578-f8c1-7125-9906-ef64229dbba5
2026-03-10 02:45:23 [debug    ] Retrieved current task state   
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local 
current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug    ] Retrieved current task state   
current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local 
current_pid=86154 state=running ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running 
ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
INFO:     10.10.12.52:40870 - "GET /api/v2/version HTTP/1.1" 200 OK
INFO:     10.10.12.52:40880 - "GET /api/v2/version HTTP/1.1" 200 OK
2026-03-10 02:45:23 [debug    ] Processing heartbeat           
hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local 
pid=151395 ti_id=019cd542-0d47-7d93-a021-0cc2c9de7344
2026-03-10 02:45:23 [debug    ] Refreshed token issued to Task 
[airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120 
valid_left=73
2026-03-10 02:45:23 [debug    ] Refreshed token issued to Task 
[airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120 
valid_left=73
2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running 
ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug    ] Processing heartbeat           
hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local 
pid=155023 ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
[2026-03-10T02:45:23.575+0000] {exceptions.py:77} ERROR - Error with id 9zBmdizJ
  File 
"/home/airflow/.local/lib/python3.12/site-packages/starlette/_exception_handler.py",
 line 42, in wrapped_app
    await app(scope, receive, sender)
  File 
"/home/airflow/.local/lib/python3.12/site-packages/starlette/routing.py", line 
75, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", 
line 302, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", 
line 213, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py",
 line 474, in decorator
    response = await self._convert_endpoint_response_to_version(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py",
 line 520, in _convert_endpoint_response_to_version
    response_or_response_body: Union[FastapiResponse, object] = await 
run_in_threadpool(
                                                                
^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/starlette/concurrency.py", 
line 38, in run_in_threadpool
    return await anyio.to_thread.run_sync(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/anyio/to_thread.py", 
line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py",
 line 2476, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py",
 line 967, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/cadwyn/schema_generation.py",
 line 515, in __call__
    return self._original_callable(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/api_fastapi/execution_api/routes/xcoms.py",
 line 419, in set_xcom
    session.flush()
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", 
line 3449, in flush
    self._flush(objects)
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", 
line 3588, in _flush
    with util.safe_reraise():
         ^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py",
 line 70, in __exit__
    compat.raise_(
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", 
line 211, in raise_
    raise exception
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", 
line 3549, in _flush
    flush_context.execute()
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py",
 line 456, in execute
    rec.execute(self)
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py",
 line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py",
 line 245, in save_obj
    _emit_insert_statements(
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py",
 line 1097, in _emit_insert_statements
    c = connection._execute_20(
        ^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", 
line 334, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 1577, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 1953, in _execute_context
    self._handle_dbapi_exception(
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 2134, in _handle_dbapi_exception
    util.raise_(
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", 
line 211, in raise_
    raise exception
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", 
line 1910, in _execute_context
    self.dialect.do_execute(
  File 
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py",
 line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", 
line 179, in execute
    res = self._query(mogrified_query)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", 
line 330, in _query
    db.query(q)
  File 
"/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/connections.py", 
line 280, in query
    _mysql.connection.query(self, query)

2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running 
ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running 
ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug    ] Retrieved current task state   
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local 
current_pid=65618 state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running 
ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug    ] Retrieved current task state   
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local 
current_pid=151858 state=running ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running 
ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running 
ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
2026-03-10 02:45:23 [debug    ] Retrieved current task state   
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local 
current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
2026-03-10 02:45:23 [debug    ] Retrieved current task state   
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local 
current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running 
ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017


What do you think should happen instead?

The Airflow task should run to completion without failing.

### Committer

- [x] I acknowledge that I am a maintainer/committer of the Apache Airflow 
project.

GitHub link: https://github.com/apache/airflow/discussions/63298
