GitHub user hezeclark added a comment to the discussion: Airflow task failed
but spark kube app is running
This is a common issue with Airflow + SparkKubernetesOperator when the Airflow
task timeout is shorter than the actual Spark job duration.
**Root cause**: Airflow marks the task as failed when it doesn't receive a
heartbeat or when the task's `execution_timeout` is exceeded, but the Spark
application keeps running in Kubernetes independently — it has no awareness of
Airflow's state.
**Solutions:**
**1. Increase `execution_timeout` on the Spark task**
```python
from datetime import timedelta

from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator

submit_job = SparkKubernetesOperator(
    task_id='submit_spark_job',
    application_file='/path/to/spark-app.yaml',  # the operator's parameter is `application_file`
    execution_timeout=timedelta(hours=3),  # longer than max expected job duration
    # ... other arguments elided
)
```
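If you would rather derive the timeout than hardcode it, one option is to take the longest run you have observed and pad it. A minimal sketch: `timeout_with_headroom` and the 25% default are my own assumptions, not something Airflow provides.

```python
from datetime import timedelta


def timeout_with_headroom(longest_observed: timedelta, headroom: float = 0.25) -> timedelta:
    """Hypothetical helper: pad the longest observed run by a fractional margin."""
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    # timedelta supports multiplication by a float.
    return longest_observed * (1 + headroom)


# A job that has taken up to 2 hours gets a 2.5 hour timeout.
execution_timeout = timeout_with_headroom(timedelta(hours=2))
```

The result can be passed straight to `execution_timeout=` on the operator.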
**2. Use `SparkKubernetesSensor` to poll instead of waiting synchronously**
```python
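from airflow.providers.cncf.kubernetes.sensors.spark_kubernetes import SparkKubernetesSensor

monitor_job = SparkKubernetesSensor(
    task_id='monitor_spark_job',
    # Assumes the submit task's XCom is the application name; if it is the full
    # resource dict, index it with ['metadata']['name'].
    application_name='{{ task_instance.xcom_pull(task_ids="submit_spark_job") }}',
    poke_interval=30,  # seconds between status checks
    timeout=7200,  # 2 hours
    # ... other arguments elided
)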
```
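To make the roles of `poke_interval` and `timeout` concrete, here is a rough stdlib stand-in for the polling pattern. It is not the sensor's real code: `poll_until` is a hypothetical name, and where this sketch returns `False`, the real sensor fails the task when the timeout is reached.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once check() succeeds, False if the timeout would be exceeded."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        # Stop if the next poke would land after the deadline.
        if clock() + poke_interval > deadline:
            return False
        sleep(poke_interval)
```

With `poke_interval=30` and `timeout=7200`, the check runs at most 241 times (once at the start, then every 30 seconds for 2 hours).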
**3. Add a cleanup DAG / task that deletes orphaned Spark apps**
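When the Airflow task fails but Spark keeps running, the orphaned application
continues to hold cluster resources. Add an `on_failure_callback` that runs
`kubectl delete sparkapplication <name>` to prevent resource leaks (this
assumes `kubectl` is installed and configured on the worker):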
```python
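import subprocess

def cleanup_spark_app(context):
    # The XCom may be empty if the task failed before it returned a value.
    app_name = context['task_instance'].xcom_pull(task_ids='submit_spark_job')
    if not app_name:
        return  # no application name available, so nothing to delete
    # check=False: a failed delete should not raise inside the callback.
    subprocess.run(
        ['kubectl', 'delete', 'sparkapplication', app_name, '-n', 'spark'],
        check=False,
    )

submit_job = SparkKubernetesOperator(
    on_failure_callback=cleanup_spark_app,
    # ... other arguments elided
)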
```
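One caveat with the callback above: the XCom may be missing, or it may hold the full resource dict rather than a plain name, depending on what the submit task returned. A small sketch of a guard for both cases. `build_delete_command` is a hypothetical helper and the `metadata.name` layout is an assumption based on the standard Kubernetes object shape.

```python
from typing import Any, List, Optional


def build_delete_command(xcom_value: Any, namespace: str = "spark") -> Optional[List[str]]:
    """Hypothetical helper: build the kubectl command, or None if no name is available."""
    if isinstance(xcom_value, dict):
        # Assumed layout: a Kubernetes object with metadata.name.
        name = (xcom_value.get("metadata") or {}).get("name")
    else:
        name = xcom_value
    if not isinstance(name, str) or not name:
        return None
    # --ignore-not-found makes the delete a no-op if the app is already gone.
    return ["kubectl", "delete", "sparkapplication", name, "-n", namespace, "--ignore-not-found"]
```

Inside the callback you would skip the `subprocess.run` call whenever this returns `None`.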
GitHub link:
https://github.com/apache/airflow/discussions/63298#discussioncomment-16117745