Hi,

We run Apache Airflow on Kubernetes in a setup very similar to the one in puckel/docker-airflow [1] (Celery executor, Redis for messaging, Postgres).
Lately, some of our tasks have been getting stuck in the running state, and their logs show the following errors:

[2018-11-20 05:31:23,009] {models.py:1329} INFO - Dependencies not met for <TaskInstance: BLAH 2018-11-19T19:19:50.757184+00:00 [running]>, dependency 'Task Instance Not Already Running' FAILED: Task is already running, it started on 2018-11-19 23:29:11.974497+00:00.
[2018-11-20 05:31:23,016] {models.py:1329} INFO - Dependencies not met for <TaskInstance: BLAH 2018-11-19T19:19:50.757184+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.

Is there any way to avoid this? Does anyone know what causes it? This is quite problematic: when these errors occur, the task stays in the running state without making any progress, so turning on retries doesn't help get our DAGs to run reliably to completion.

Thanks!

[1] https://github.com/puckel/docker-airflow