In some cases this is a double execution in Celery: two workers pick up the same task, but only the first one to update the metadata DB to "running" is allowed to proceed. In our case this leads to confusing, but ultimately not incorrect, behavior: the losing task fails its dependency check and writes a log file, which it makes available, while the winning task is still running on another instance and eventually succeeds.
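
To make the race concrete, here is a minimal, self-contained Python sketch. It is not Airflow's actual code; the in-memory "metadata DB", the lock, and the worker names are stand-ins chosen for illustration. It only shows the shape of the behavior described above: two workers receive the same task, but only the one that records "running" first actually executes, while the other fails its dependency check and bails out.

import threading

# Stand-in for the task_instance row in the metadata DB.
metadata_db = {"state": "queued"}
# Stand-in for the DB transaction that serializes the state update.
db_lock = threading.Lock()

def run_task(worker_name):
    with db_lock:
        if metadata_db["state"] == "running":
            # This worker logs "Task is already running" and gives up.
            print(f"{worker_name}: dependency check failed, task already running")
            return
        metadata_db["state"] = "running"
    # Only the worker that won the race gets here.
    print(f"{worker_name}: executing task")

for name in ("worker-1", "worker-2"):
    threading.Thread(target=run_task, args=(name,)).start()

The log lines in the quoted thread below are the losing worker's side of exactly this check.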
On Wed, Feb 13, 2019 at 4:16 PM Kevin Lam <[email protected]> wrote:

> Friendly ping on the above! Has anyone encountered this by chance?
>
> We're still seeing it occasionally on longer running tasks.
>
> On Tue, Nov 20, 2018 at 10:31 AM Kevin Lam <[email protected]> wrote:
>
> > Hi,
> >
> > We run Apache Airflow in Kubernetes in a manner very similar to what is
> > outlined in puckel/docker-airflow [1] (Celery Executor, Redis for
> > messaging, Postgres).
> >
> > Lately, we've encountered some of our tasks getting stuck in a running
> > state and printing out the errors:
> >
> > [2018-11-20 05:31:23,009] {models.py:1329} INFO - Dependencies not met
> > for <TaskInstance: BLAH 2018-11-19T19:19:50.757184+00:00 [running]>,
> > dependency 'Task Instance Not Already Running' FAILED: Task is already
> > running, it started on 2018-11-19 23:29:11.974497+00:00.
> > [2018-11-20 05:31:23,016] {models.py:1329} INFO - Dependencies not met
> > for <TaskInstance: BLAH 2018-11-19T19:19:50.757184+00:00 [running]>,
> > dependency 'Task Instance State' FAILED: Task is in the 'running' state
> > which is not a valid state for execution. The task must be cleared in order
> > to be run.
> >
> > Is there any way to avoid this? Does anyone know what causes this issue?
> >
> > This is quite problematic. The task is stuck in the running state without
> > making any progress when the above error occurs, so turning on retries
> > doesn't help with getting our DAGs to reliably run to completion.
> >
> > Thanks!
> >
> > [1] https://github.com/puckel/docker-airflow
