We've been plagued by this as well, and it prevents us from setting stricter
retry limits. Similar setup, but using MySQL. We also see it more for
long-running tasks (sensors).

-Daniel

On Wed, Feb 13, 2019 at 1:48 PM James Meickle
<[email protected]> wrote:

> In some cases this is a double execution in Celery. Two workers grab the same
> task, but the first one to update the metadata db to "running" is the only
> one allowed to run. In our case this leads to confusing, but ultimately not
> incorrect, behavior: the failed task writes a log file and makes that
> available, but the other task is still running on another instance, and
> eventually succeeds.
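>
> As a rough sketch (illustrative only, not Airflow's actual code), that
> "first writer wins" guard amounts to a compare-and-set on the task
> instance row; the table and column names here are invented:
>
>     # Illustrative sketch of the guard, not Airflow's actual code;
>     # table/column names are invented for the example.
>     import sqlalchemy as sa
>
>     engine = sa.create_engine("postgresql://airflow:airflow@postgres/airflow")
>
>     def try_claim(task_id, execution_date):
>         # Only the worker whose UPDATE flips the row to 'running' wins;
>         # the loser sees rowcount == 0 and must not run the task.
>         with engine.begin() as conn:
>             result = conn.execute(
>                 sa.text(
>                     "UPDATE task_instance SET state = 'running' "
>                     "WHERE task_id = :t AND execution_date = :d "
>                     "AND state != 'running'"
>                 ),
>                 {"t": task_id, "d": execution_date},
>             )
>             return result.rowcount == 1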
>
> On Wed, Feb 13, 2019 at 4:16 PM Kevin Lam <[email protected]> wrote:
>
> > Friendly ping on the above! Has anyone encountered this by chance?
> >
> > We're still seeing it occasionally on longer-running tasks.
> >
> > On Tue, Nov 20, 2018 at 10:31 AM Kevin Lam <[email protected]>
> wrote:
> >
> > > Hi,
> > >
> > > We run Apache Airflow in Kubernetes in a manner very similar to what is
> > > outlined in puckel/docker-airflow [1] (Celery Executor, Redis for
> > > messaging, Postgres).
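> > >
> > > For reference, our setup looks roughly like this in airflow.cfg (the
> > > connection strings are placeholders, not our real ones):
> > >
> > >     [core]
> > >     executor = CeleryExecutor
> > >     sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres/airflow
> > >
> > >     [celery]
> > >     broker_url = redis://redis:6379/1
> > >     result_backend = db+postgresql://airflow:airflow@postgres/airflow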
> > >
> > > Lately, we've encountered some of our Tasks getting stuck in a running
> > > state, printing out the following errors:
> > >
> > > [2018-11-20 05:31:23,009] {models.py:1329} INFO - Dependencies not met for <TaskInstance: BLAH 2018-11-19T19:19:50.757184+00:00 [running]>, dependency 'Task Instance Not Already Running' FAILED: Task is already running, it started on 2018-11-19 23:29:11.974497+00:00.
> > > [2018-11-20 05:31:23,016] {models.py:1329} INFO - Dependencies not met for <TaskInstance: BLAH 2018-11-19T19:19:50.757184+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
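> > >
> > > When this happens we can only recover by clearing the instance by hand,
> > > roughly like this (Airflow 1.x CLI; the dag/task names are placeholders):
> > >
> > >     airflow clear my_dag -t 'my_task' -s 2018-11-19 -e 2018-11-20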
> > > Is there any way to avoid this? Does anyone know what causes this issue?
> > >
> > > This is quite problematic. The task is stuck in the running state without
> > > making any progress when the above error occurs, so turning on retries
> > > doesn't help with getting our DAGs to reliably run to completion.
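> > >
> > > For concreteness, this is roughly the retry setup we'd like to be able
> > > to rely on (dag/task names are placeholders; execution_timeout is just
> > > an illustrative bound on each try, not a confirmed fix):
> > >
> > >     from datetime import datetime, timedelta
> > >
> > >     from airflow import DAG
> > >     from airflow.operators.bash_operator import BashOperator
> > >
> > >     dag = DAG(
> > >         "example_dag",  # placeholder name
> > >         start_date=datetime(2018, 11, 1),
> > >         schedule_interval="@daily",
> > >     )
> > >
> > >     task = BashOperator(
> > >         task_id="example_task",  # placeholder name
> > >         bash_command="sleep 30",
> > >         retries=3,
> > >         retry_delay=timedelta(minutes=5),
> > >         execution_timeout=timedelta(hours=1),  # bound each try so a hang can fail
> > >         dag=dag,
> > >     )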
> > >
> > > Thanks!
> > >
> > > [1] https://github.com/puckel/docker-airflow
> > >
> >
>
