Hi Vardan,

We had this issue too. I recommend increasing the parallelism config variable
to something like 128 or 512. I'm not sure what side effects this could
have; so far, we've seen none. This happened to us with LocalExecutor, and
our monitoring showed a clear issue with hitting a cap on the number of
concurrent tasks. I probably should have reported it, but we still aren't
sure what happened and haven't investigated why those tasks don't get
kicked back into the queue.
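
For reference, that cap lives in the [core] section of airflow.cfg; a
minimal sketch of what we set (the exact value is a judgment call, 128 and
512 are just the numbers we tried):

```
[core]
# Global cap on the number of task instances that can run concurrently
# across the whole installation; the stock default is 32.
parallelism = 512
```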

You may need to increase other config variables, too, if they also cause
you to hit caps. Some people are conservative about these variables; if you
are feeling conservative, you can get better telemetry into this with
Prometheus and Grafana. We started down that route but ultimately decided
to just set the cap very high and deal with any side effects afterwards.
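
The other caps I had in mind live in the same [core] section; a sketch,
assuming the stock v1.9 defaults:

```
[core]
# Per-DAG cap on concurrently running task instances (default 16)
dag_concurrency = 128
# Cap on active DAG runs per DAG (default 16)
max_active_runs_per_dag = 128
# Slots for tasks that don't use an explicit pool (default 128)
non_pooled_task_slot_count = 512
```

For the telemetry route, v1.9 can emit StatsD metrics, which a statsd
exporter can relay into Prometheus for Grafana dashboards; the relevant
knobs are under [scheduler]:

```
[scheduler]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
```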

Best,
Trent


On Mon, Aug 27, 2018 at 21:09 vardangupta...@gmail.com <
vardangupta...@gmail.com> wrote:

> Hi Everyone,
>
> For the last 2 weeks, we've been facing an issue with a LocalExecutor
> setup of Airflow v1.9 (MySQL as metastore): in a DAG with retries
> configured, when the initial try_number fails, then nearly 8 out of 10
> times the task gets stuck in the up_for_retry state; in fact, no running
> state ever follows Scheduled > Queued in the task instance. The entry in
> the job table is marked successful within a fraction of a second, a failed
> entry gets logged in the task_fail table without the task ever reaching
> the operator code, and as a result we get an email alert saying:
>
> ```
> Try 2 out of 4
> Exception:
> Executor reports task instance %s finished (%s) although the task says its
> %s. Was the task killed externally?
> ```
>
> But when we change the default value of job_heartbeat_sec from 5 to 30
> seconds (
> https://groups.google.com/forum/#!topic/airbnb_airflow/hTXKFw2XFx0
> mentioned by Max back in 2016 for healthy supervision), the issue stops
> arising. But we're still clueless as to how this new configuration
> actually solved/suppressed the issue; any key information around it would
> really help here.
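>
> For reference, the only change was in airflow.cfg (the section name
> assumes a stock v1.9 config):
>
> ```
> [scheduler]
> # was the default of 5; raising it made the issue stop arising
> job_heartbeat_sec = 30
> ```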
>
> Regards,
> Vardan Gupta
>
-- 
(Sent from cellphone)
