Hi Vardan,

We had this issue too - I recommend increasing the parallelism config variable to something like 128 or 512. I can't say what side effects this could have; so far, we've seen none. This happened to us with LocalExecutor, and our monitoring showed we were clearly hitting the cap on the number of concurrent tasks. I probably should have reported it, but we still aren't sure what happened and haven't investigated why those tasks don't get kicked back into the queue.
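For concreteness, this is roughly what I mean in airflow.cfg - the numbers are only illustrative starting points, not values we benchmarked, and the two per-DAG settings are the kind of related caps I mention below:

```
[core]
# Hard cap on how many task instances the executor will run at once
# across the whole Airflow installation; this is the ceiling we were
# hitting with LocalExecutor.
parallelism = 128

# Related caps you can also bump into (values here are illustrative):
# max task instances allowed to run concurrently within a single DAG
dag_concurrency = 64
# max concurrent DAG runs per DAG
max_active_runs_per_dag = 16
```

You'll need to restart the scheduler for new values to take effect.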
You may need to increase other config variables, too, if they also cause you to hit caps. Some people are conservative about these variables. If you are feeling conservative, you can get better telemetry into this with Prometheus and Grafana (see the sketch below the quoted message). We went down that route but ultimately decided to just set the cap very high and deal with any side effects afterwards.

Best,
Trent

On Mon, Aug 27, 2018 at 21:09 vardangupta...@gmail.com <vardangupta...@gmail.com> wrote:

> Hi Everyone,
>
> For the last 2 weeks, we've been facing an issue with a LocalExecutor setup
> of Airflow v1.9 (MySQL as metastore): in a DAG where a retry has been
> configured, if the initial try_number fails, then nearly 8 out of 10
> times the task gets stuck in the up_for_retry state; in fact, no running
> state ever appears after Scheduled > Queued in the TI. The entry in the
> Job table is marked successful within a fraction of a second, a failed
> entry gets logged in the task_fail table without the task ever reaching
> the operator code, and as a result we get an email alert saying:
>
> ```
> Try 2 out of 4
> Exception:
> Executor reports task instance %s finished (%s) although the task says its
> %s. Was the task killed externally?
> ```
>
> But when the default value of job_heartbeat_sec is changed from 5 to 30
> seconds (https://groups.google.com/forum/#!topic/airbnb_airflow/hTXKFw2XFx0,
> mentioned by Max some time back in 2016 for healthy supervision), the issue
> stops arising. But we're still clueless how this new configuration actually
> solved/suppressed the issue; any key information around it would really
> help here.
>
> Regards,
> Vardan Gupta

--
(Sent from cellphone)
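(Sketch referenced above: roughly what the Prometheus/Grafana telemetry route looks like in airflow.cfg, assuming you run a Prometheus statsd_exporter and scrape it from Prometheus; the host/port below are placeholders for wherever that exporter listens.)

```
[scheduler]
# Push Airflow's statsd metrics (e.g. the executor.open_slots and
# executor.queued_tasks gauges) to a statsd endpoint; point this at a
# prometheus statsd_exporter, scrape the exporter from Prometheus,
# then graph the gauges in Grafana to see when you hit the cap.
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
```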