[ https://issues.apache.org/jira/browse/AIRFLOW-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617509#comment-16617509 ]
Marcin Szymanski commented on AIRFLOW-3065:
-------------------------------------------

Turned out to be caused by AIRFLOW-1104. @committers, can we have the fix included in 1.10.1?

> Scheduler failing tasks when DAG concurrency limit reached
> ----------------------------------------------------------
>
>                 Key: AIRFLOW-3065
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3065
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.0
>            Reporter: Marcin Szymanski
>            Priority: Critical
>
> In a DAG with a concurrency limit of 4 and about 150 tasks, when the
> limit of active tasks is reached, the scheduler starts to fail queued tasks.
> They are later retried, but if they have downstream tasks, these remain in
> upstream_failed status.
> A few additional details:
> * Celery executor
> * environment upgraded from 1.9 (no issues back then)
> * all configuration in airflow.cfg updated to the latest set of options
> * issue happens both with PyPI 1.10 and a build from branch v1-10-test
> (c36ef06)
>
>
> {noformat}
> [2018-09-14 13:51:23,560] {models.py:1336} INFO - Dependencies all met for
> <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>
> [2018-09-14 13:51:23,850] {models.py:1330} INFO - Dependencies not met for
> <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00
> [queued]>, dependency 'Task Instance Slots Available' FAILED: The maximum
> number of running tasks (4) for this task's DAG 'consolidated_db' has been
> reached.
> [2018-09-14 13:51:23,852] {models.py:1531} WARNING -
> --------------------------------------------------------------------------------
> FIXME: Rescheduling due to concurrency limits reached at task runtime.
> Attempt 1 of 1. State set to NONE.
> --------------------------------------------------------------------------------
> [2018-09-14 13:51:23,853] {models.py:1534} INFO - Queuing into pool None
> [2018-09-14 13:51:23,560] {models.py:1336} INFO - Dependencies all met for
> <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>
> [2018-09-14 13:51:23,850] {models.py:1330} INFO - Dependencies not met for
> <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00
> [queued]>, dependency 'Task Instance Slots Available' FAILED: The maximum
> number of running tasks (4) for this task's DAG 'consolidated_db' has been
> reached.
> [2018-09-14 13:51:23,852] {models.py:1531} WARNING -
> --------------------------------------------------------------------------------
> FIXME: Rescheduling due to concurrency limits reached at task runtime.
> Attempt 1 of 1. State set to NONE.
> --------------------------------------------------------------------------------
> [2018-09-14 13:51:23,853] {models.py:1534} INFO - Queuing into pool None
> [2018-09-14 13:52:49,939] {models.py:1336} INFO - Dependencies all met for
> <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>
> [2018-09-14 13:52:50,142] {models.py:1336} INFO - Dependencies all met for
> <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>
> [2018-09-14 13:52:50,235] {models.py:1548} INFO -
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2018-09-14 13:52:50,646] {models.py:1570} INFO - Executing
> <Task(PostgresDumpOperator): item> on 2018-09-14T12:42:55.379761+00:00
> {noformat}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
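
For anyone checking whether their own task logs show the same pattern, here is a rough sketch that counts how often the 'Task Instance Slots Available' check rejected each task instance. It only matches the message wording seen in the excerpt above (Airflow 1.10.0) and assumes each log record sits on one unwrapped line. The names `BLOCKED` and `count_slot_rejections` are made up for illustration and are not part of Airflow.

```python
import re
from collections import Counter

# Matches the "Dependencies not met" record for the DAG concurrency check,
# using the wording from the quoted log excerpt (an assumption for other versions).
BLOCKED = re.compile(
    r"Dependencies not met for <TaskInstance: (?P<ti>\S+) (?P<when>\S+) \[(?P<state>\w+)\]>, "
    r"dependency 'Task Instance Slots Available' FAILED"
)


def count_slot_rejections(lines):
    """Count, per (task instance, execution date), how often the slot check failed."""
    counts = Counter()
    for line in lines:
        match = BLOCKED.search(line)
        if match:
            counts[(match.group("ti"), match.group("when"))] += 1
    return counts


# Two records from the excerpt, rejoined onto single lines.
sample = [
    "[2018-09-14 13:51:23,560] {models.py:1336} INFO - Dependencies all met for "
    "<TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>",
    "[2018-09-14 13:51:23,850] {models.py:1330} INFO - Dependencies not met for "
    "<TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>, "
    "dependency 'Task Instance Slots Available' FAILED: The maximum number of running "
    "tasks (4) for this task's DAG 'consolidated_db' has been reached.",
]

rejections = count_slot_rejections(sample)
for (task_instance, execution_date), hits in rejections.items():
    print(task_instance, execution_date, hits)
```

Task instances with a non-zero count that later show up as failed or upstream_failed would be candidates for the behaviour described in this issue; the count alone does not prove it.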