[ https://issues.apache.org/jira/browse/AIRFLOW-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sajid Sajid reassigned AIRFLOW-5881: ------------------------------------ Assignee: Sajid Sajid > Dag gets stuck in "Scheduled" State when scheduling a large number of tasks > --------------------------------------------------------------------------- > > Key: AIRFLOW-5881 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5881 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler > Affects Versions: 1.10.6 > Reporter: David Hartig > Assignee: Sajid Sajid > Priority: Critical > Attachments: 2 (1).log, airflow.cnf > > > Running with the KubernetesExecutor in and AKS cluster, when we upgraded to > version 1.10.6 we noticed that the all the Dags stop making progress but > start running and immediate exiting with the following message: > "Instance State' FAILED: Task is in the 'scheduled' state which is not a > valid state for execution. The task must be cleared in order to be run." > See attached log file for the worker. Nothing seems out of the ordinary in > the Scheduler log. > Reverting to 1.10.5 clears the problem. > Note that at the time of the failure maybe 100 or so tasks are in this state, > with 70 coming from one highly parallelized dag. Clearing the scheduled tasks > just makes them reappear shortly thereafter. Marking them "up_for_retry" > results in one being executed but then the system is stuck in the original > zombie state. > Attached is the also a redacted airflow config flag. -- This message was sent by Atlassian Jira (v8.20.1#820001)