lukas-at-harren commented on issue #13542: URL: https://github.com/apache/airflow/issues/13542#issuecomment-813831995
@kaxil I have checked: `min_file_process_interval` is set to `30`, but the problem is still there for me. @SalmonTimo I have a pretty high CPU utilisation (around 60%), although the scheduler settings are at their defaults. But why? Does this matter?

Same issue, new day: Airflow is running, the scheduler is running, the cluster has 103 scheduled tasks and 3 queued tasks, yet nothing is running at all. I highly doubt that `min_file_process_interval` is the root of the problem. I suggest somebody mark this issue with a higher priority; I do not think that "regularly restarting the scheduler" is a reasonable solution.

What we need here is some factual inspection of the Python process. I am no Python expert, but I am proficient and know my way around other VMs (Erlang, Ruby). Following that stack-trace idea, I just learned that Python cannot dump a process (https://stackoverflow.com/a/141826/128351); otherwise I would have provided you with such a process dump of my running "scheduler". I am very happy to provide you with facts about my stalled scheduler if you tell me how you would debug such an issue.

What I currently have:

* CPU utilisation of the scheduler is still pretty high (around 60%).
* `AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL` is set to `30`.
* `AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL` is set to `10`.
* Log output of the scheduler:

```
[2021-04-06 05:19:56,201] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:
[2021-04-06 05:19:57,865] {scheduler_job.py:941} INFO - 15 tasks up for execution:
# ... snip ...
[2021-04-06 05:19:57,876] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=mssql_dwh) with 0 open slots and 15 task instances ready to be queued
[2021-04-06 05:19:57,882] {scheduler_job.py:985} INFO - Not scheduling since there are 0 open slots in pool mssql_dwh
```

What I find striking is the message `INFO - Not scheduling since there are 0 open slots in pool mssql_dwh`.
That pool is configured for a maximum of 3 slots, yet not a single task is running. Bluntly, I suspect a bug in the combination of:

* KubernetesExecutor
* Pools

My fear is that the scheduler might be losing track of tasks running on Kubernetes.
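(Editor's hedged aside, not part of the original comment: a live Python process *can* emit a stack dump if the stdlib `faulthandler` module is hooked to a signal ahead of time. A minimal sketch, assuming you can add this to the scheduler's entrypoint and that `SIGUSR1` is otherwise unused:)

```python
# Hedged sketch: wire up the stdlib faulthandler module so a running
# scheduler process can dump every thread's Python stack on demand,
# without being stopped or restarted.
import faulthandler
import signal

# Dump all thread stacks to stderr whenever the process receives SIGUSR1.
faulthandler.register(signal.SIGUSR1, all_threads=True)

# From a shell, trigger the dump while the process keeps running:
#   kill -USR1 <scheduler-pid>
```

This would give exactly the kind of "process dump" asked for above: the stacks show where the scheduler's threads are stuck while it burns 60% CPU.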
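(Editor's hedged aside, not part of the original comment: the scheduler computes "open slots" from `task_instance` rows in the metadata database, so listing the rows counted against the pool can show whether phantom `running`/`queued` task instances are eating the slots. A sketch under assumptions: the `task_instance` table and column names match Airflow 2.0.x, and the function name and connection-URL handling are illustrative, not an Airflow API.)

```python
# Hedged sketch: query the Airflow metadata DB for the task instances that
# are counted against a pool's slots. `pool_occupancy` is a hypothetical
# helper name; the table/column names follow Airflow 2.0.x.
from sqlalchemy import create_engine, text

def pool_occupancy(sql_alchemy_conn: str, pool_name: str):
    """Return (dag_id, task_id, state) rows counted against `pool_name`."""
    engine = create_engine(sql_alchemy_conn)
    with engine.connect() as conn:
        return conn.execute(
            text(
                "SELECT dag_id, task_id, state FROM task_instance "
                "WHERE pool = :pool AND state IN ('running', 'queued')"
            ),
            {"pool": pool_name},
        ).fetchall()

# Usage (URL taken from [core] sql_alchemy_conn in airflow.cfg):
#   for dag_id, task_id, state in pool_occupancy(url, "mssql_dwh"):
#       print(dag_id, task_id, state)
```

If this returns `running`/`queued` rows while Kubernetes shows no corresponding pods, that would support the "scheduler losing track of tasks" theory.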