lukas-at-harren commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-813831995


   @kaxil I have checked: `min_file_process_interval` is set to `30`, but the problem is still there for me.
   
   @SalmonTimo I see pretty high CPU utilisation (around 60%), even though the scheduler settings are at their defaults. Why is that, and does it matter here?
   
   --
   
   Same issue, new day: Airflow is up and the scheduler is running, yet the cluster has 103 scheduled tasks and 3 queued tasks and nothing is running at all. I highly doubt that `min_file_process_interval` is the root of the problem.
   I suggest somebody mark this issue with a higher priority; I do not think that "regularly restarting the scheduler" is a reasonable solution.
   
   --
   
   What we need here is some factual inspection of the Python process.
   I am no Python expert, but I am proficient with other VMs (Erlang, Ruby) and know my way around them.
   
   Following that stack-trace idea: I just learned that Python cannot simply dump a core of a running process (https://stackoverflow.com/a/141826/128351), unfortunately, otherwise I would have provided you with such a dump of my running "scheduler".
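
   For what it's worth, it does seem possible to get at least per-thread stack traces out of a live CPython process: the standard library `faulthandler` module can dump all thread stacks on a signal, if it is wired into the process. A minimal sketch (patching the scheduler entrypoint like this is an assumption on my part, not something Airflow ships):

   ```python
   # Sketch: add this to the scheduler's entrypoint so that
   # `kill -USR1 <scheduler pid>` prints every thread's stack to stderr.
   # faulthandler is part of the Python standard library (3.3+).
   import faulthandler
   import signal

   # Dump the traceback of all threads when SIGUSR1 is received.
   faulthandler.register(signal.SIGUSR1, all_threads=True)
   ```

   External tools like py-spy (`py-spy dump --pid <PID>`) can reportedly attach to an already-running process without any code changes, which may be the more practical route here.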
   
   I am very happy to provide facts about my stalled scheduler if you tell me how you would debug such an issue.
   
   What I currently have:
   
   * CPU utilisation of the scheduler is still pretty high (around 60%).
   * `AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL` is set to `30`
   * `AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL` is set to `10`
   * Log output of scheduler:
   
   ```
   [2021-04-06 05:19:56,201] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:

   [2021-04-06 05:19:57,865] {scheduler_job.py:941} INFO - 15 tasks up for execution:

   # ... snip ...

   [2021-04-06 05:19:57,876] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=mssql_dwh) with 0 open slots and 15 task instances ready to be queued
   [2021-04-06 05:19:57,882] {scheduler_job.py:985} INFO - Not scheduling since there are 0 open slots in pool mssql_dwh
   ```
   
   What I find striking is the message `INFO - Not scheduling since there are 0 open slots in pool mssql_dwh`.
   That pool is configured with a maximum of 3 slots, yet not a single task is running. Bluntly, I suspect a bug in the combination of:
   
   * KubernetesExecutor
   * Pools
   
   I fear the bug is that the scheduler might be losing track of tasks running on Kubernetes.
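
   To cross-check that theory, this is roughly how I would ask the metadata database which task instances the scheduler currently counts against the pool (a sketch against the Airflow 2 ORM, run from an environment with access to the metadata DB; `mssql_dwh` is my pool from above):

   ```python
   # Sketch: list the task instances occupying the pool. Occupied slots are
   # task instances in the QUEUED or RUNNING state, so rows here that have
   # no corresponding pod in the cluster would confirm lost bookkeeping.
   from airflow.models import TaskInstance
   from airflow.utils.session import create_session
   from airflow.utils.state import State

   with create_session() as session:
       occupants = (
           session.query(TaskInstance)
           .filter(TaskInstance.pool == "mssql_dwh")
           .filter(TaskInstance.state.in_([State.QUEUED, State.RUNNING]))
           .all()
       )
       for ti in occupants:
           print(ti.dag_id, ti.task_id, ti.execution_date, ti.state)
   ```

   If task instances show up as `running` there while no matching pods exist in the cluster, that would support the lost-bookkeeping theory.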

