alkismavridis opened a new issue, #56045:
URL: https://github.com/apache/airflow/issues/56045

   ### Apache Airflow version
   
   3.0.6
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   We run airflow 3.0.6 within a Docker container.
   
   
   From time to time (roughly once every 2-3 days) our scheduler stops working.
The health check command still reports success:
   
   ```
   default@1a4707d0c867:/opt/airflow$ airflow jobs check --job-type SchedulerJob --local
   Found one alive job.
   ```
   
   But in fact, the scheduler seems to be stuck. Multiple tasks remain stuck in
the "Queued" state and nothing gets started. The whole infrastructure comes to a
halt.
   
   The scheduler repeatedly logs the following statement:
   
   
   ```
   [2025-09-24T12:37:05.471+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:06.536+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:07.589+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:08.652+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:09.717+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:10.777+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:11.775+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:12.836+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:13.885+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:14.939+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:15.998+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:17.067+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:17.347+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:18.415+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:19.469+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:20.523+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:21.575+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:21.829+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:21.963+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:21.984+0000] {scheduler_job_runner.py:2218} INFO - Adopting or resetting orphaned tasks for active dag runs
   [2025-09-24T12:37:23.014+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:24.076+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:25.127+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:26.171+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:27.217+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:28.260+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:29.308+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:30.055+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:31.104+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:31.875+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:32.394+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:33.438+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:34.495+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   [2025-09-24T12:37:35.547+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
   ```
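
To gauge how tight this loop is, the repetition can be counted from a log excerpt. The following is a minimal sketch and not part of Airflow: the function name `count_span_status_messages` and the regular expression are our own assumptions, derived only from the message format quoted above, and it assumes each log entry sits on a single line.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the log lines quoted in this issue.
SPAN_STATUS_RE = re.compile(
    r"Found span_status '(?P<status>[^']+)', "
    r"while updating state for dag_run '(?P<run_id>[^']+)'"
)


def count_span_status_messages(log_text: str) -> Counter:
    """Count (span_status, run_id) pairs found in a scheduler log excerpt."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = SPAN_STATUS_RE.search(line)
        if match:
            counts[(match["status"], match["run_id"])] += 1
    return counts


# Three entries copied from the log above (two span_status lines, one other).
sample = """\
[2025-09-24T12:37:05.471+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:06.536+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:21.984+0000] {scheduler_job_runner.py:2218} INFO - Adopting or resetting orphaned tasks for active dag runs
"""

for (status, run_id), n in count_span_status_messages(sample).items():
    print(f"{n}x span_status={status} dag_run={run_id}")
```

Run against the full scheduler log, a single dominant `(status, run_id)` pair would support the impression that the scheduler keeps revisiting one dag run.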
   
   Please note that the scheduler also appears to consume all available memory
(https://github.com/apache/airflow/issues/55768), so the two problems might be
related.
   
   
   
   ### What you think should happen instead?
   
   The scheduler should keep working and trigger tasks. If the scheduler is not
healthy, `airflow jobs check --job-type SchedulerJob --local` should at least
fail with an informative message.
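
As an illustration of what a more informative check could look at, here is a minimal sketch that flags a stall when tasks have been queued longer than a threshold. It is hypothetical: `find_stalled_tasks`, the 15-minute threshold, and the input shape are assumptions, not an existing Airflow API, and how the queued timestamps are obtained is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def find_stalled_tasks(
    queued_since: dict[str, datetime],
    now: datetime,
    max_queue_age: timedelta = timedelta(minutes=15),  # assumed threshold
) -> list[str]:
    """Return the keys of tasks that have been queued longer than max_queue_age."""
    return sorted(
        task_key
        for task_key, queued_at in queued_since.items()
        if now - queued_at > max_queue_age
    )


# Hypothetical data; the task keys and timestamps are made up for illustration.
now = datetime(2025, 9, 24, 12, 37, tzinfo=timezone.utc)
queued = {
    "example_dag.task_a": now - timedelta(hours=2),
    "example_dag.task_b": now - timedelta(minutes=1),
}

stalled = find_stalled_tasks(queued, now)
if stalled:
    print("unhealthy: tasks stuck in queued state:", ", ".join(stalled))
```

A check built on this idea would exit non-zero whenever the returned list is non-empty, so that a container health check could react to a scheduler that is alive but not making progress.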
   
   ### How to reproduce
   
   Unfortunately, we are unsure how to reproduce this. The scheduler section of our Docker Compose file:
   ```yaml
     airflow-scheduler:
       <<: *airflow-common
       container_name: airflow-scheduler-test
       command: scheduler
       healthcheck:
         test: ["CMD-SHELL", "airflow jobs check --job-type SchedulerJob --local || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'"]
         interval: 30s
         timeout: 10s
         retries: 5
         start_period: 150s
       restart: always
       depends_on:
         <<: *airflow-common-depends-on
         airflow-init:
           condition: service_completed_successfully
       mem_limit: 4000m
   ```
   
   ### Operating System
   
   Linux (+docker)
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-amazon==9.12.0
   apache-airflow-providers-celery==3.12.2
   apache-airflow-providers-cncf-kubernetes==10.7.0
   apache-airflow-providers-common-compat==1.7.3
   apache-airflow-providers-common-io==1.6.2
   apache-airflow-providers-common-messaging==1.0.5
   apache-airflow-providers-common-sql==1.27.5
   apache-airflow-providers-docker==4.4.2
   apache-airflow-providers-elasticsearch==6.3.2
   apache-airflow-providers-fab==2.4.1
   apache-airflow-providers-ftp==3.13.2
   apache-airflow-providers-git==0.0.6
   apache-airflow-providers-google==17.1.0
   apache-airflow-providers-grpc==3.8.2
   apache-airflow-providers-hashicorp==4.3.2
   apache-airflow-providers-http==5.3.3
   apache-airflow-providers-microsoft-azure==12.6.1
   apache-airflow-providers-mysql==6.3.3
   apache-airflow-providers-odbc==4.10.2
   apache-airflow-providers-openlineage==2.6.1
   apache-airflow-providers-postgres==6.2.3
   apache-airflow-providers-redis==4.2.0
   apache-airflow-providers-sendgrid==4.1.3
   apache-airflow-providers-sftp==5.3.4
   apache-airflow-providers-slack==9.1.4
   apache-airflow-providers-smtp==2.2.0
   apache-airflow-providers-snowflake==6.4.0
   apache-airflow-providers-ssh==4.1.3
   apache-airflow-providers-standard==1.6.0
   ```
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   Docker-Compose.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

