alkismavridis opened a new issue, #56045:
URL: https://github.com/apache/airflow/issues/56045
### Apache Airflow version
3.0.6
### If "Other Airflow 2 version" selected, which one?
_No response_
### What happened?
We run Airflow 3.0.6 in a Docker container.
Every 2-3 days, our scheduler stops working. The health check command, however, reports that everything is fine:
```
default@1a4707d0c867:/opt/airflow$ airflow jobs check --job-type SchedulerJob --local
Found one alive job.
```
In fact, the scheduler appears to be stuck. Multiple tasks stay in the "Queued" state and nothing gets started, so the whole infrastructure comes to a halt.
The scheduler repeatedly logs the following statement:
```
[2025-09-24T12:37:05.471+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:06.536+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:07.589+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:08.652+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:09.717+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:10.777+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:11.775+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:12.836+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:13.885+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:14.939+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:15.998+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:17.067+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:17.347+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:18.415+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:19.469+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:20.523+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:21.575+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:21.829+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:21.963+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:21.984+0000] {scheduler_job_runner.py:2218} INFO - Adopting or resetting orphaned tasks for active dag runs
[2025-09-24T12:37:23.014+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:24.076+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:25.127+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:26.171+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:27.217+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:28.260+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:29.308+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:30.055+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:31.104+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:31.875+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:32.394+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:33.438+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:34.495+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
[2025-09-24T12:37:35.547+0000] {dagrun.py:1039} INFO - Found span_status 'should_end', while updating state for dag_run 'scheduled__2025-09-23T23:45:00+00:00'
```
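To confirm this loop from the logs rather than by eye, a small script can count how often each dag_run is reported with `span_status 'should_end'`. This is a minimal sketch, assuming each log entry sits on a single line as the scheduler writes it; the regex is copied from the excerpt above (the wording may differ in other Airflow versions), and `repeated_span_runs` and its `threshold` are hypothetical names and values, not Airflow settings:

```python
import re
from collections import Counter

# Matches the dagrun.py INFO message from the excerpt above.
SPAN_LINE = re.compile(
    r"Found span_status '(?P<status>[^']+)', while updating state for dag_run "
    r"'(?P<run_id>[^']+)'"
)


def repeated_span_runs(lines, threshold=10):
    """Return {run_id: count} for dag_runs reported with span_status
    'should_end' at least `threshold` times in the given log lines."""
    counts = Counter()
    for line in lines:
        match = SPAN_LINE.search(line)
        if match and match.group("status") == "should_end":
            counts[match.group("run_id")] += 1
    # Keep only the runs that look like they are stuck in a loop.
    return {run_id: n for run_id, n in counts.items() if n >= threshold}
```

In our excerpt the same run is reported more than 30 times in about 30 seconds, so any threshold in the tens would flag it; unrelated lines such as the "Adopting or resetting orphaned tasks" message are ignored.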
Please note that the scheduler also seems to use all available memory
(https://github.com/apache/airflow/issues/55768), so this might be related.
### What you think should happen instead?
The scheduler keeps working and triggers tasks. If the scheduler is not healthy, then
`airflow jobs check --job-type SchedulerJob --local` should at least report that and fail, instead of reporting one alive job.
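`airflow jobs check` appears to confirm only that a scheduler job with a recent heartbeat exists, which would explain why it keeps passing while tasks pile up in "Queued". A check closer to what we expect could also look at how long the oldest task has been queued. A minimal sketch of that decision logic, assuming the heartbeat and queued-since timestamps are collected elsewhere (for example from the metadata database); `SchedulerSnapshot`, `scheduler_is_healthy` and both thresholds are hypothetical, not Airflow APIs:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SchedulerSnapshot:
    latest_heartbeat: datetime               # last heartbeat of the scheduler job
    oldest_queued_since: Optional[datetime]  # None when no task is queued


def scheduler_is_healthy(
    snap: SchedulerSnapshot,
    now: datetime,
    heartbeat_grace: timedelta = timedelta(seconds=60),
    max_queued: timedelta = timedelta(minutes=15),
) -> bool:
    """Healthy only if the scheduler heartbeats AND queued tasks get picked up."""
    # Roughly the condition `airflow jobs check` already covers.
    if now - snap.latest_heartbeat > heartbeat_grace:
        return False
    # Extra condition: no task may sit in "Queued" longer than max_queued.
    if snap.oldest_queued_since is not None and now - snap.oldest_queued_since > max_queued:
        return False
    return True
```

A long queue can be legitimate (pool or concurrency limits), so `max_queued` would need tuning per deployment to avoid restarting a scheduler that is merely busy.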
### How to reproduce
Unfortunately, we are not sure how to reproduce it. This is the scheduler section of our `docker-compose.yaml`:
```yaml
airflow-scheduler:
  <<: *airflow-common
  container_name: airflow-scheduler-test
  command: scheduler
  healthcheck:
    test: ["CMD-SHELL", "airflow jobs check --job-type SchedulerJob --local || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 150s
  restart: always
  depends_on:
    <<: *airflow-common-depends-on
    airflow-init:
      condition: service_completed_successfully
  mem_limit: 4000m
```
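With this `healthcheck`, the `kill -s 15 -1` / `kill -s 9 -1` fallback only fires when `airflow jobs check` exits non-zero, so a scheduler that still heartbeats is never restarted by `restart: always`. A minimal sketch of a wrapper that runs several probes and fails if any of them fails, so it could stand in for the single command in `test:`. The second probe path (`/opt/airflow/healthcheck_queued.py`) is a hypothetical placeholder for any additional check, such as a queued-task age check:

```python
import subprocess
import sys


def run_checks(commands, timeout=10):
    """Run each probe in order; return 0 only if every probe exits with 0.

    A probe that times out or whose executable is missing counts as failed.
    """
    for cmd in commands:
        try:
            result = subprocess.run(cmd, timeout=timeout, capture_output=True)
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return 1
        if result.returncode != 0:
            return 1
    return 0


# Probes for the scheduler container: the command from our compose file,
# plus a hypothetical extra probe script.
SCHEDULER_PROBES = [
    ["airflow", "jobs", "check", "--job-type", "SchedulerJob", "--local"],
    [sys.executable, "/opt/airflow/healthcheck_queued.py"],
]

# In the actual health-check script: sys.exit(run_checks(SCHEDULER_PROBES))
```

The combined runtime of all probes has to stay within the compose `timeout: 10s`, otherwise Docker counts the check as failed regardless of the probes' results, so the per-probe `timeout` would need to be lowered when several probes run.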
### Operating System
Linux (+docker)
### Versions of Apache Airflow Providers
```
apache-airflow-providers-amazon==9.12.0
apache-airflow-providers-celery==3.12.2
apache-airflow-providers-cncf-kubernetes==10.7.0
apache-airflow-providers-common-compat==1.7.3
apache-airflow-providers-common-io==1.6.2
apache-airflow-providers-common-messaging==1.0.5
apache-airflow-providers-common-sql==1.27.5
apache-airflow-providers-docker==4.4.2
apache-airflow-providers-elasticsearch==6.3.2
apache-airflow-providers-fab==2.4.1
apache-airflow-providers-ftp==3.13.2
apache-airflow-providers-git==0.0.6
apache-airflow-providers-google==17.1.0
apache-airflow-providers-grpc==3.8.2
apache-airflow-providers-hashicorp==4.3.2
apache-airflow-providers-http==5.3.3
apache-airflow-providers-microsoft-azure==12.6.1
apache-airflow-providers-mysql==6.3.3
apache-airflow-providers-odbc==4.10.2
apache-airflow-providers-openlineage==2.6.1
apache-airflow-providers-postgres==6.2.3
apache-airflow-providers-redis==4.2.0
apache-airflow-providers-sendgrid==4.1.3
apache-airflow-providers-sftp==5.3.4
apache-airflow-providers-slack==9.1.4
apache-airflow-providers-smtp==2.2.0
apache-airflow-providers-snowflake==6.4.0
apache-airflow-providers-ssh==4.1.3
apache-airflow-providers-standard==1.6.0
```
### Deployment
Docker-Compose
### Deployment details
Docker-Compose.
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)