val-lavrentiev opened a new issue, #51569:
URL: https://github.com/apache/airflow/issues/51569
### Apache Airflow version
3.0.1
### If "Other Airflow 2 version" selected, which one?
_No response_
### What happened?
A failed task, instead of being restarted, stays in the "queued" state
forever. According to the logs, the scheduler tries to schedule it only once
(for some unknown reason):
```
Jun 10 10:51:26 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:51:26.588+0000] {_client.py:1026} INFO - HTTP Request: PUT
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/heartbeat
"HTTP/1.1 204 No Cont>
Jun 10 10:51:31 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:51:31.498+0000] {scheduler_job_runner.py:2128} INFO - Adopting
or resetting orphaned tasks for active dag runs
Jun 10 10:51:31 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:51:31.614+0000] {_client.py:1026} INFO - HTTP Request: PUT
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/heartbeat
"HTTP/1.1 204 No Cont>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:51:36.795+0000] {_client.py:1026} INFO - HTTP Request: PUT
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/heartbeat
"HTTP/1.1 204 No Cont>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | 2025-06-10
10:51:36 [debug ] Received message from task runner [supervisor]
msg=RetryTask(state='up_for_retry', end_date=datetime.datetime(2025, 6, 10, 10,
51, 36, 773750, tzinfo=TzInfo(UTC)), rendered_m>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:51:36.839+0000] {_client.py:1026} INFO - HTTP Request: PATCH
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/state
"HTTP/1.1 204 No Conten>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | 2025-06-10
10:51:36 [debug ] Received message from task runner [supervisor]
msg=SetRenderedFields(rendered_fields={'op_args': [], 'op_kwargs': {},
'bash_command': 'direnv allow /data/ephemeral/airflow-ap>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:51:36.857+0000] {_client.py:1026} INFO - HTTP Request: PUT
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/rtif
"HTTP/1.1 404 Not Found"
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | 2025-06-10
10:51:36 [warning ] Server error [airflow.sdk.api.client]
detail=None
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | 2025-06-10
10:51:36 [error ] API server error [supervisor]
detail={'detail': 'Not Found'} message='Not Found' status_code=404
Jun 10 10:51:51 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:51:51.976+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:52:22 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:52:22.126+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:52:52 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:52:52.272+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:53:23 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:53:23.490+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:53:53 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:53:53.628+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:54:24 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:54:24.826+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:54:54 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:54:54.965+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:55:26 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:55:26.180+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:55:57 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:55:57.145+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:56:27 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:56:27.533+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:56:31 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:56:31.570+0000] {scheduler_job_runner.py:2128} INFO - Adopting
or resetting orphaned tasks for active dag runs
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:56:49.831+0000] {scheduler_job_runner.py:450} INFO - 1 tasks up
for execution:
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler |
<TaskInstance:
cdp_profiles_send_to_dest.run_export_json_and_send_to_destination
scheduled__2025-06-09T07:00:00+00:00 [scheduled]>
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:56:49.831+0000] {scheduler_job_runner.py:522} INFO - DAG
cdp_profiles_send_to_dest has 0/16 running and queued tasks
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:56:49.832+0000] {scheduler_job_runner.py:661} INFO - Setting the
following tasks to queued state:
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler |
<TaskInstance:
cdp_profiles_send_to_dest.run_export_json_and_send_to_destination
scheduled__2025-06-09T07:00:00+00:00 [scheduled]>
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:56:49.834+0000] {scheduler_job_runner.py:767} INFO - Trying to
enqueue tasks: [<TaskInstance:
cdp_profiles_send_to_dest.run_export_json_and_send_to_destination
scheduled__2025-06-09T07:00:00+0>
Jun 10 10:56:58 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:56:58.093+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:57:29 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:57:29.518+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:57:59 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:57:59.976+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
Jun 10 10:58:31 ip-10-0-0-77 start.sh[1369546]: scheduler |
[2025-06-10T10:58:31.508+0000] {dag.py:2509} INFO - Setting next_dagrun for
content_classifier_evaluation to 2025-06-09 00:00:00+00:00,
run_after=2025-06-11 00:00:00+00:00
```
### What you think should happen instead?
After a failure, the task should be retried according to its retry settings.
### How to reproduce
We use default parameters apart from:
```
export AIRFLOW_HOME={{ airflow_v2_home }}
export AIRFLOW__API__BASE_URL=https://airflow-server.com
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__CORE__LOAD_EXAMPLES=false
export AIRFLOW__CORE__PARALLELISM=32
export AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_ALL_ADMINS=True
export AIRFLOW__CORE__DAGS_FOLDER=/apps/airflow/dags
export AIRFLOW__SECRETS__BACKEND=airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
export AIRFLOW__SECRETS__BACKEND_KWARGS='{"region_name": "eu-west-1", "connections_prefix": "", "variables_prefix": "", "config_prefix": ""}'
export AIRFLOW__SCHEDULER__CREATE_CRON_DATA_INTERVALS=True
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql://{{ airflow_v2_postgres_user }}:{{ airflow_v2_postgres_password }}@{{ airflow_v2_postgres_host }}/{{ airflow_v2_postgres_database }}
# workaround for airflow v3.0.1 to avoid scheduler jwt expiration errors
export AIRFLOW__API_AUTH__JWT_CLI_EXPIRATION_TIME=315360000
export AIRFLOW__API_AUTH__JWT_EXPIRATION_TIME=315360000
export AIRFLOW__EXECUTION_API__JWT_EXPIRATION_TIME=315360000
export AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK=True
```
We also had an issue with JWT token expiration errors, which we have
circumvented for now with the longer expiration time settings above (we may
raise a separate issue for this later).
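For reference (not part of the original report), the workaround expiration
value of 315360000 seconds works out to ten 365-day years:

```python
# Illustrative check: the JWT expiration workaround value in seconds,
# converted to 365-day years.
seconds = 315_360_000
years = seconds / (365 * 24 * 3600)
print(years)  # 10.0
```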
Installation script on an EC2 instance (ARM, Ubuntu):
```
. /etc/profile.d/nix.sh \
  && . /etc/profile.d/Z50-devbox.sh \
  && devbox global add uv overmind direnv s5cmd pixi \
  && mkdir -p /apps/airflow/apps \
  && mkdir -p /apps/airflow/dags \
  && uv venv /apps/airflow/.venv --python 3.11 \
  && uv pip install --python {{ airflow_v2_home }}/.venv/bin/python \
       --constraint https://raw.githubusercontent.com/apache/airflow/constraints-3.0.1/constraints-3.11.txt \
       'apache-airflow[amazon,slack,standard]' \
       asyncpg \
       psycopg2-binary
```
### Operating System
Ubuntu 22.04.5 LTS
### Versions of Apache Airflow Providers
_No response_
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### Anything else?
_No response_
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)