val-lavrentiev opened a new issue, #51569: URL: https://github.com/apache/airflow/issues/51569
### Apache Airflow version

3.0.1

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

A failed task, instead of being restarted, stays in the "queued" state forever. According to the logs, the scheduler tries to schedule it only once (for an unknown reason):

```
Jun 10 10:51:26 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:51:26.588+0000] {_client.py:1026} INFO - HTTP Request: PUT https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/heartbeat "HTTP/1.1 204 No Cont>
Jun 10 10:51:31 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:51:31.498+0000] {scheduler_job_runner.py:2128} INFO - Adopting or resetting orphaned tasks for active dag runs
Jun 10 10:51:31 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:51:31.614+0000] {_client.py:1026} INFO - HTTP Request: PUT https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/heartbeat "HTTP/1.1 204 No Cont>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:51:36.795+0000] {_client.py:1026} INFO - HTTP Request: PUT https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/heartbeat "HTTP/1.1 204 No Cont>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | 2025-06-10 10:51:36 [debug ] Received message from task runner [supervisor] msg=RetryTask(state='up_for_retry', end_date=datetime.datetime(2025, 6, 10, 10, 51, 36, 773750, tzinfo=TzInfo(UTC)), rendered_m>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:51:36.839+0000] {_client.py:1026} INFO - HTTP Request: PATCH https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/state "HTTP/1.1 204 No Conten>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | 2025-06-10 10:51:36 [debug ] Received message from task runner [supervisor] msg=SetRenderedFields(rendered_fields={'op_args': [], 'op_kwargs': {}, 'bash_command': 'direnv allow /data/ephemeral/airflow-ap>
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:51:36.857+0000] {_client.py:1026} INFO - HTTP Request: PUT https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/rtif "HTTP/1.1 404 Not Found"
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | 2025-06-10 10:51:36 [warning ] Server error [airflow.sdk.api.client] detail=None
Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler | 2025-06-10 10:51:36 [error ] API server error [supervisor] detail={'detail': 'Not Found'} message='Not Found' status_code=404
Jun 10 10:51:51 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:51:51.976+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:52:22 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:52:22.126+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:52:52 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:52:52.272+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:53:23 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:53:23.490+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:53:53 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:53:53.628+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:54:24 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:54:24.826+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:54:54 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:54:54.965+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:55:26 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:55:26.180+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:55:57 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:55:57.145+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:56:27 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:56:27.533+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:56:31 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:56:31.570+0000] {scheduler_job_runner.py:2128} INFO - Adopting or resetting orphaned tasks for active dag runs
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:56:49.831+0000] {scheduler_job_runner.py:450} INFO - 1 tasks up for execution:
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler | <TaskInstance: cdp_profiles_send_to_dest.run_export_json_and_send_to_destination scheduled__2025-06-09T07:00:00+00:00 [scheduled]>
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:56:49.831+0000] {scheduler_job_runner.py:522} INFO - DAG cdp_profiles_send_to_dest has 0/16 running and queued tasks
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:56:49.832+0000] {scheduler_job_runner.py:661} INFO - Setting the following tasks to queued state:
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler | <TaskInstance: cdp_profiles_send_to_dest.run_export_json_and_send_to_destination scheduled__2025-06-09T07:00:00+00:00 [scheduled]>
Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:56:49.834+0000] {scheduler_job_runner.py:767} INFO - Trying to enqueue tasks: [<TaskInstance: cdp_profiles_send_to_dest.run_export_json_and_send_to_destination scheduled__2025-06-09T07:00:00+0>
Jun 10 10:56:58 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:56:58.093+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:57:29 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:57:29.518+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:57:59 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:57:59.976+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
Jun 10 10:58:31 ip-10-0-0-77 start.sh[1369546]: scheduler | [2025-06-10T10:58:31.508+0000] {dag.py:2509} INFO - Setting next_dagrun for content_classifier_evaluation to 2025-06-09 00:00:00+00:00, run_after=2025-06-11 00:00:00+00:00
```

### What you think should happen instead?

After a failure, the task should be restarted.
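The behavior I expect can be sketched as the following plain-Python illustration of the task-instance state machine. This is my own sketch, not Airflow's scheduler code; `retries` and `try_number` are meant to mirror the task-instance fields of the same name.

```python
# Illustrative sketch of the expected task-instance state transitions
# after a failure with retries remaining; NOT Airflow's real implementation.

def next_state(state: str, try_number: int, retries: int) -> str:
    """Return the state a task instance is expected to move to next."""
    transitions = {
        "up_for_retry": "scheduled" if try_number <= retries else "failed",
        "scheduled": "queued",  # scheduler picks the task instance up
        "queued": "running",    # executor launches it -- this is the step
                                # that never happens here: the TI stays queued
    }
    return transitions.get(state, state)

# Expected path for a task with retries=2 that just failed its first try:
path = ["up_for_retry"]
while path[-1] not in ("running", "failed"):
    path.append(next_state(path[-1], try_number=2, retries=2))

print(" -> ".join(path))  # up_for_retry -> scheduled -> queued -> running
```

In the logs above, the transition to "queued" happens once at 10:56:49 and the final "queued" → "running" step never occurs.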
### How to reproduce

We use default parameters apart from:

```
export AIRFLOW_HOME={{ airflow_v2_home }}
export AIRFLOW__API__BASE_URL=https://airflow-server.com
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__CORE__LOAD_EXAMPLES=false
export AIRFLOW__CORE__PARALLELISM=32
export AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_ALL_ADMINS=True
export AIRFLOW__CORE__DAGS_FOLDER=/apps/airflow/dags
export AIRFLOW__SECRETS__BACKEND=airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
export AIRFLOW__SECRETS__BACKEND_KWARGS='{"region_name": "eu-west-1", "connections_prefix": "", "variables_prefix": "", "config_prefix": ""}'
export AIRFLOW__SCHEDULER__CREATE_CRON_DATA_INTERVALS=True
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql://{{ airflow_v2_postgres_user }}:{{ airflow_v2_postgres_password }}@{{ airflow_v2_postgres_host }}/{{ airflow_v2_postgres_database }}
# workaround for airflow v3.0.1 to avoid scheduler jwt expiration errors
export AIRFLOW__API_AUTH__JWT_CLI_EXPIRATION_TIME=315360000
export AIRFLOW__API_AUTH__JWT_EXPIRATION_TIME=315360000
export AIRFLOW__EXECUTION_API__JWT_EXPIRATION_TIME=315360000
export AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK=True
```

We also have an issue with JWT token expiration errors, which we have circumvented for now with the longer expiration times above (we can perhaps raise a separate issue for that later).

Installation script on an EC2 instance (arm, Ubuntu):

```
. /etc/profile.d/nix.sh &&
  . /etc/profile.d/Z50-devbox.sh &&
  devbox global add uv overmind direnv s5cmd pixi &&
  mkdir -p /apps/airflow/apps &&
  mkdir -p /apps/airflow/dags &&
  uv venv /apps/airflow/.venv --python 3.11 &&
  uv pip install --python {{ airflow_v2_home }}/.venv/bin/python \
    --constraint https://raw.githubusercontent.com/apache/airflow/constraints-3.0.1/constraints-3.11.txt \
    'apache-airflow[amazon,slack,standard]' asyncpg psycopg2-binary
```

### Operating System

Ubuntu 22.04.5 LTS

### Versions of Apache Airflow Providers

_No response_

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

_No response_

### Anything else?

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)