val-lavrentiev opened a new issue, #51569:
URL: https://github.com/apache/airflow/issues/51569

   ### Apache Airflow version
   
   3.0.1
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   A failed task, instead of being retried, stays in the "queued" state 
forever. According to the logs, the scheduler tries to schedule it only once 
(for some unknown reason):
   ```
   Jun 10 10:51:26 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:51:26.588+0000] {_client.py:1026} INFO - HTTP Request: PUT 
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/heartbeat
 "HTTP/1.1 204 No Cont>
   Jun 10 10:51:31 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:51:31.498+0000] {scheduler_job_runner.py:2128} INFO - Adopting 
or resetting orphaned tasks for active dag runs
   Jun 10 10:51:31 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:51:31.614+0000] {_client.py:1026} INFO - HTTP Request: PUT  
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/heartbeat
 "HTTP/1.1 204 No Cont>
   Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:51:36.795+0000] {_client.py:1026} INFO - HTTP Request: PUT  
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/heartbeat
 "HTTP/1.1 204 No Cont>
   Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler     | 2025-06-10 
10:51:36 [debug    ] Received message from task runner [supervisor] 
msg=RetryTask(state='up_for_retry', end_date=datetime.datetime(2025, 6, 10, 10, 
51, 36, 773750, tzinfo=TzInfo(UTC)), rendered_m>
   Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:51:36.839+0000] {_client.py:1026} INFO - HTTP Request: PATCH  
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/state
 "HTTP/1.1 204 No Conten>
   Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler     | 2025-06-10 
10:51:36 [debug    ] Received message from task runner [supervisor] 
msg=SetRenderedFields(rendered_fields={'op_args': [], 'op_kwargs': {}, 
'bash_command': 'direnv allow /data/ephemeral/airflow-ap>
   Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:51:36.857+0000] {_client.py:1026} INFO - HTTP Request: PUT  
https://airflow-server.com/execution/task-instances/01975940-1bb5-7690-a48f-7b09540f8373/rtif
 "HTTP/1.1 404 Not Found"
   Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler     | 2025-06-10 
10:51:36 [warning  ] Server error                   [airflow.sdk.api.client] 
detail=None
   Jun 10 10:51:36 ip-10-0-0-77 start.sh[1369546]: scheduler     | 2025-06-10 
10:51:36 [error    ] API server error               [supervisor] 
detail={'detail': 'Not Found'} message='Not Found' status_code=404
   Jun 10 10:51:51 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:51:51.976+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:52:22 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:52:22.126+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:52:52 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:52:52.272+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:53:23 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:53:23.490+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:53:53 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:53:53.628+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:54:24 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:54:24.826+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:54:54 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:54:54.965+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:55:26 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:55:26.180+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:55:57 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:55:57.145+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:56:27 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:56:27.533+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:56:31 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:56:31.570+0000] {scheduler_job_runner.py:2128} INFO - Adopting 
or resetting orphaned tasks for active dag runs
   Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:56:49.831+0000] {scheduler_job_runner.py:450} INFO - 1 tasks up 
for execution:
   Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler     |         
<TaskInstance: 
cdp_profiles_send_to_dest.run_export_json_and_send_to_destination 
scheduled__2025-06-09T07:00:00+00:00 [scheduled]>
   Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:56:49.831+0000] {scheduler_job_runner.py:522} INFO - DAG 
cdp_profiles_send_to_dest has 0/16 running and queued tasks
   Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:56:49.832+0000] {scheduler_job_runner.py:661} INFO - Setting the 
following tasks to queued state:
   Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler     |         
<TaskInstance: 
cdp_profiles_send_to_dest.run_export_json_and_send_to_destination 
scheduled__2025-06-09T07:00:00+00:00 [scheduled]>
   Jun 10 10:56:49 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:56:49.834+0000] {scheduler_job_runner.py:767} INFO - Trying to 
enqueue tasks: [<TaskInstance: 
cdp_profiles_send_to_dest.run_export_json_and_send_to_destination 
scheduled__2025-06-09T07:00:00+0>
   Jun 10 10:56:58 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:56:58.093+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:57:29 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:57:29.518+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:57:59 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:57:59.976+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   Jun 10 10:58:31 ip-10-0-0-77 start.sh[1369546]: scheduler     | 
[2025-06-10T10:58:31.508+0000] {dag.py:2509} INFO - Setting next_dagrun for 
content_classifier_evaluation to 2025-06-09 00:00:00+00:00, 
run_after=2025-06-11 00:00:00+00:00
   ```
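   For reference, the behaviour we expected is simple date arithmetic: once the task reaches `up_for_retry`, the scheduler should make it eligible again at `end_date + retry_delay`. A minimal sketch in plain Python (the `end_date` is taken from the `RetryTask` message above; the 5-minute delay is just Airflow's default `retry_delay`, used here for illustration):
   
   ```python
   from datetime import datetime, timedelta, timezone
   
   def next_retry_time(end_date: datetime, retry_delay: timedelta) -> datetime:
       """Earliest moment a task in up_for_retry becomes eligible to run again."""
       return end_date + retry_delay
   
   # end_date from the RetryTask message in the log above
   end = datetime(2025, 6, 10, 10, 51, 36, 773750, tzinfo=timezone.utc)
   eligible = next_retry_time(end, timedelta(minutes=5))
   print(eligible.isoformat())  # 2025-06-10T10:56:36.773750+00:00
   ```
   
   Consistent with this, the log does show the scheduler re-queuing a task at 10:56:49 — but after that the task never leaves "queued".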
   
   ### What you think should happen instead?
   
   After a failure, the task should be retried instead of staying in the 
"queued" state forever.
   
   ### How to reproduce
   
   We use default parameters apart from:
   ```
   
   export AIRFLOW_HOME={{ airflow_v2_home }}
   export AIRFLOW__API__BASE_URL=https://airflow-server.com
   export AIRFLOW__CORE__EXECUTOR=LocalExecutor
   export AIRFLOW__CORE__LOAD_EXAMPLES=false
   export AIRFLOW__CORE__PARALLELISM=32
   export AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_ALL_ADMINS=True
   export AIRFLOW__CORE__DAGS_FOLDER=/apps/airflow/dags
   export 
AIRFLOW__SECRETS__BACKEND=airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
   export AIRFLOW__SECRETS__BACKEND_KWARGS='{"region_name": "eu-west-1", 
"connections_prefix": "", "variables_prefix": "", "config_prefix": ""}'
   export AIRFLOW__SCHEDULER__CREATE_CRON_DATA_INTERVALS=True
   export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql://{{ 
airflow_v2_postgres_user }}:{{ airflow_v2_postgres_password }}@{{ 
airflow_v2_postgres_host }}/{{ airflow_v2_postgres_database }}
   # workaround for airflow v3.0.1 to avoid scheduler jwt expiration errors
   export AIRFLOW__API_AUTH__JWT_CLI_EXPIRATION_TIME=315360000
   export AIRFLOW__API_AUTH__JWT_EXPIRATION_TIME=315360000
   export AIRFLOW__EXECUTION_API__JWT_EXPIRATION_TIME=315360000
   export AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK=True
   ```
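   A minimal DAG of the kind we see this with looks roughly like the following (the DAG and task names are illustrative, not our actual ones — any task that fails with retries configured should do):
   
   ```python
   from datetime import datetime, timedelta
   
   from airflow import DAG
   from airflow.providers.standard.operators.bash import BashOperator
   
   # Illustrative reproduction sketch: a task that always fails and has retries
   # configured, so it should cycle queued -> running -> up_for_retry -> queued.
   with DAG(
       dag_id="retry_stuck_repro",  # hypothetical name
       start_date=datetime(2025, 6, 9),
       schedule="@daily",
       catchup=False,
   ):
       BashOperator(
           task_id="always_fails",
           bash_command="exit 1",
           retries=3,
           retry_delay=timedelta(minutes=5),
       )
   ```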
   
   We also have an issue with JWT token expiration errors, which we have 
circumvented for now with the longer expiration-time settings above (we may 
raise a separate issue for this later).
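   For context, the override value of 315360000 seconds corresponds to roughly ten years, which effectively disables token expiration:
   
   ```python
   # 315360000 seconds expressed in 365-day years
   seconds = 315360000
   years = seconds / (365 * 24 * 3600)
   print(years)  # 10.0
   ```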
   
   Installation script on an EC2 instance (ARM, Ubuntu):
   ```
       . /etc/profile.d/nix.sh
       && . /etc/profile.d/Z50-devbox.sh
       && devbox global add uv overmind direnv s5cmd pixi
       && mkdir -p /apps/airflow/apps
       && mkdir -p /apps/airflow/dags
       && uv venv /apps/airflow/.venv --python 3.11
       && uv pip install --python {{ airflow_v2_home }}/.venv/bin/python
       --constraint 
https://raw.githubusercontent.com/apache/airflow/constraints-3.0.1/constraints-3.11.txt
       'apache-airflow[amazon,slack,standard]'
       asyncpg
       psycopg2-binary
   ```
   
   ### Operating System
   
   Ubuntu 22.04.5 LTS
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
