vlieven opened a new issue, #33000:
URL: https://github.com/apache/airflow/issues/33000

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   After a scheduler restart, a running sensor task was marked as failed.
   
   As far as I can tell, the following sequence of events happened:
   - a sensor task was running
   - the scheduler was restarted
   - after restart, the task was correctly adopted by the new scheduler
   - the state reported by the executor (success) did not match the state 
recorded for the task instance (queued), causing the task to be marked as failed.
   
   I believe this scenario is not adequately captured by the logic described 
here:
   
https://github.com/apache/airflow/blob/42465c5a9465fd77f3000117721e0ed1cc51c166/airflow/jobs/scheduler_job_runner.py#L748
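The sequence above can be modeled with a small, self-contained sketch. This is not Airflow's code: `TIState` and `reconcile` are hypothetical names chosen for illustration, and the rule is a simplification inferred from the scheduler log message quoted in this report.

```python
from enum import Enum


class TIState(str, Enum):
    """Hypothetical subset of task instance states, for illustration only."""

    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


def reconcile(db_state: TIState, executor_state: TIState) -> TIState:
    """Return the state the task instance ends up in (simplified model).

    Assumption: when the executor reports a finished task while the
    metadata database still records it as queued, the scheduler treats
    the task as killed externally and fails it.
    """
    finished = {TIState.SUCCESS, TIState.FAILED}
    if executor_state in finished and db_state == TIState.QUEUED:
        return TIState.FAILED
    # In every other case this model leaves the recorded state untouched.
    return db_state


# The case from this report: executor says success, database says queued.
print(reconcile(TIState.QUEUED, TIState.SUCCESS).value)
```

In this model, a sensor that actually succeeded is still failed whenever the recorded state lags behind at `queued`, which matches the behavior reported here.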
   
   This happened on Airflow 2.5.3.
   
   ### What you think should happen instead
   
   Relevant scheduler log:
   ```
   {scheduler_job.py:687} ERROR - Executor reports task instance <TaskInstance: 
dag-name.task-name scheduled__2023-07-30T00:00:00+00:00 [queued]> finished 
(success) although the task says its queued. (Info: None) Was the task killed 
externally?
   ```
   
   This causes the following task log:
   ```
   {taskinstance.py:2596} INFO - 0 downstream tasks scheduled from follow-on 
schedule check
   {taskinstance.py:1080} INFO - Dependencies not met for <TaskInstance: 
dag-name.task-name  scheduled__2023-07-30T00:00:00+00:00 [failed]>, dependency 
'Task Instance State' FAILED: Task is in the 'failed' state.
   {local_task_job.py:151} INFO - Task is not able to be run
   ```
   
   Given that the sensor actually finished with `success`, the task should not 
be marked `failed`; marking it `up_for_retry` would be preferable.
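A minimal sketch of that proposal, under stated assumptions: `resolve_mismatch`, `attempts_made`, and `retries_allowed` are hypothetical names, not Airflow APIs, and the report does not say whether a retry limit should still apply. The sketch assumes it does.

```python
def resolve_mismatch(attempts_made: int, retries_allowed: int) -> str:
    """Pick a state when the executor reports a finished task that the
    metadata database still records as queued.

    Assumption: the task is retried while retries remain and is failed
    only once they are exhausted.
    """
    if attempts_made <= retries_allowed:
        return "up_for_retry"
    return "failed"


# First attempt of a task that allows one retry: retry rather than fail.
print(resolve_mismatch(attempts_made=1, retries_allowed=1))
```

Under this rule a task with no retries configured would still end up `failed`, so whether that is acceptable is a design question for the fix.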
   
   ### How to reproduce
   
   I have no reliable reproduction. It might be reproducible by restarting the 
scheduler while a sensor task is running, if the timing is exactly right.
   
   ### Operating System
   
   Debian 10
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   We're running this on Kubernetes.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

