vlieven opened a new issue, #33000: URL: https://github.com/apache/airflow/issues/33000
### Apache Airflow version

Other Airflow 2 version (please specify below)

### What happened

After a scheduler restart, a running sensor task was marked as failed. As far as I can tell, the following sequence of events happened:

- a sensor task was running
- the scheduler was restarted
- after restart, the task was correctly adopted by the new scheduler
- the actual task state (success) did not match the expected task state (queued), causing the task to be marked as failed.

I believe this scenario is not adequately captured by the logic described here: https://github.com/apache/airflow/blob/42465c5a9465fd77f3000117721e0ed1cc51c166/airflow/jobs/scheduler_job_runner.py#L748

This happened on Airflow 2.5.3

### What you think should happen instead

Relevant scheduler log:

```
{scheduler_job.py:687} ERROR - Executor reports task instance <TaskInstance: dag-name.task-name scheduled__2023-07-30T00:00:00+00:00 [queued]> finished (success) although the task says its queued. (Info: None) Was the task killed externally?""}"
```

This causes the following task log:

```
{taskinstance.py:2596} INFO - 0 downstream tasks scheduled from follow-on schedule check
{taskinstance.py:1080} INFO - Dependencies not met for <TaskInstance: dag-name.task-name scheduled__2023-07-30T00:00:00+00:00 [failed]>, dependency 'Task Instance State' FAILED: Task is in the 'failed' state.
{local_task_job.py:151} INFO - Task is not able to be run
```

Given that the actual sensor state was `success`, it would be nicer to not mark it as `failed`, but rather `up_for_retry`.

### How to reproduce

I suppose you might be able to reproduce this if you get the timing exactly correct.

### Operating System

Debian 10

### Versions of Apache Airflow Providers

_No response_

### Deployment

Other Docker-based deployment

### Deployment details

We're running this on kubernetes

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!
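
The state mismatch described under "What happened", and the alternative suggested under "What you think should happen instead", can be sketched roughly as follows. This is a simplified illustration and not the scheduler's actual code: the `State` enum and the functions `observed_resolution` and `proposed_resolution` are hypothetical names introduced only for this example, and the sketch covers only the queued/success case reported in the logs.

```python
from enum import Enum


class State(Enum):
    """Hypothetical stand-in for the task instance states mentioned in this issue."""

    QUEUED = "queued"
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"


def observed_resolution(ti_state: State, executor_state: State) -> State:
    """Outcome seen in the logs: queued task instance + executor 'success' -> failed."""
    if ti_state is State.QUEUED and executor_state is State.SUCCESS:
        return State.FAILED
    # Any other combination is outside the scope of this sketch.
    return ti_state


def proposed_resolution(ti_state: State, executor_state: State) -> State:
    """Suggested outcome: the same mismatch leads to a retry instead of a failure."""
    if ti_state is State.QUEUED and executor_state is State.SUCCESS:
        return State.UP_FOR_RETRY
    # Any other combination is outside the scope of this sketch.
    return ti_state


if __name__ == "__main__":
    print(observed_resolution(State.QUEUED, State.SUCCESS).value)  # failed
    print(proposed_resolution(State.QUEUED, State.SUCCESS).value)  # up_for_retry
```

Whether a retry should also depend on the task's remaining retries is left open here; the sketch only captures the difference between the observed and the suggested outcome.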
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
