karenbraganz commented on issue #51969: URL: https://github.com/apache/airflow/issues/51969#issuecomment-3293721335
I have also observed a similar issue. This eventually resulted in a successful task getting marked as failed. Below, I will explain what I believe happened: 1. A single try of a task instance is detected as a zombie twice due to the zombie_detection_interval race condition described by Alan. This produces two callback requests. 2. The first callback request is processed and executed a few seconds later resulting in the task getting marked in the `up_for_retry` state. 3. The task is retried and succeeds. 4. Shortly after the success, the second callback request (from the second zombie detection of the first try) is processed and executed. This results in the task instance getting marked as failed. Even though this callback request corresponds to the first try, the most recent try gets marked as failed. Therefore, the second try, which was initially in the success state, got marked in the failed state. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
