sidshas03 commented on PR #63355: URL: https://github.com/apache/airflow/pull/63355#issuecomment-4671977499
Yep, that's basically the exact bug I ran into. Step 5 is the key part: the client times out, tenacity retries, and the second PATCH finds the TI already in success, so the old code returns a 409. succeed() doesn't handle that 409, so a task that actually finished fine gets flipped to failed.

This PR makes that transition idempotent. If the TI is already in the state you're setting (success to success), the server returns 200 instead of 409. The retry gets a normal response back, nothing changes, and the task stays in success.

Also, good point about the sync def running in the threadpool. That's why the server keeps going and commits even after the client disconnects, which is how the duplicate PATCH ends up hitting a TI that's already in success. Either way, the fix doesn't depend on why the retry happened, so it should cover both #65708 and #63183.

mypy is passing now that I've pulled in the latest main (#68300). One MySQL job still shows as cancelled, but that looks like a CI infrastructure issue rather than a real failure, so it should be fine on a rerun.
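
A minimal sketch of the retry scenario and the idempotent check, in plain Python with no Airflow imports. Every name here (TIState, TaskInstance, patch_state, succeed_with_retry, ConflictError, the idempotent flag) is hypothetical and only models the behavior described in the comment; it is not the PR's diff or Airflow's execution API. The first PATCH is assumed to commit on the server while its response is lost, and the retry sends the identical PATCH.

```python
from dataclasses import dataclass
from enum import Enum
from http import HTTPStatus


class TIState(str, Enum):
    # Hypothetical subset of task-instance states, for illustration only.
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class TaskInstance:
    # Hypothetical stand-in for a stored task-instance row.
    id: str
    state: TIState


class ConflictError(Exception):
    """Hypothetical client-side error raised when the server answers 409."""


def patch_state(ti: TaskInstance, target: TIState, idempotent: bool = True) -> HTTPStatus:
    """Hypothetical server handler for a state PATCH.

    idempotent=False mimics the old behavior: any update to a TI that is no
    longer RUNNING is rejected with 409. idempotent=True also accepts a repeat
    of the state already stored (e.g. success -> success) and leaves it unchanged.
    """
    if ti.state == TIState.RUNNING:
        ti.state = target
        return HTTPStatus.OK
    if idempotent and ti.state == target:
        return HTTPStatus.OK  # no-op: the requested state is already recorded
    return HTTPStatus.CONFLICT  # a genuinely different transition is still refused


def succeed_with_retry(ti: TaskInstance, idempotent: bool) -> HTTPStatus:
    """Hypothetical client flow: the first PATCH commits on the server but the
    client times out before reading the reply, so a retry sends the same PATCH."""
    patch_state(ti, TIState.SUCCESS, idempotent)  # committed, response lost
    status = patch_state(ti, TIState.SUCCESS, idempotent)  # the retried PATCH
    if status == HTTPStatus.CONFLICT:
        # In the reported bug this 409 goes unhandled and the finished task ends up failed.
        raise ConflictError(f"409 on retried PATCH for {ti.id}")
    return status


if __name__ == "__main__":
    old = TaskInstance("ti-old", TIState.RUNNING)
    try:
        succeed_with_retry(old, idempotent=False)
    except ConflictError as exc:
        print("old:", exc)  # old: 409 on retried PATCH for ti-old

    new = TaskInstance("ti-new", TIState.RUNNING)
    print("new:", succeed_with_retry(new, idempotent=True).value, new.state.value)
    # new: 200 success
```

The check only short-circuits when the stored state equals the requested one, so in this sketch a success PATCH against a TI that is already failed still gets a 409. Keeping it that narrow means a retry can't silently overwrite a different outcome.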
