The GitHub Actions job "Tests (AMD)" on airflow.git/fix/tasksdk-terminal-state-after-api-success has succeeded. Run started by GitHub user potiuk (triggered by potiuk).
Head commit for run: 6e75113d72b070e77e97fa1f46ee7f4871a2f2ad / Jarek Potiuk <[email protected]> Recover stuck TIs when direct terminal-state API call fails The supervisor's _handle_request for SucceedTask, RetryTask, DeferTask, and RescheduleTask set _terminal_state BEFORE calling the matching client.task_instances.{succeed,retry,defer,reschedule}() API. If that API call raised (transient network blip, server 5xx, etc.), _terminal_state was set on the supervisor but the server never saw the transition. The supervisor's update_task_state_if_needed then saw final_state in STATES_SENT_DIRECTLY and short-circuited the recovery finish() call -- leaving the TaskInstance stuck RUNNING on the server forever, blocking downstream dependencies and triggering false alerts. Two-part fix: 1. Make the direct API call FIRST. Only set _terminal_state and the new _terminal_state_synced_to_server flag after the call returns successfully. If the API raises, both stay unset and the exception propagates to handle_requests, where the existing catch-all sends an ErrorResponse to the task subprocess. 2. Have update_task_state_if_needed always call finish() when _terminal_state_synced_to_server is False, regardless of what final_state happens to return. The finish() API takes the state value, so a SUCCESS / DEFERRED / etc. transition that originally failed is re-attempted via finish() on subprocess exit. Pre-existing semantics for the no-direct-API states (FAILED, UP_FOR_RETRY without RetryTask, etc.) preserved -- those land in the same finish() branch. Tests added: - _terminal_state not set when succeed() raises. - update_task_state_if_needed calls finish() when synced flag is False, even with final_state == SUCCESS. - update_task_state_if_needed skips finish() when synced flag is True (preserves the existing happy-path optimisation). Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-007). Report URL: https://github.com/apache/airflow/actions/runs/25530182239 With regards, GitHub Actions via GitBox --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
