The GitHub Actions job "Tests (AMD)" on 
airflow.git/fix/tasksdk-terminal-state-after-api-success has succeeded.
Run started by GitHub user potiuk (triggered by potiuk).

Head commit for run:
6e75113d72b070e77e97fa1f46ee7f4871a2f2ad / Jarek Potiuk <[email protected]>
Recover stuck TIs when direct terminal-state API call fails

The supervisor's _handle_request for SucceedTask, RetryTask, DeferTask,
and RescheduleTask set _terminal_state BEFORE calling the matching
client.task_instances.{succeed,retry,defer,reschedule}() API. If that
API call raised (transient network blip, server 5xx, etc.),
_terminal_state was set on the supervisor but the server never saw
the transition. The supervisor's update_task_state_if_needed then
saw final_state in STATES_SENT_DIRECTLY and short-circuited the
recovery finish() call -- leaving the TaskInstance stuck RUNNING
on the server forever, blocking downstream dependencies and
triggering false alerts.
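
The failure mode described above can be reproduced in a minimal sketch.
This is not the Airflow source: FlakyClient and PreFixSupervisor are
hypothetical stand-ins, the membership of STATES_SENT_DIRECTLY and the
"failed" fallback in final_state are assumptions, and only the names
_terminal_state, final_state, update_task_state_if_needed, succeed()
and finish() are taken from this message.

```python
# Assumed membership; the commit only says this collection exists.
STATES_SENT_DIRECTLY = {"success", "deferred", "up_for_retry", "up_for_reschedule"}


class FlakyClient:
    """Hypothetical API client whose direct succeed() call always fails."""

    def __init__(self):
        self.server_state = "running"

    def succeed(self):
        raise ConnectionError("simulated transient network failure")

    def finish(self, state):
        self.server_state = state


class PreFixSupervisor:
    """Hypothetical supervisor using the ordering the commit describes as buggy."""

    def __init__(self, client):
        self.client = client
        self._terminal_state = None

    def handle_succeed_task(self):
        # Bug: local state is recorded BEFORE the server confirms it.
        self._terminal_state = "success"
        self.client.succeed()

    @property
    def final_state(self):
        # Assumed fallback when no terminal state was recorded.
        return self._terminal_state or "failed"

    def update_task_state_if_needed(self):
        # Short-circuit: assumes the server already saw the transition.
        if self.final_state in STATES_SENT_DIRECTLY:
            return
        self.client.finish(self.final_state)


client = FlakyClient()
sup = PreFixSupervisor(client)
try:
    sup.handle_succeed_task()
except ConnectionError:
    pass  # the task subprocess would receive an error response here
sup.update_task_state_if_needed()
print(client.server_state)  # running
```

The supervisor believes the task succeeded, skips finish(), and the
server-side state never leaves "running".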

Two-part fix:

1. Make the direct API call FIRST. Only set _terminal_state and the
   new _terminal_state_synced_to_server flag after the call returns
   successfully. If the API raises, both stay unset and the exception
   propagates to handle_requests, where the existing catch-all sends
   an ErrorResponse to the task subprocess.

2. Have update_task_state_if_needed always call finish() when
   _terminal_state_synced_to_server is False, regardless of what
   final_state happens to return. The finish() API takes the state
   value, so a SUCCESS / DEFERRED / etc. transition that originally
   failed is re-attempted via finish() on subprocess exit.
   Pre-existing semantics for the no-direct-API states (FAILED,
   UP_FOR_RETRY without RetryTask, etc.) are preserved -- those land in
   the same finish() branch.
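
The two parts of the fix can be sketched roughly as follows. This is
an illustration under stated assumptions rather than the actual patch:
RecordingClient, PostFixSupervisor and run() are hypothetical, and
final_state is passed in as a plain argument to keep the example
small. Only _terminal_state, _terminal_state_synced_to_server,
update_task_state_if_needed, succeed() and finish() are names from
this message.

```python
class RecordingClient:
    """Hypothetical API client that records calls and can fail succeed()."""

    def __init__(self, fail_succeed=False):
        self.fail_succeed = fail_succeed
        self.calls = []

    def succeed(self):
        if self.fail_succeed:
            raise ConnectionError("simulated transient failure")
        self.calls.append(("succeed",))

    def finish(self, state):
        self.calls.append(("finish", state))


class PostFixSupervisor:
    """Hypothetical supervisor following the ordering described in the fix."""

    def __init__(self, client):
        self.client = client
        self._terminal_state = None
        self._terminal_state_synced_to_server = False

    def handle_succeed_task(self):
        # Part 1: call the API first; if it raises, both fields stay unset.
        self.client.succeed()
        self._terminal_state = "success"
        self._terminal_state_synced_to_server = True

    def update_task_state_if_needed(self, final_state):
        # Part 2: skip finish() only when the server confirmed the state.
        if self._terminal_state_synced_to_server:
            return
        self.client.finish(final_state)


def run(fail_succeed):
    """Drive one task through the supervisor and return (supervisor, calls)."""
    client = RecordingClient(fail_succeed=fail_succeed)
    sup = PostFixSupervisor(client)
    try:
        sup.handle_succeed_task()
    except ConnectionError:
        pass  # per the commit, handle_requests sends an ErrorResponse here
    sup.update_task_state_if_needed("success")
    return sup, client.calls


print(run(fail_succeed=True)[1])   # [('finish', 'success')]
print(run(fail_succeed=False)[1])  # [('succeed',)]
```

When succeed() fails, the transition is retried through finish() on
exit; when it succeeds, finish() is skipped as before.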

Tests added:

- _terminal_state not set when succeed() raises.
- update_task_state_if_needed calls finish() when synced flag is
  False, even with final_state == SUCCESS.
- update_task_state_if_needed skips finish() when synced flag is
  True (preserves the existing happy-path optimisation).

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-007).

Report URL: https://github.com/apache/airflow/actions/runs/25530182239

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
