jason810496 commented on code in PR #66573:
URL: https://github.com/apache/airflow/pull/66573#discussion_r3257989121
##########
task-sdk/src/airflow/sdk/execution_time/task_runner.py:
##########
@@ -1426,6 +1426,25 @@ def _on_term(signum, frame):
"Failed to report terminal task state to supervisor",
state=state.value,
)
+ # Fail closed for non-success terminal states: when the
+ # supervisor never receives the terminal-state message,
+ # exiting 0 would let the supervisor's final_state property
+ # default to SUCCESS (exit_code == 0 with no _terminal_state
+ # set). For a task that actually FAILED / was SKIPPED / etc.,
+ # that turns an IPC blip into silent data-quality breakage
+ # for every downstream task. Exit non-zero so the
+ # supervisor's final_state correctly classifies this as
+ # FAILED (or UP_FOR_RETRY when retries are configured).
+ #
+ # SUCCESS is exempt: a "send the SUCCESS marker, supervisor
+ # rejects with 409 because the server already terminalised
+ # this TI" path is the legitimate scenario the existing
+ # softening targets. In that path the local state is SUCCESS
+ # and the supervisor's default-to-SUCCESS coincides with
+ # reality, so we continue to finalize() so listeners observe
+ # the task state.
+ if state != TaskInstanceState.SUCCESS:
+     sys.exit(1)
Review Comment:
Even without the new `exit`, we will still enter the `except Exception` in
`main`. Additionally, the `finally` in `main` will tear down
`SUPERVISOR_COMMS.socket`.
https://github.com/apache/airflow/blob/ac39596bd531f8df6092531b3bde7acb54fff16f/task-sdk/src/airflow/sdk/execution_time/task_runner.py#L2033-L2041
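
For reference, a minimal sketch of the Python control flow being discussed. It is not the actual `task_runner.py` code: `run_task` and `main_sketch` are hypothetical stand-ins, and the `RuntimeError` on the no-exit path is an assumption made only for illustration. It shows that `sys.exit(1)` raises `SystemExit`, which `except Exception` does not catch, while the `finally` block runs on both paths.

```python
import sys


def run_task(fail_closed: bool) -> None:
    """Hypothetical stand-in for the path where reporting the state failed."""
    if fail_closed:
        # sys.exit raises SystemExit, which derives from BaseException,
        # not Exception.
        sys.exit(1)
    # Assumption for illustration: without the exit, an ordinary exception
    # propagates to the caller.
    raise RuntimeError("Failed to report terminal task state to supervisor")


def main_sketch(fail_closed: bool) -> list[str]:
    """Hypothetical stand-in for main(); records which handlers ran."""
    events: list[str] = []
    try:
        try:
            run_task(fail_closed)
        except Exception:
            events.append("except Exception")
        finally:
            # Teardown runs whether or not the except clause matched.
            events.append("finally: teardown")
    except SystemExit as exc:
        # Caught here only so the sketch can report the exit code.
        events.append(f"SystemExit({exc.code})")
    return events


if __name__ == "__main__":
    print(main_sketch(fail_closed=True))
    print(main_sketch(fail_closed=False))
```

In this sketch the teardown in `finally` happens either way; the only difference is whether the `except Exception` handler is entered or the process exits non-zero.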