andreahlert opened a new pull request, #61627:
URL: https://github.com/apache/airflow/pull/61627

   Fixes: #58936
   
   ## Summary
   
   When a Kubernetes worker pod receives SIGTERM (e.g. spot interruption, 
scale-down, rolling update), the signal is delivered to the supervisor 
process (PID 1 in the container). Previously, the supervisor installed no 
signal handler, so it exited with the default behavior and left the task 
subprocess orphaned without ever calling the operator's `on_kill()` hook. As a 
result, spawned resources (pods, subprocesses, etc.) were never cleaned up.
   
   **Root cause**: The `supervise()` function starts the task subprocess and 
calls `process.wait()`, but never installs signal handlers for SIGTERM/SIGINT. 
The task subprocess *does* have a SIGTERM handler (registered in 
`task_runner.py`) that calls `on_kill()`, but the signal never reaches it 
because the supervisor process terminates first.
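
   The child-side half of this chain can be pictured roughly as below. This is 
a minimal sketch, not the code in `task_runner.py`: `DemoOperator` and 
`install_term_handler` are hypothetical names, and exiting with `128 + signum` 
is an assumed convention rather than something this PR specifies.

```python
import signal
import sys


class DemoOperator:
    """Hypothetical stand-in for an operator; only on_kill() matters here."""

    def __init__(self):
        self.killed = False

    def on_kill(self):
        # A real operator would delete pods, terminate subprocesses, etc.
        self.killed = True


def install_term_handler(operator):
    """Register a SIGTERM handler that runs operator.on_kill(), then exits.

    Returns the previously installed handler so the caller can restore it.
    Must be called from the main thread, as signal.signal() requires.
    """

    def _on_term(signum, frame):
        operator.on_kill()
        # Assumed convention: exit code 128 + signal number.
        sys.exit(128 + signum)

    return signal.signal(signal.SIGTERM, _on_term)
```

   The handler only helps if SIGTERM actually reaches the process that 
installed it, which is the gap described above.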
   
   **Fix**: Install SIGTERM/SIGINT signal handlers in `supervise()` that 
forward the received signal to the task subprocess via `os.kill()`. The child's 
existing handler then calls `on_kill()` as expected, restoring the Airflow 2 
behavior.
   
   **Signal flow after fix**:
   1. K8s sends SIGTERM to supervisor (PID 1)
   2. Supervisor's new handler forwards SIGTERM to task subprocess
   3. Task subprocess's existing `_on_term` handler calls `operator.on_kill()`
   4. Operator cleans up resources (pods, subprocesses, etc.)
   5. Subprocess exits, supervisor's `wait()` returns normally
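
   The flow above can be sketched in isolation as follows. This is a minimal 
illustration, not the code added to `supervisor.py`: the helper name 
`wait_with_signal_forwarding` is hypothetical, and a plain `subprocess.Popen` 
object stands in for the supervised task process.

```python
import os
import signal
import subprocess


def wait_with_signal_forwarding(proc: subprocess.Popen) -> int:
    """Wait for proc, forwarding SIGTERM/SIGINT received by this process to it.

    Must be called from the main thread, as signal.signal() requires.
    """

    def forward(signum, frame):
        try:
            os.kill(proc.pid, signum)
        except ProcessLookupError:
            pass  # The child has already exited; nothing to forward to.

    forwarded = (signal.SIGTERM, signal.SIGINT)
    # Save the current handlers so they can be restored afterwards.
    previous = {sig: signal.getsignal(sig) for sig in forwarded}
    try:
        for sig in forwarded:
            signal.signal(sig, forward)
        return proc.wait()
    finally:
        for sig, handler in previous.items():
            # getsignal() returns None for handlers not installed from
            # Python; those cannot be restored through signal.signal().
            if handler is not None:
                signal.signal(sig, handler)
```

   Restoring the saved handlers in `finally` keeps the forwarding scoped to 
the wait, so the caller's own signal handling is unchanged once the child has 
exited.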
   
   ## Changes
   
   - **`task-sdk/src/airflow/sdk/execution_time/supervisor.py`**: Added signal 
forwarding to the `supervise()` function. The existing handlers are saved, the 
forwarding handlers are installed before `process.wait()`, and the saved 
handlers are restored in a `finally` block.
   - **`task-sdk/tests/task_sdk/execution_time/test_supervisor.py`**: Added a 
test that verifies that a SIGTERM forwarded from the supervisor to the 
subprocess triggers the operator's `on_kill()` hook.
   
   ## Test plan
   
   - [ ] New test `test_on_kill_hook_called_when_supervisor_receives_sigterm` 
verifies the signal forwarding chain
   - [ ] Existing `test_on_kill_hook_called_when_sigkilled` still passes (no 
regression)
   - [ ] Existing signal-related tests (`test_kill_escalation_path`, 
`test_exit_by_signal`) still pass

