paramjeet01 opened a new issue, #39791: URL: https://github.com/apache/airflow/issues/39791
### Apache Airflow version

main (development)

### If "Other Airflow 2 version" selected, which one?

2.8.3

### What happened?

**Issue:** When a worker pod is killed, the task pods should not be killed if `reattach_on_restart` is set to True.

**Case:** Our Airflow deployment runs on AWS EC2 spot instances, so workers are expected to be killed occasionally when an EC2 instance is interrupted. When the EC2 instance is removed from the Kubernetes nodes, SIGTERM is sent to the worker pod. This triggers the `_execute_task_with_callbacks` code linked below, and the `on_kill` method is then called, which kills the task pod. As a result, `reattach_on_restart` does not work as expected, because the task pod is deleted whenever the worker pod is killed.

https://github.com/apache/airflow/blob/2d53c1089f78d8d1416f51af60e1e0354781c661/airflow/models/taskinstance.py#L2592-L2613

### What you think should happen instead?

The task pod should remain active when a worker pod is terminated if `reattach_on_restart` is set to True.

### How to reproduce

- Create a task that uses the Kubernetes pod operator.
- Set `reattach_on_restart` to True in the task.
- Delete the worker pod. Both the worker pod and the task pod are deleted.

### Operating System

Amazon Linux 2

### Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==8.2.0

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

_No response_

### Anything else?

**Suggestion:** The `on_kill` method should receive an additional parameter that states who called it (the "clear task" button in the Airflow UI, or a SIGTERM signal), so that the operator can decide whether to keep or delete the pod.

https://github.com/apache/airflow/blob/fe4605a10e26f1b8a180979ba5765d1cb7fb0111/airflow/providers/cncf/kubernetes/operators/pod.py#L989-L1003

If someone can help me with the `taskinstance.py` changes needed for the above recommendation, I'll be able to refactor the `on_kill` method.
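To make the suggestion concrete, here is a rough sketch of the idea, not a patch against Airflow. Everything in it is hypothetical: `KillReason`, the `reason` argument, and `SketchPodOperator` are names made up for illustration, and the current `on_kill` takes no such parameter. The stand-in class only records whether the pod would have been deleted.

```python
from enum import Enum


class KillReason(Enum):
    """Hypothetical: who asked for the task to be killed."""

    USER_REQUEST = "user_request"      # e.g. task cleared from the Airflow UI
    WORKER_SIGTERM = "worker_sigterm"  # e.g. spot instance reclaimed


class SketchPodOperator:
    """Stand-in for the Kubernetes pod operator; not the real class."""

    def __init__(self, reattach_on_restart: bool = True) -> None:
        self.reattach_on_restart = reattach_on_restart
        self.pod_deleted = False

    def delete_pod(self) -> None:
        # The real operator would call the Kubernetes API here.
        self.pod_deleted = True

    def on_kill(self, reason: KillReason = KillReason.USER_REQUEST) -> None:
        # Keep the pod only when the worker itself is going away and the
        # task is configured to reattach after a restart.
        if reason is KillReason.WORKER_SIGTERM and self.reattach_on_restart:
            return
        self.delete_pod()
```

With this shape, the SIGTERM handler in `taskinstance.py` would presumably pass `KillReason.WORKER_SIGTERM`, while a UI-initiated clear would keep the default and still delete the pod.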
### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
