wangzhigang1999 opened a new issue, #7149: URL: https://github.com/apache/kyuubi/issues/7149
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [x] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the feature

This feature introduces a mechanism that starts a shutdown watchdog when the Spark engine decides to shut down. If the watchdog's timeout is reached before the process has exited, it prints the stack traces of all live threads and then forcibly terminates the process.

### Motivation

Currently, there are scenarios in which the engine should exit but fails to do so for various reasons, and these scenarios cannot be exhaustively enumerated. For example, see this discussion: https://github.com/apache/kyuubi/discussions/6992#discussioncomment-13775648, and these issues: https://github.com/apache/kyuubi/issues/4280, https://github.com/apache/kyuubi/issues/7019.

We have also encountered this problem in production. In the log below, SparkContext had already stopped, so the process should have run its shutdown hooks and exited. However, an abnormal Ranger thread blocked the process for more than ten days, until it eventually exhausted the ECS resources and was finally discovered.

<img width="2844" height="1112" alt="Image" src="https://github.com/user-attachments/assets/f142f869-850c-466c-9eb2-886fcba7416e" />

### Describe the solution

I want to add a daemon watchdog thread that starts with a timeout when the `stop()` method is called. If the process shuts down normally, the watchdog thread is interrupted and the process exits gracefully. If the timeout is reached and the process is still alive, some threads are blocking the shutdown; the watchdog then prints the stack traces of all active threads in the process and forces it to exit.

### Additional context

_No response_

### Are you willing to submit PR?

- [x] Yes.
I would be willing to submit a PR with guidance from the Kyuubi community to improve.
- [ ] No. I cannot submit a PR at this time.
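A minimal sketch of the watchdog described under "Describe the solution" might look like the following. `ShutdownWatchdog`, its constructor parameters, and the `start()`/`cancel()` methods are hypothetical names chosen for illustration, not existing Kyuubi APIs; Kyuubi's Spark engine is written in Scala, but it would rely on the same JDK calls (`Thread.getAllStackTraces`, `Runtime.halt`). The termination action is injectable only so the sketch can be exercised without killing the JVM.

```java
import java.util.Map;
import java.util.function.IntConsumer;

/**
 * Hypothetical shutdown watchdog (not an existing Kyuubi API): armed when the
 * engine starts stopping; if not cancelled within the timeout, it dumps every
 * live thread's stack and forcibly terminates the JVM.
 */
final class ShutdownWatchdog {
    private final long timeoutMillis;
    private final int exitCode;
    // Termination action; defaults to Runtime.halt, injectable for tests.
    private final IntConsumer terminator;
    private Thread watchdog;

    ShutdownWatchdog(long timeoutMillis, int exitCode) {
        this(timeoutMillis, exitCode, code -> Runtime.getRuntime().halt(code));
    }

    ShutdownWatchdog(long timeoutMillis, int exitCode, IntConsumer terminator) {
        this.timeoutMillis = timeoutMillis;
        this.exitCode = exitCode;
        this.terminator = terminator;
    }

    /** Arms the watchdog; intended to be called at the start of stop(). */
    synchronized void start() {
        if (watchdog != null) {
            return; // already armed (one-shot)
        }
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(timeoutMillis);
            } catch (InterruptedException e) {
                return; // cancelled: shutdown finished in time
            }
            System.err.println("Shutdown did not finish within " + timeoutMillis
                + " ms; dumping live threads before forcing exit");
            System.err.print(dumpAllThreads());
            terminator.accept(exitCode);
        }, "shutdown-watchdog");
        t.setDaemon(true); // never keeps the JVM alive on its own
        t.start();
        watchdog = t;
    }

    /** Disarms the watchdog once shutdown has completed normally. */
    synchronized void cancel() {
        if (watchdog != null) {
            watchdog.interrupt();
        }
    }

    /** Formats the stack trace of every live thread, jstack-style. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(thread.isDaemon() ? " daemon" : "")
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

In this sketch the default action is `Runtime.halt` rather than `System.exit`: `halt` does not run shutdown hooks, so it still terminates the process when a hook (or a thread a hook waits on) is itself what hangs, whereas `System.exit` called after the shutdown sequence has begun blocks indefinitely. Because the watchdog is a daemon thread, it only ever fires while some non-daemon thread, such as the stuck Ranger thread, is keeping the JVM alive.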
