wangzhigang1999 opened a new issue, #7149:
URL: https://github.com/apache/kyuubi/issues/7149

   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [x] I have searched in the 
[issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Describe the feature
   
   This feature introduces a shutdown watchdog for the Spark engine. The 
watchdog starts when the engine decides to shut down. If the process is still 
alive when the timeout is reached, the watchdog prints the stack traces of all 
live threads and then forcibly terminates the process.
   
   ### Motivation
   
   Currently, there are scenarios where the engine should exit but fails to do 
so, and the possible causes cannot be exhaustively enumerated. For example, see 
this discussion: 
https://github.com/apache/kyuubi/discussions/6992#discussioncomment-13775648, 
and these issues: https://github.com/apache/kyuubi/issues/4280, 
https://github.com/apache/kyuubi/issues/7019.
   
   We encountered this issue in production as well. In the log below, the 
process should have run its shutdown hooks and exited after SparkContext 
stopped. Instead, a misbehaving Ranger thread kept the process alive for over 
ten days, and the problem was discovered only after the process had exhausted 
the ECS resources.
   
   <img width="2844" height="1112" alt="Image" 
src="https://github.com/user-attachments/assets/f142f869-850c-466c-9eb2-886fcba7416e"
 />
   
   ### Describe the solution
   
   I want to add a daemon watchdog thread that is started, with a timeout, when 
the stop() method is called. If the process shuts down normally, the daemon 
thread is interrupted and the process exits gracefully. If the timeout is 
reached and the process is still alive, some threads are blocking the shutdown; 
the watchdog then prints all active threads in the current process and forcibly 
terminates it.
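
A minimal Java sketch of the idea, to make the proposal concrete. It is not existing Kyuubi code: the class name `ShutdownWatchdog`, its methods, and the injected `terminator` callback are all hypothetical. The sketch uses `Runtime.halt` rather than `System.exit` for the forced termination, on the assumption that shutdown hooks may be what is stuck; `System.exit` can block indefinitely if it is called while shutdown hooks are already running.

```java
import java.util.Map;
import java.util.function.IntConsumer;

/** Hypothetical sketch of a shutdown watchdog; names are illustrative only. */
public final class ShutdownWatchdog {
    private final long timeoutMillis;
    private final IntConsumer terminator; // injected so the sketch can be tested
    private Thread watchdog;

    public ShutdownWatchdog(long timeoutMillis, IntConsumer terminator) {
        this.timeoutMillis = timeoutMillis;
        this.terminator = terminator;
    }

    /** Assumed production wiring: halt the JVM without running shutdown hooks. */
    public static ShutdownWatchdog forcingHalt(long timeoutMillis) {
        return new ShutdownWatchdog(timeoutMillis, code -> Runtime.getRuntime().halt(code));
    }

    /** Would be called from the engine's stop() path; starts the timer once. */
    public synchronized void start() {
        if (watchdog != null) {
            return;
        }
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(timeoutMillis);
            } catch (InterruptedException e) {
                return; // shutdown completed in time; nothing to do
            }
            // Timeout reached: report what is still running, then force quit.
            System.err.print(dumpAllThreads());
            terminator.accept(1);
        }, "shutdown-watchdog");
        t.setDaemon(true); // the watchdog itself must never keep the JVM alive
        t.start();
        watchdog = t;
    }

    /** Cancels the watchdog when shutdown finished normally. */
    public synchronized void cancel() {
        if (watchdog != null) {
            watchdog.interrupt();
        }
    }

    /** Formats the stack traces of all live threads. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" daemon=").append(t.isDaemon())
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

In this sketch the engine would call `ShutdownWatchdog.forcingHalt(timeout).start()` at the beginning of stop(). Because the watchdog is a daemon thread, it does not delay a normal exit even if `cancel()` is never called; the thread dump printed on timeout identifies the threads that prevented the exit, such as the Ranger thread in the log above.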
   
   ### Additional context
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes. I would be willing to submit a PR with guidance from the Kyuubi 
community to improve.
   - [ ] No. I cannot submit a PR at this time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


