This is an automated email from the ASF dual-hosted git repository.

tlopex pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
     new 45bef4579c [CUDA] Fix cuModuleUnload crash during interpreter shutdown (#18624)
45bef4579c is described below

commit 45bef4579cc6411eff7fc3344b76ee0ce13d32e7
Author: Guan-Ming (Wesley) Chiu <[email protected]>
AuthorDate: Mon Dec 29 19:23:42 2025 +0800

    [CUDA] Fix cuModuleUnload crash during interpreter shutdown (#18624)
    
    ## Related
    
    CI error in #18614
    
    ## Why
    
    The CUDAModuleNode destructor used the CUDA_DRIVER_CALL and CUDA_CALL
    macros, which call LOG(FATAL) (which throws an exception) when a CUDA
    operation fails.
    
    During interpreter shutdown, the CUDA context can become invalid, so
    cuModuleUnload can fail with CUDA_ERROR_ILLEGAL_ADDRESS. Destructors
    are implicitly noexcept, so an exception escaping one calls
    std::terminate and crashes the process.
    
    ## How
    
    1. Remove the throwing macros from the destructor.
    2. Check the cudaSetDevice return value and skip that module's cleanup
       if it fails with anything other than cudaErrorCudartUnloading.
    3. Ignore errors from cuModuleUnload. During shutdown these are benign,
       since the OS reclaims the resources anyway.
---
 src/runtime/cuda/cuda_module.cc | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/runtime/cuda/cuda_module.cc b/src/runtime/cuda/cuda_module.cc
index f07996c68b..19f4288c97 100644
--- a/src/runtime/cuda/cuda_module.cc
+++ b/src/runtime/cuda/cuda_module.cc
@@ -60,8 +60,13 @@ class CUDAModuleNode : public ffi::ModuleObj {
   ~CUDAModuleNode() {
     for (size_t i = 0; i < module_.size(); ++i) {
       if (module_[i] != nullptr) {
-        CUDA_CALL(cudaSetDevice(static_cast<int>(i)));
-        CUDA_DRIVER_CALL(cuModuleUnload(module_[i]));
+        cudaError_t set_err = cudaSetDevice(static_cast<int>(i));
+        if (set_err != cudaSuccess && set_err != cudaErrorCudartUnloading) {
+          continue;
+        }
+        CUresult result = cuModuleUnload(module_[i]);
+        // Ignore errors during cleanup - context may be shutting down
+        (void)result;
       }
     }
   }
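
The destructor change above can be sketched in isolation as follows. This is
a minimal illustration, not the TVM code: FakeStatus, FakeDriver,
ModuleHolder and CountUnloads are hypothetical stand-ins so that the sketch
compiles without the CUDA toolkit. It assumes only the control flow visible
in the diff: skip a module when selecting its device fails (unless the
runtime reports that it is unloading), and discard the unload result.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical status codes standing in for cudaError_t / CUresult values.
enum class FakeStatus { kSuccess, kRuntimeUnloading, kInvalidDevice, kIllegalAddress };

// Hypothetical driver standing in for cudaSetDevice / cuModuleUnload.
struct FakeDriver {
  std::vector<FakeStatus> set_device_results;  // one canned result per device
  FakeStatus unload_result = FakeStatus::kSuccess;
  int unload_calls = 0;

  FakeStatus SetDevice(std::size_t device) const { return set_device_results[device]; }
  FakeStatus ModuleUnload(int /*module*/) {
    ++unload_calls;
    return unload_result;
  }
};

// Holds one module handle per device; 0 means "nothing loaded on this device".
class ModuleHolder {
 public:
  ModuleHolder(FakeDriver* driver, std::vector<int> modules)
      : driver_(driver), modules_(std::move(modules)) {}

  ~ModuleHolder() {
    for (std::size_t i = 0; i < modules_.size(); ++i) {
      if (modules_[i] == 0) continue;
      FakeStatus set_err = driver_->SetDevice(i);
      // Skip this module if the device cannot be selected, except when the
      // runtime merely reports that it is unloading.
      if (set_err != FakeStatus::kSuccess && set_err != FakeStatus::kRuntimeUnloading) {
        continue;
      }
      // Best-effort unload: the result is deliberately discarded, never thrown.
      FakeStatus result = driver_->ModuleUnload(modules_[i]);
      (void)result;
    }
  }

 private:
  FakeDriver* driver_;
  std::vector<int> modules_;
};

// Destructors are implicitly noexcept, so a throw escaping one would call
// std::terminate; this check documents that property at compile time.
static_assert(std::is_nothrow_destructible<ModuleHolder>::value,
              "ModuleHolder's destructor must not throw");

// Destroys a ModuleHolder and reports how many unloads were attempted.
int CountUnloads(std::vector<FakeStatus> set_results, FakeStatus unload_result,
                 std::vector<int> modules) {
  FakeDriver driver;
  driver.set_device_results = std::move(set_results);
  driver.unload_result = unload_result;
  {
    ModuleHolder holder(&driver, std::move(modules));
  }  // destructor runs here
  return driver.unload_calls;
}
```

In this sketch a failing unload never escapes the destructor, and a device
that cannot be selected only causes that one module to be skipped.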
