ann-qin-lu commented on issue #20959: URL: https://github.com/apache/incubator-mxnet/issues/20959#issuecomment-1072926185
After further investigation, I found that this issue is not caused by the CUDA upgrade from 10 to 11. It was introduced by this specific [commit: Remove cleanup on side threads](https://github.com/apache/incubator-mxnet/pull/19378), which skips CUDA deinitialization when the engine is destructed. I've confirmed that the memory leak goes away after reverting this commit. I'll work with the MXNet team to decide whether this commit should be reverted in both the MXNet master and 1.9 branches. (Another user reported a similar memory [issue](https://github.com/apache/incubator-mxnet/issues/19420) when using multiprocessing and tried to [revert](https://github.com/apache/incubator-mxnet/pull/19432) this commit.) There is also an open [issue](https://github.com/apache/incubator-mxnet/issues/19379) about handling engine destruction better. That issue needs to be addressed before the above workaround can be reverted.
