kohillyang edited a comment on issue #19159:
URL: 
https://github.com/apache/incubator-mxnet/issues/19159#issuecomment-693581070


   @wkcn but even if Flask has created a new process, the GPU memory should be 
freed once that process ends. Also, the predictor is created in the main function, 
which is called only once, so there is only one predictor instance. On the 
other hand, if the main process has already initialized a CUDA context, MXNet 
in a forked subprocess will fail at inference time, because the CUDA file 
descriptors cannot be shared between the main process and the subprocess.
   
   BTW, the PID of the process and the id of the predictor remain unchanged. 
I print them with the following code:
   ```python
           import os
           print(id(self))      # id of the predictor instance
           print(os.getpid())   # PID of the current process
   ```
   
   PS: `ctx.empty_cache()` is also not thread-safe. If you call it from two 
threads at the same time, the program can crash in some cases.
   
   Thread safety matters because sometimes you need to implement a Block that 
calls `asnumpy`, and it is too hard to implement every block as a HybridBlock 
in a fully asynchronous way. In PyTorch this is not a problem because of 
`DataParallel`: it starts one thread per GPU replica and gathers the results. 
That pattern is not officially supported by MXNet, because at least issues 
like <https://github.com/apache/incubator-mxnet/issues/13199> require 
workarounds. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


