Thanks @marcoabreu ! 

> Will the new C-API functions be thread safe in general? That is, can I 
> invoke them at any point in time from any thread without the need for a 
> lock, a sticky thread, or a thread hierarchy? (I'm thinking of the thread 
> safety being handled at the backend level.)

The issue I found with C API thread safety, especially in the cached op use 
case, was the ThreadLocalStore. If we fix this issue, then the C APIs related 
to CreateCachedOp and InvokeCachedOp should be thread safe.
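For context, the problem pattern looks roughly like the simplified sketch 
below (a from-memory approximation of the ThreadLocalStore idiom, not the 
exact dmlc-core source): any state kept in a ThreadLocalStore is per-thread, 
so a value written during a call on one thread is invisible to a later call 
made from another thread, which breaks APIs that assume that state is shared.

```cpp
#include <iostream>
#include <thread>

// Simplified sketch of the ThreadLocalStore pattern used in the backend
// (illustrative, not the exact dmlc-core source). Each calling thread
// gets its own independent instance of T.
template <typename T>
struct ThreadLocalStore {
  static T* Get() {
    static thread_local T inst;  // separate copy per thread
    return &inst;
  }
};

struct APIState {
  int last_error_code = 0;  // e.g. error status, scratch buffers, handles
};

int main() {
  // Written on the main thread's copy of APIState.
  ThreadLocalStore<APIState>::Get()->last_error_code = 42;

  std::thread t([] {
    // A worker thread sees a fresh APIState, not the value set above:
    std::cout << ThreadLocalStore<APIState>::Get()->last_error_code
              << std::endl;  // prints 0, not 42
  });
  t.join();
}
```

When a cached op is created on one thread and invoked from another, state 
stashed this way ends up duplicated or missing instead of shared, which is 
why fixing it is the prerequisite for making those C APIs thread safe.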

> Will this also support the GPU use case? That is, are the parameters only 
> copied into GPU memory once, in the same fashion as you're describing for 
> the CPU?

This should still support the single-GPU use case for 1.6. The multi-GPU 
inference use case requires more verification at the cached op level.

> Do you think there's a path forward to make all inference-related C-APIs 
> threadsafe instead of splitting off another execution branch?

I don't think we have such a strict split between inference and training APIs 
at the C API level. For example, for the Gluon cached op we call 
InvokeCachedOp for both training and inference.

But let me rephrase your question as: will I be able to do multi-threaded 
inference from every frontend API that I can use for inference today?

Right now I am targeting only Gluon, since most users have been directed 
towards Gluon. The other ways are the Module API, the symbolic API, and the 
C Predict API. Supporting those frontends requires the graph executor to be 
thread safe. This would definitely be a great addition for MXNet, since it 
would ensure that users can do multi-threaded inference from any of these 
APIs, but it is not something I have planned for currently.
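To make the intended Gluon/cached-op usage concrete, here is a hedged sketch 
of what thread-safe invocation would let a caller do. The handle types and 
function names below are hypothetical stand-ins for the real 
CreateCachedOp/InvokeCachedOp C APIs, with stub bodies so the sketch 
compiles; the shape of the calls is the point, not the signatures.

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-ins for the real C API handles and calls
// (CreateCachedOp / InvokeCachedOp); names, signatures, and bodies
// are illustrative only.
struct NDArrayHandle {};
struct CachedOpHandle {};

CachedOpHandle* CreateCachedOpOnce(/* symbol, params, device */) {
  // Real code would build the graph here and copy the parameters to
  // device (e.g. GPU) memory exactly once, shared by all threads.
  static CachedOpHandle op;
  return &op;
}

void InvokeCachedOpThreadSafe(CachedOpHandle* /*op*/,
                              const NDArrayHandle* /*inputs*/,
                              NDArrayHandle* /*outputs*/) {
  // Real code would run a forward pass; the requirement is that this
  // call is safe from any thread without external locking.
}

int main() {
  CachedOpHandle* op = CreateCachedOpOnce();  // parameters copied once

  std::vector<std::thread> workers;
  for (int i = 0; i < 4; ++i) {
    workers.emplace_back([op] {
      NDArrayHandle input, output;  // each thread owns its inputs/outputs
      InvokeCachedOpThreadSafe(op, &input, &output);
    });
  }
  for (auto& w : workers) w.join();
}
```

The design choice this illustrates is one shared, read-only cached op (graph 
plus parameters) with all mutable per-request state owned by the calling 
thread, which is what removes the need for a lock around each invocation.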
