Thanks @marcoabreu !

> Will the new C-API functions be threadsafe in general? That is, can I invoke
> them at any point in time from any thread without the need of a lock, a
> sticky thread, or a thread hierarchy? (I'm thinking of the thread safety
> being done at the backend level.)

The main issue I found with C API thread safety, especially for the cached-op use case, was the ThreadLocalStore. If we fix that issue, the C APIs related to CreateCachedOp and InvokeCachedOp should be threadsafe.
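To illustrate the class of problem, here is a minimal C++ sketch of why thread-local state breaks cross-thread invocation. The `ThreadLocalStore` below is a simplified stand-in for dmlc's thread-local store, and `OpState` is a hypothetical example of backend state, not MXNet's actual code:

```cpp
#include <iostream>
#include <string>
#include <thread>
#include <unordered_map>

// Minimal stand-in for a thread-local store: every thread that calls
// Get() receives its own independent instance of T.
template <typename T>
struct ThreadLocalStore {
  static T* Get() {
    static thread_local T inst;
    return &inst;
  }
};

// Hypothetical per-op state that a backend might keep in thread-local storage.
struct OpState {
  std::unordered_map<std::string, int> cache;
};

int main() {
  // Thread A "creates" the op and populates state in ITS thread-local store.
  std::thread creator([] {
    ThreadLocalStore<OpState>::Get()->cache["conv0"] = 42;
  });
  creator.join();

  // Thread B "invokes" the op but sees a fresh, empty OpState: the state
  // written by thread A is invisible here. This is the kind of bug that
  // thread-local stores cause for cross-thread cached-op invocation.
  std::thread invoker([] {
    auto* state = ThreadLocalStore<OpState>::Get();
    std::cout << "entries visible to invoking thread: "
              << state->cache.size() << '\n';  // prints 0, not 1
  });
  invoker.join();
  return 0;
}
```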
> Will this also support the GPU use-case? That is, the parameters are only
> copied into GPU memory once, in the same fashion as you're describing for
> the CPU?

This should still support the single-GPU use case for 1.6. The multi-GPU inference use case requires more verification at the cached-op level.

> Do you think there's a path forward to make all inference-related C-APIs
> threadsafe instead of splitting off another execution branch?

I don't think we have such a strict split between inference and training APIs at the C API level; for example, for the Gluon cached op we call InvokeCachedOp for both training and inference. But let me rephrase your question as: will I be able to do multi-threaded inference from every frontend API that I can use for inference today? Right now I am targeting only Gluon, since most users have been directed towards Gluon. The other paths are the Module/symbolic API and the C Predict API, and supporting those two frontends requires the graph executor to be thread safe. That would definitely be a great addition for MXNet, since it would let users do multi-threaded inference from any of these APIs, but it is not something I have planned currently.
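For the Gluon-via-CachedOp path this work targets, the intended usage pattern looks roughly like the C++ sketch below: create the cached op once, then invoke the same handle from several threads with per-thread inputs and outputs and no user-level lock. `CreateCachedOp` and `InvokeCachedOp` here are hypothetical stand-ins, not the real MXNet C API signatures:

```cpp
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the cached-op C API. The real MXNet handles
// and signatures differ; this only sketches the usage pattern: one shared,
// read-only op handle, per-thread inputs/outputs, no lock around invocation.
struct CachedOp {
  // In the real backend this would hold the graph and shared parameters.
  float weight = 2.0f;
};

using CachedOpHandle = const CachedOp*;

CachedOpHandle CreateCachedOp() {
  static const CachedOp op{};  // created once, then only read
  return &op;
}

// A thread-safe invoke: reads shared state, writes only thread-owned output.
std::vector<float> InvokeCachedOp(CachedOpHandle op,
                                  const std::vector<float>& in) {
  std::vector<float> out(in.size());
  for (size_t i = 0; i < in.size(); ++i) out[i] = op->weight * in[i];
  return out;
}

int main() {
  CachedOpHandle op = CreateCachedOp();  // create once, e.g. on the main thread

  // Each worker invokes the same handle concurrently with its own data.
  std::vector<std::thread> workers;
  std::mutex print_mu;
  for (int t = 0; t < 4; ++t) {
    workers.emplace_back([op, t, &print_mu] {
      std::vector<float> input(3, static_cast<float>(t));
      std::vector<float> output = InvokeCachedOp(op, input);
      std::lock_guard<std::mutex> lk(print_mu);
      std::cout << "thread " << t << " -> " << output[0] << '\n';
    });
  }
  for (auto& w : workers) w.join();
  return 0;
}
```

The key property the sketch relies on is that invocation only reads the shared handle and writes thread-owned buffers, which is roughly the guarantee the ThreadLocalStore fix has to provide on the backend side.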