Great proposal! A few questions from my end:
1. Will the new C API functions be thread-safe in general? That is, can I invoke them at any point in time from any thread without needing a lock, a sticky thread, or a thread hierarchy? (I'm thinking of thread safety being handled at the backend level.)
2. Will this also support the GPU use case? That is, are the parameters copied into GPU memory only once, in the same fashion as you describe for the CPU?
3. Do you think there's a path forward to make all inference-related C APIs thread-safe instead of splitting off another execution branch?

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16431#issuecomment-540828556