@anirudh2290 Just saw this RFC. Let me share what I've done with multithreaded 
inference; I think it's the only viable way in mxnet right now.

I've deployed many models with the Scala API and run them in multiple threads. The 
whole system has been running smoothly in a production environment for more than 2 months.

The inference backend is the graph executor: one executor is created per thread, 
with the model parameters shared among them. Each executor can be dynamically 
reshaped, independently of the others, to match the shape of its thread's input data.
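To make the pattern concrete, here is a toy stdlib-only sketch (not the actual MXNet API; the `Executor` class and its methods are made up for illustration): parameters are loaded once and shared read-only, while each worker thread owns its executor and reshapes it to its own batch size.

```python
import threading

class Executor:
    """Stand-in for a per-thread graph executor (illustrative only)."""
    def __init__(self, params, shape):
        self.params = params      # shared model parameters, never written here
        self.shape = shape        # per-thread state, reshaped freely

    def reshape(self, shape):
        # Only this thread's executor changes; others are unaffected.
        self.shape = shape

    def forward(self, batch):
        # Pretend inference: scale each input by a shared "weight".
        return [x * self.params["w"] for x in batch]

shared_params = {"w": 2.0}        # loaded once, e.g. from a checkpoint
results = {}

def worker(tid, batch):
    ex = Executor(shared_params, shape=(len(batch),))  # one executor per thread
    ex.reshape((len(batch),))     # adapt to this thread's input shape
    results[tid] = ex.forward(batch)

threads = [threading.Thread(target=worker, args=(i, [1.0] * (i + 1)))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # each thread produced output at its own shape
```

The key point is that only the parameters are shared; all mutable per-request state lives in the thread-owned executor, so workers never write to common data.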

As mentioned above, the dependency engine is not thread safe, so if you run this 
under the threaded engine, deadlocks and core dumps will happen. Therefore, the 
naive engine is the only option left. Without dependency scheduling, any write 
dependency on the model parameters is likely to be executed simultaneously from 
several threads and corrupt the internal data. If mkldnn is used to accelerate 
inference, you will get non-deterministic results from one inference to the next, 
because mxnet stealthily reorders the data layout of ndarrays (a write dependency) 
for mkldnn operators. I've used a temporary workaround for this issue that is not 
suitable for an official PR.
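For reference, the engine is selected through a documented environment variable, set before the process starts. This is a config fragment, not code to merge:

```shell
# Use the single-threaded naive engine; the threaded engines
# (ThreadedEngine / ThreadedEnginePerDevice) are not safe to drive
# from multiple user threads.
export MXNET_ENGINE_TYPE=NaiveEngine
```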

Multithreaded inference should be used with care. Sharing the model parameters 
reduces your program's memory footprint, but a lot of memory is still consumed by 
global resources (temporary workspace, random number generator, ...) and by the 
mkldnn op cache, which are stored in static thread_local variables. So the 
**number of threads** is the most important factor for memory footprint: any 
thread that touches an mxnet operation, even a trivial imperative invocation of 
an operator, incurs memory overhead by creating its own set of thread_local 
variables. I've spent a lot of time tracking down what looked like memory leaks, 
and the best solution is to limit the number of threads.

A new way to do multithreaded inference on top of the threaded engine would be 
very welcome here; it would solve the above problems automatically.

-- 
https://github.com/apache/incubator-mxnet/issues/16431#issuecomment-562052116