arcadiaphy edited a comment on issue #16431: [RFC] MXNet Multithreaded Inference Interface
URL: https://github.com/apache/incubator-mxnet/issues/16431#issuecomment-562052116
 
 
@anirudh2290 Just saw this RFC. Let me share what I've done with multithreaded inference; I think it's the only viable way in MXNet right now.
   
I've deployed many models with the Scala API and run them in multiple threads. The whole system has run smoothly in a production environment for more than 2 months.
   
The inference backend is the graph executor, which is created per thread with shared model parameters. Each thread's executor can be reshaped independently and dynamically according to the shape of its input data.
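To make that setup concrete, here is a minimal sketch in Python (I used the Scala API, but the idea is the same): the parameter NDArrays are loaded once and shared, and each thread binds its own graph executor against them, reshaping it when the input shape changes. The file names, the input name `data`, and the shapes are hypothetical.

```python
import mxnet as mx

# Load symbol and parameters once; these NDArrays are shared by all threads.
# 'model-symbol.json' / 'model-0000.params' and the input name 'data' are hypothetical.
sym = mx.sym.load('model-symbol.json')
saved = mx.nd.load('model-0000.params')
arg_params = {k[4:]: v for k, v in saved.items() if k.startswith('arg:')}
aux_params = {k[4:]: v for k, v in saved.items() if k.startswith('aux:')}

def make_executor(batch_shape, ctx=mx.cpu()):
    # Per-thread graph executor: bind the shared parameter NDArrays, no gradients.
    args = dict(arg_params)
    args['data'] = mx.nd.empty(batch_shape, ctx=ctx)
    return sym.bind(ctx, args=args, aux_states=aux_params, grad_req='null')

def infer(executor, data):
    # Reshape this thread's executor if the input shape has changed.
    if data.shape != executor.arg_dict['data'].shape:
        executor = executor.reshape(allow_up_sizing=True, data=data.shape)
    executor.arg_dict['data'][:] = data
    executor.forward(is_train=False)
    return executor.outputs[0].asnumpy(), executor
```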
   
As mentioned above, the dependency engine is not thread safe, so if you run multithreaded inference on the threaded engine, deadlocks and core dumps will happen. Therefore, the naive engine is the only option left. Without dependency scheduling, any write dependency on model parameters is likely to be executed simultaneously from multiple threads and corrupt the internal data. If MKL-DNN is used to accelerate inference, you will get non-deterministic results per inference, because MXNet stealthily reorders the data in NDArrays (a write dependency) for MKL-DNN operators. I've used a temporary workaround for this issue which is not suitable for an official PR.
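For reference, the engine is selected through the `MXNET_ENGINE_TYPE` environment variable, which has to be set before MXNet is initialized; a minimal sketch:

```python
import os

# Select the naive engine (no dependency scheduling) before importing mxnet;
# the threaded engines are the default and are the ones that deadlock under
# concurrent inference as described above.
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'

import mxnet as mx
```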
   
Multithreaded inference should be used with care. Sharing model parameters reduces the memory footprint of your program, but a lot of memory is consumed by global resources (temporary workspace, random number generator, ...) and the MKL-DNN op cache, which are stored in static thread_local variables. So the **number of threads** is the most important factor for memory footprint: any thread that touches MXNet, even for a trivial imperative operator call, will incur memory overhead by creating its own set of thread_local variables. I've spent a lot of time tracking down what looked like memory leaks, and the best solution is to limit the number of threads, e.g. by routing all MXNet calls through a fixed-size thread pool as sketched below.
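One way to enforce that limit is to funnel every MXNet call through a small, fixed pool of worker threads, so only those threads ever allocate the thread_local resources. A sketch, reusing the hypothetical `make_executor` / `infer` helpers from above (the pool size is also hypothetical):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Only these workers ever call into MXNet, so only they pay the per-thread
# cost of temporary workspaces, RNGs and the MKL-DNN op cache.
NUM_WORKERS = 4  # hypothetical; tune against your memory budget
pool = ThreadPoolExecutor(max_workers=NUM_WORKERS)

# One executor per worker thread, created lazily on first use.
tls = threading.local()

def _run(data):
    if not hasattr(tls, 'executor'):
        tls.executor = make_executor(data.shape)  # from the sketch above
    out, tls.executor = infer(tls.executor, data)
    return out

def predict(data):
    # Callable from any application thread; the MXNet work happens on the pool.
    return pool.submit(_run, data).result()
```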
   
A new way to do multithreaded inference on the threaded engine would be very welcome here; it would solve the above issues automatically.
