Re: [apache/incubator-mxnet] [RFC] MXNet 2.0 API Deprecation (#17676)

2020-03-06 Thread Wang Jiajun
> We may also drop ONNX in MXNet 2. I'm not aware of anyone working on ONNX in 
> MXNet and TVM can be used as a replacement.

+1 for keeping ONNX support. It has a lot of small problems, but I've converted 
many PyTorch models to MXNet for deployment with the following pipeline:
https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-onnx-pytorch-mxnet.html
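
For reference, a minimal Python sketch of that PyTorch -> ONNX -> MXNet flow might 
look like the following (the model choice, file name, and input shape are illustrative, 
and it assumes the exported opset is supported by MXNet's ONNX importer):

```python
import torch
import torchvision
import mxnet as mx

# 1) Export a PyTorch model to ONNX (model and input shape are illustrative).
model = torchvision.models.resnet18(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet18.onnx")

# 2) Import the ONNX graph into MXNet as a symbol plus parameter dicts.
sym, arg_params, aux_params = mx.contrib.onnx.import_model("resnet18.onnx")

# 3) Bind a Module and run a forward pass.
data_names = [n for n in sym.list_arguments() if n not in arg_params]
mod = mx.mod.Module(symbol=sym, data_names=data_names, label_names=None,
                    context=mx.cpu())
mod.bind(for_training=False, data_shapes=[(data_names[0], (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params, allow_missing=True)
mod.forward(mx.io.DataBatch([mx.nd.ones((1, 3, 224, 224))]))
print(mod.get_outputs()[0].shape)
```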

-- 
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17676#issuecomment-595835658

Re: [apache/incubator-mxnet] [RFC] MXNet Multithreaded Inference Interface (#16431)

2019-12-05 Thread Wang Jiajun
@anirudh2290 I just saw this RFC. Let me share what I've done for multithreaded 
inference; I think it's the only viable approach in MXNet right now.

I've deployed many models with the Scala API and run them in multiple threads. The 
whole system has run smoothly in a production environment for more than two months.

The inference backend is the graph executor: one executor is created per thread, 
with the model parameters shared among them. Each thread's executor can be reshaped 
dynamically and independently according to the shape of its input data.

As mentioned above, the dependency engine is not thread safe, so running this on the 
threaded engine leads to deadlocks and core dumps. That leaves the naive engine as 
the only option. Without dependency scheduling, any write dependency on the model 
parameters may be executed by several threads simultaneously and corrupt the internal 
data. If MKL-DNN is used to accelerate inference, you will get non-deterministic 
results from one inference to the next, because MXNet stealthily reorders the data in 
an NDArray (a write dependency) for MKL-DNN operators. I've worked around this with a 
temporary fix that is not suitable for an official PR.
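
Putting the two points above together, a rough Python sketch of the setup (I deployed 
via the Scala API, but the structure is the same) could look like this; the toy 
network, input name, and shapes are illustrative stand-ins for a real loaded checkpoint:

```python
import os
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"  # dependency engine is not thread safe

import threading
import mxnet as mx

ctx = mx.cpu()

# Toy network standing in for a real model (in practice: mx.model.load_checkpoint).
data = mx.sym.Variable("data")
sym = mx.sym.softmax(mx.sym.FullyConnected(data, num_hidden=10, name="fc"))

# Parameters created once and shared (read-only) by every thread's executor.
arg_params = {"fc_weight": mx.nd.random.normal(shape=(10, 32), ctx=ctx),
              "fc_bias": mx.nd.zeros((10,), ctx=ctx)}

def worker(batch):
    # A private executor per thread, bound to the *same* parameter NDArrays,
    # so the weights are shared across threads rather than copied.
    args = dict(arg_params)
    args["data"] = mx.nd.empty(batch.shape, ctx)
    exe = sym.bind(ctx, args=args, grad_req="null")
    exe.arg_dict["data"][:] = batch
    out = exe.forward(is_train=False)[0]
    out.wait_to_read()
    # If a later batch has a different shape, this thread's executor can be
    # reshaped independently, e.g. exe.reshape(allow_up_sizing=True, data=(8, 32)).

threads = [threading.Thread(target=worker, args=(mx.nd.ones((n, 32), ctx=ctx),))
           for n in (1, 2, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```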

Multithreaded inference should be used with care. Sharing model parameters reduces 
the memory footprint of your program, but a lot of memory is consumed by global 
resources (temporary workspace, random number generator, ...) and the MKL-DNN op 
cache, which are stored in static thread_local variables. So the **thread count** is 
the most important factor for memory footprint: any thread that touches MXNet, even 
a trivial imperative operator call, incurs memory overhead by creating its own set 
of thread_local variables. I've spent a lot of time tracking down what looked like 
memory leaks, and the best solution is to limit the number of threads.
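
In practice that means funneling all inference through a small, fixed pool of worker 
threads, for example (a minimal sketch; `run_inference` is a placeholder for the 
per-thread executor call):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_MXNET_THREADS = 4  # every thread that calls into MXNet pays its own thread_local cost

def run_inference(request):
    # Placeholder: call this thread's executor / Module here.
    return request

requests = list(range(100))

# Reuse a small, fixed pool of threads instead of spawning one per request, so only
# MAX_MXNET_THREADS sets of thread_local resources are ever created.
with ThreadPoolExecutor(max_workers=MAX_MXNET_THREADS) as pool:
    results = list(pool.map(run_inference, requests))
```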

A new way to do multithreaded inference on the threaded engine would be very welcome 
here; it would solve the problems above automatically.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16431#issuecomment-562052116

subscribe mxnet dev mailing list

2019-04-16 Thread Wang Jiajun