You can have a look at this paper on [sublinear memory 
usage](https://arxiv.org/pdf/1604.06174.pdf), which covers some common 
techniques DL frameworks use to lower GPU memory usage.

If you only care about forward inference, you can try reducing the batch size 
to a small value (at the cost of speed), or quantizing the network (int8, float16, 
etc.). As far as I know, the MKL-DNN backend of MXNet supports this; otherwise, 
TensorRT also has good support for model quantization. There are some other methods 
you could try as well.
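As a rough illustration of why both suggestions help: the memory for a batch of inputs (and its activations) scales linearly with batch size and with the per-element byte width, so halving precision or shrinking the batch shrinks that cost proportionally. A minimal NumPy sketch with a hypothetical input shape (not tied to any particular model):

```python
import numpy as np

def batch_mem_bytes(batch_size, feature_shape=(3, 224, 224), dtype=np.float32):
    """Bytes needed to hold one input batch: element count * bytes per element.

    feature_shape is a hypothetical image-like input; swap in your model's shape.
    """
    n_elems = batch_size * int(np.prod(feature_shape))
    return n_elems * np.dtype(dtype).itemsize

fp32_b32 = batch_mem_bytes(32, dtype=np.float32)  # baseline: batch 32, float32
fp16_b32 = batch_mem_bytes(32, dtype=np.float16)  # same batch, half precision
fp32_b8  = batch_mem_bytes(8,  dtype=np.float32)  # quarter batch, full precision

# float16 halves the footprint; a 4x smaller batch cuts it 4x
print(fp32_b32, fp16_b32, fp32_b8)
```

The same linear scaling is why dropping the batch size is the quickest knob to turn, while quantization keeps throughput but needs backend support (MKL-DNN or TensorRT, as above).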

How about the GPU memory cost of the same model using TF or PyTorch?





---
[Visit Topic](https://discuss.mxnet.io/t/how-to-limit-gpu-memory-usage/6304/6)
