matteosal opened a new issue #20687:
URL: https://github.com/apache/incubator-mxnet/issues/20687


   The following script (+ attached symbol) runs a basic forward + backward 
pass on GPU for a medium-sized convnet:
   ```python
   import mxnet as mx
   import time
   
   # load the network definition (attached as sym.zip below)
   sym = mx.symbol.load('path/to/sym.json')
   
   # input and parameter shapes, keyed by the argument names in the symbol
   shapes = {
        '.Inputs.Input': [10,3,256,256],
        '.Nodes.2.Arrays.Weights': [64,3,4,4],
        '.Nodes.2.Arrays.Biases': [64],
        '.Nodes.4.Arrays.Weights': [128,64,4,4],
        '.Nodes.4.Arrays.Biases': [128],
        '.Nodes.5.Arrays.Scaling': [128],
        '.Nodes.5.Arrays.Biases': [128],
        '.Nodes.7.Arrays.Weights': [256,128,4,4],
        '.Nodes.7.Arrays.Biases': [256],
        '.Nodes.8.Arrays.Scaling': [256],
        '.Nodes.8.Arrays.Biases': [256],
        '.Nodes.10.Arrays.Weights': [512,256,4,4],
        '.Nodes.10.Arrays.Biases': [512],
        '.Nodes.11.Arrays.Scaling': [512],
        '.Nodes.11.Arrays.Biases': [512],
        '.Nodes.13.Arrays.Weights': [512,512,4,4],
        '.Nodes.13.Arrays.Biases': [512],
        '.Nodes.14.Arrays.Scaling': [512],
        '.Nodes.14.Arrays.Biases': [512],
        '.Nodes.16.Arrays.Weights': [512,512,4,4],
        '.Nodes.16.Arrays.Biases': [512],
        '.Nodes.17.Arrays.Scaling': [512],
        '.Nodes.17.Arrays.Biases': [512],
        '.Nodes.19.Arrays.Weights': [512,512,4,4],
        '.Nodes.19.Arrays.Biases': [512],
        '.Nodes.20.Arrays.Scaling': [512],
        '.Nodes.20.Arrays.Biases': [512],
        '.Nodes.22.Arrays.Weights': [512,512,4,4],
        '.Nodes.22.Arrays.Biases': [512],
        '.Nodes.24.Arrays.Weights': [512,512,4,4],
        '.Nodes.24.Arrays.Biases': [512],
        '.Nodes.25.Arrays.Scaling': [512],
        '.Nodes.25.Arrays.Biases': [512],
        '.Nodes.29.Arrays.Weights': [1024,512,4,4],
        '.Nodes.29.Arrays.Biases': [512],
        '.Nodes.30.Arrays.Scaling': [512],
        '.Nodes.30.Arrays.Biases': [512],
        '.Nodes.34.Arrays.Weights': [1024,512,4,4],
        '.Nodes.34.Arrays.Biases': [512],
        '.Nodes.35.Arrays.Scaling': [512],
        '.Nodes.35.Arrays.Biases': [512],
        '.Nodes.39.Arrays.Weights': [1024,512,4,4],
        '.Nodes.39.Arrays.Biases': [512],
        '.Nodes.40.Arrays.Scaling': [512],
        '.Nodes.40.Arrays.Biases': [512],
        '.Nodes.43.Arrays.Weights': [1024,256,4,4],
        '.Nodes.43.Arrays.Biases': [256],
        '.Nodes.44.Arrays.Scaling': [256],
        '.Nodes.44.Arrays.Biases': [256],
        '.Nodes.47.Arrays.Weights': [512,128,4,4],
        '.Nodes.47.Arrays.Biases': [128],
        '.Nodes.48.Arrays.Scaling': [128],
        '.Nodes.48.Arrays.Biases': [128],
        '.Nodes.51.Arrays.Weights': [256,64,4,4],
        '.Nodes.51.Arrays.Biases': [64],
        '.Nodes.52.Arrays.Scaling': [64],
        '.Nodes.52.Arrays.Biases': [64],
        '.Nodes.55.Arrays.Weights': [128,3,4,4],
        '.Nodes.55.Arrays.Biases': [3]
   }
   
   print('bind start')
   # bind on GPU with all-ones NDArrays for both the arguments and their
   # gradient buffers
   ex = sym.bind(
        mx.gpu(),
        {name: mx.nd.ones(shape, ctx=mx.gpu()) for (name, shape) in shapes.items()},
        args_grad={name: mx.nd.ones(shape, ctx=mx.gpu()) for (name, shape) in shapes.items()}
   )
   print('bind end')
   time.sleep(5)
   
   print('forward start')
   ex.forward()
   print('forward end')
   time.sleep(5)
   
   print('backward start')
   # head gradient for the single output; its shape matches the output shape
   ex.backward(mx.nd.ones([10, 3, 256, 256]))
   print('backward end')
   time.sleep(5)
   ```
   [sym.zip](https://github.com/apache/incubator-mxnet/files/7389359/sym.zip)
   
   By watching `watch nvidia-smi` on Linux as the script runs (the `sleep` calls leave time to read its output; a programmatic alternative is sketched below), I see the following progression in memory usage on v1.4 (075120ebb892341bb39c5962e17abc5e8e7b9733):
   
   bind -> 1 GB
   forward -> 2.3 GB
   backward -> 2.4 GB
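
   The same numbers can also be logged from inside the script instead of eyeballing `nvidia-smi`. A minimal sketch, assuming `mx.context.gpu_memory_info` (present in recent 1.x releases, returns `(free, total)` in bytes; the module name may differ on the 2.0 branch):

   ```python
   import mxnet as mx

   def log_gpu_usage(stage, device_id=0):
       # wraps cudaMemGetInfo, so this reports the same device-level usage
       # that nvidia-smi shows
       free, total = mx.context.gpu_memory_info(device_id)
       print('%s: %.2f GB used' % (stage, (total - free) / 1024 ** 3))

   # e.g. after each stage, once all pending work has finished:
   # ex.forward(); mx.nd.waitall(); log_gpu_usage('forward')
   ```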
   
   On the other hand, v1.6 (6eec9da55c5096079355d1f1a5fa58dcf35d6752) uses about 500MB more GPU memory than v1.4 at each stage, while v2.0 (fabcd145cd496628791f9f2ea813048360ac33ca) needs an extra 100MB on top of v1.6.
   The memory gap appears with the `bind` statement and remains approximately constant afterwards, so it seems that binding somehow became more memory-hungry.
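
   When comparing builds like this, it helps to record exactly which binary produced each measurement. A small sketch; `mx.runtime.feature_list` exists from 1.5 on, so it is not available on the v1.4 build above:

   ```python
   import mxnet as mx
   print(mx.__version__)

   # lists the compile-time features (CUDA, CUDNN, ...) of this binary;
   # available from 1.5 onwards
   from mxnet.runtime import feature_list
   print(feature_list())
   ```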
   
   Bonus question: why does memory usage increase permanently after the forward pass? Operators can obviously allocate extra space during the computation, but aren't they supposed to free it when they're done? Is this increase on top of the memory used for binding expected?
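
   One guess on my side that could be tested: MXNet's pooled GPU allocator keeps buffers that operators have already freed, and the pool never returns memory to the driver, so `nvidia-smi` would keep reporting the forward-pass workspace as used. Re-running with the pool disabled via the documented `MXNET_GPU_MEM_POOL_TYPE` variable should tell the two apart (the `Unpooled` value may not exist on every branch compared here):

   ```python
   import os
   # must be set before mxnet allocates anything on the GPU
   os.environ['MXNET_GPU_MEM_POOL_TYPE'] = 'Unpooled'
   import mxnet as mx
   # ... then run the same bind/forward/backward steps and watch nvidia-smi
   ```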

