barry-jin opened a new issue #20315: URL: https://github.com/apache/incubator-mxnet/issues/20315
Looks like GPU memory is not released after calling `asnumpy()` on a large mxnet numpy ndarray with a GPU context. Code to reproduce:

```python
import mxnet as mx
from mxnet import npx, gluon
from mxnet.gluon import nn

npx.set_np()
mx.context._current.set(mx.gpu(0))


def check_layer_forward_withinput(net, x):
    x_hybrid = x.copy()
    x.attach_grad()
    x_hybrid.attach_grad()
    net.initialize()
    with mx.autograd.record():
        out1 = net(x)
    out1.backward()
    net.hybridize()
    with mx.autograd.record():
        out2 = net(x_hybrid)
    out2.backward()
    a, b = mx.context.gpu_memory_info(0)
    print("Used memory {} GB, Total memory {} GB.".format(
        (b - a) / (1024 * 1024 * 1024), b / (1024 * 1024 * 1024)))
    mx.test_utils.assert_almost_equal(x.grad, x_hybrid.grad, rtol=1e-5, atol=1e-6)
    mx.test_utils.assert_almost_equal(out1, out2, rtol=1e-5, atol=1e-6)


def test_slice_pooling2d():
    # transpose shape to bring feature dimension 'c' from 2nd position to last
    def transpose(shape):
        return (shape[0],) + shape[2:] + (shape[1],)

    for layout in ['NCHW', 'NHWC']:
        max_pooling = nn.MaxPool2D(layout=layout)
        avg_pooling = nn.AvgPool2D(layout=layout)
        global_maxpooling = nn.GlobalMaxPool2D(layout=layout)
        global_avgpooling = nn.GlobalAvgPool2D(layout=layout)
        pooling_layers = [max_pooling, avg_pooling,
                          global_maxpooling, global_avgpooling]

        class Net(gluon.HybridBlock):
            def __init__(self, slice, pooling_layer, **kwargs):
                super(Net, self).__init__(**kwargs)
                self.slice = slice
                self.pool0 = pooling_layer

            def forward(self, x):
                x_slice = mx.npx.slice(x, begin=self.slice[0], end=self.slice[1])
                out = self.pool0(x_slice)
                return out

        xshape = (16, 128, 256, 256)
        # xshape = (8, 64, 128, 128)
        slice_shape = (4, 16, 32, 64)
        if layout == 'NHWC':
            xshape = transpose(xshape)
            slice_shape = transpose(slice_shape)
        x = mx.np.random.uniform(size=xshape)
        slice = [(0, 0, 0, 0), slice_shape]
        for i in range(len(pooling_layers)):
            net = Net(slice, pooling_layers[i])
            check_layer_forward_withinput(net, x)


if __name__ == '__main__':
    test_slice_pooling2d()
```

With the two `mx.test_utils.assert_almost_equal()` calls in place (these internally copy data to host via `asnumpy()`), the output is:

```
Used memory 2.142578125 GB, Total memory 14.755615234375 GB.
Used memory 4.119140625 GB, Total memory 14.755615234375 GB.
Used memory 4.619140625 GB, Total memory 14.755615234375 GB.
Used memory 5.119140625 GB, Total memory 14.755615234375 GB.
Used memory 5.119140625 GB, Total memory 14.755615234375 GB.
Used memory 6.119140625 GB, Total memory 14.755615234375 GB.
Used memory 6.619140625 GB, Total memory 14.755615234375 GB.
Used memory 7.119140625 GB, Total memory 14.755615234375 GB.
```

After commenting out those two lines:

```
Used memory 2.142578125 GB, Total memory 14.755615234375 GB.
Used memory 2.142578125 GB, Total memory 14.755615234375 GB.
Used memory 2.1171875 GB, Total memory 14.755615234375 GB.
Used memory 2.1171875 GB, Total memory 14.755615234375 GB.
Used memory 2.1171875 GB, Total memory 14.755615234375 GB.
Used memory 2.1171875 GB, Total memory 14.755615234375 GB.
Used memory 2.1171875 GB, Total memory 14.755615234375 GB.
Used memory 2.6171875 GB, Total memory 14.755615234375 GB.
```

After changing `xshape` to a relatively smaller one, `(8, 64, 128, 128)`, the memory usage looks normal.

_Originally posted by @barry-jin in https://github.com/apache/incubator-mxnet/pull/20262#issuecomment-849217284_
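As a side observation (my own arithmetic, not from the original report): the roughly 0.5 GB steps in the first log are consistent with the size of one retained copy of the input tensor, since a float32 array of shape `(16, 128, 256, 256)` is exactly 0.5 GiB. A quick check in plain Python, no mxnet required:

```python
from functools import reduce
from operator import mul

# Size of the repro tensor from the issue: float32, shape (16, 128, 256, 256).
shape = (16, 128, 256, 256)
dtype_bytes = 4  # bytes per float32 element

n_elements = reduce(mul, shape)              # 134,217,728 elements
size_gib = n_elements * dtype_bytes / 2**30  # bytes -> GiB
print(size_gib)  # 0.5
```

This suggests each leaked allocation corresponds to a whole copy of `x` (or an intermediate of the same shape) that is never returned to the pool after the host copy.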