waytrue17 commented on issue #20959:
URL:
https://github.com/apache/incubator-mxnet/issues/20959#issuecomment-1079551643
The memory leak in the above script appears to come from instantiating a new
dataloader object on every iteration of the for loop. Creating a single
dataloader object and reusing it across epochs seems to mitigate the issue:
```
import mxnet.gluon as gl
import mxnet as mx
import gc

if __name__ == "__main__":
    gpu_ctx = mx.gpu()
    model = gl.nn.Embedding(10, 5)
    model.initialize(ctx=gpu_ctx)
    X = mx.random.uniform(shape=(1000, 3))
    dataset = mx.gluon.data.dataset.ArrayDataset(X)
    num_workers = 8
    # create the dataloader once, outside the epoch loop
    data_loader = gl.data.DataLoader(
        dataset,
        batch_size=1,
        num_workers=num_workers,
    )

    for epoch in range(5):
        for batch in data_loader:
            # move data to gpu
            data_gpu = batch.copyto(gpu_ctx)
            # forward
            l = model(data_gpu)
            # force immediate compute
            l.asnumpy()

        mx.nd.waitall()

        # gpu_memory_info returns (free, total) in bytes, so b - a is the used memory
        a, b = mx.context.gpu_memory_info(0)
        print(f"num_workers: {num_workers} epoch {epoch}: "
              f"current gpu memory {(b - a) / (1024 * 1024 * 1024)} GB, "
              f"Total gpu memory {b / (1024 * 1024 * 1024)} GB.")
        data_loader.refresh()
```
```
num_workers: 8 epoch 0: current gpu memory 1.43017578125 GB, Total gpu memory 15.78192138671875 GB.
num_workers: 8 epoch 1: current gpu memory 1.43017578125 GB, Total gpu memory 15.78192138671875 GB.
num_workers: 8 epoch 2: current gpu memory 1.43017578125 GB, Total gpu memory 15.78192138671875 GB.
num_workers: 8 epoch 3: current gpu memory 1.43017578125 GB, Total gpu memory 15.78192138671875 GB.
num_workers: 8 epoch 4: current gpu memory 1.43017578125 GB, Total gpu memory 15.78192138671875 GB.
```
It seems we previously called `mshadow::DeleteStream<gpu>(stream)` to release
the GPU memory when a dataloader object reached the end of its life cycle, but
that call caused a [segfault
issue](https://github.com/apache/incubator-mxnet/issues/19360). In the
workaround [PR](https://github.com/apache/incubator-mxnet/pull/19378), we
removed `mshadow::DeleteStream<gpu>(stream)` and relied on the OS to clean up
the memory at the end of the program. That may explain why we see a memory leak
when a program creates multiple dataloaders: each one allocates GPU memory that
is not freed until the process exits.
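To illustrate the reasoning, here is a toy model in plain Python, not MXNet code. `ToyStreamPool`, `ToyLoader` and `live_streams_per_epoch` are hypothetical names, and the assumption that each loader allocates one stream per worker is mine for illustration only. It just shows that when nothing releases a loader's resources before process exit, the live count grows with every new loader but stays flat when one loader is reused:

```python
class ToyStreamPool:
    """Hypothetical stand-in for GPU-side resources held by the process."""

    def __init__(self):
        self.live = 0

    def create(self, count):
        self.live += count

    # Deliberately no release method: this models cleanup being left
    # to the OS at the end of the program.


class ToyLoader:
    """Hypothetical loader that allocates one stream per worker (assumption)."""

    def __init__(self, pool, num_workers):
        pool.create(num_workers)


def live_streams_per_epoch(epochs, num_workers, reuse_loader):
    """Return the number of live streams recorded after each epoch."""
    pool = ToyStreamPool()
    history = []
    loader = ToyLoader(pool, num_workers) if reuse_loader else None
    for _ in range(epochs):
        if not reuse_loader:
            # a new loader every epoch, as in the leaking script
            loader = ToyLoader(pool, num_workers)
        history.append(pool.live)
    return history


print(live_streams_per_epoch(5, 8, reuse_loader=False))  # [8, 16, 24, 32, 40]
print(live_streams_per_epoch(5, 8, reuse_loader=True))   # [8, 8, 8, 8, 8]
```

The flat second line corresponds to the constant memory readings in the output above; the real allocation sizes and mechanism inside MXNet may of course differ.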