Hello,
I need to process several large (~40 GB) files. np.memmap seems ideal for
this, but I have run into a problem that looks like a memory leak or memory
fragmentation. The following code illustrates the problem:

import numpy as np

x = np.memmap('mybigfile.bin', mode='r', dtype='uint8')
print x.shape   # prints (42940071360,) in my case
ndat = x.shape[0]
for k in range(1000):
    # The astype ensures that the data is actually read in from disk.
    y = x[k * ndat // 1000 : (k + 1) * ndat // 1000].astype('float32')
    del y


One would expect such a program to have a roughly constant memory
footprint, but in fact 'top' shows that the RES memory continually
increases. The memory usage is real, not just an accounting artifact,
because the OS eventually starts to swap to disk. The growth does not
seem to correspond to the total size of the file.

Has anyone seen this behavior? Is there a solution? I found this article:
http://pushingtheweb.com/2010/06/python-and-tcmalloc/ which sounds similar,
but it seems that the ~40 MB chunks I am loading would be using mmap anyway
and so could be returned to the OS.
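For comparison, here is a sketch of a per-chunk mapping approach I could try instead: rather than one memmap over the whole file, create a fresh, small memmap per chunk, so that deleting it unmaps the region and the kernel is free to reclaim those pages. This is untested against the 40 GB case; 'demo.bin' is a small stand-in file and the names are illustrative.

```python
import os
import numpy as np

filename = 'demo.bin'
np.arange(10000, dtype='uint8').tofile(filename)  # small stand-in file

nbytes = os.path.getsize(filename)
nchunks = 10
chunk = nbytes // nchunks   # assumes nchunks divides the file size evenly

total = 0.0
for k in range(nchunks):
    # Map only the current chunk; np.memmap accepts arbitrary byte offsets.
    m = np.memmap(filename, mode='r', dtype='uint8',
                  offset=k * chunk, shape=(chunk,))
    y = m.astype('float32')   # the copy forces the chunk to be read from disk
    total += y.sum()
    del m, y                  # drops the mapping for this chunk

os.remove(filename)
```

Whether this actually keeps RES bounded on RHEL 5 I cannot say without measuring, but it at least guarantees each mapping is torn down before the next one is created.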

I am using nearly the latest version of numpy from the git repository
(np.__version__ returns 2.0.0.dev-Unknown), Python 2.7.1, and RHEL 5 on
x86_64.

I appreciate any suggestions.
Thanks,
Glenn
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
