Hello, I have seen the effect you describe; I had originally assumed this was the cause, but in fact there seems to be more to the problem. If it were only the effect you mention, there should be no memory error, because the OS would drop the pages when the memory is actually needed for something else. At least I would hope so; if not, this seems like a serious problem for Linux.
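For what it's worth, here is a minimal sketch of what I would hope the kernel does, made explicit: advising it that already-read pages can be dropped. This uses the stdlib mmap module's madvise, which is only available on Python >= 3.8 and on platforms that expose MADV_DONTNEED (e.g. Linux); the temporary file stands in for the real 40 GB file:

```python
import mmap
import os
import tempfile

# Small stand-in for the real 40 GB data file.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)) * 16)  # 4096 bytes

total = 0
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    chunk = 1024
    for off in range(0, len(mm), chunk):
        total += sum(mm[off:off + chunk])
        # Tell the kernel the pages touched so far are no longer
        # needed, so they can be dropped rather than lingering in RES
        # (no-op guard for platforms without madvise/MADV_DONTNEED).
        if hasattr(mm, "madvise") and hasattr(mmap, "MADV_DONTNEED"):
            mm.madvise(mmap.MADV_DONTNEED)
    mm.close()
print(total)  # 522240
```

Watching RES in 'top' while a scaled-up version of this runs would show whether the advice actually keeps the footprint flat.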
As a followup, I managed to install tcmalloc as described in the article I mentioned. Running the example I sent now shows a constant memory footprint, as expected. I am surprised such a solution was necessary; surely others must work with datasets this large using numpy/Python?

Thanks,
Glenn

On Wed, May 18, 2011 at 4:21 PM, Pauli Virtanen <[email protected]> wrote:
> On Wed, 18 May 2011 15:09:31 -0700, G Jones wrote:
> [clip]
> > import numpy as np
> >
> > x = np.memmap('mybigfile.bin', mode='r', dtype='uint8')
> > print x.shape  # prints (42940071360,) in my case
> > ndat = x.shape[0]
> > for k in range(1000):
> >     # The astype ensures that the data is read in from disk
> >     y = x[k*ndat/1000:(k+1)*ndat/1000].astype('float32')
> >     del y
> >
> > One would expect such a program to have a roughly constant memory
> > footprint, but in fact 'top' shows that the RES memory continually
> > increases. I can see that the memory usage is actually occurring,
> > because the OS eventually starts to swap to disk. The memory usage
> > does not seem to correspond to the total size of the file.
>
> Your OS probably likes to keep the pages touched in memory and in swap,
> rather than dropping them. This happens at least on Linux.
>
> You can check that an equivalent simple C program displays the same
> behavior (use with a file "data" containing enough bytes):
>
> #include <sys/mman.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> int main()
> {
>     unsigned long size = 2000000000;
>     unsigned long i;
>     char *p;
>     int fd;
>     char sum;
>
>     fd = open("data", O_RDONLY);
>     p = (char*)mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
>
>     sum = 0;
>     for (i = 0; i < size; ++i) {
>         sum += *(p + i);
>     }
>     munmap(p, size);
>     close(fd);
>
>     return 0;
> }
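One allocator-side workaround worth noting (a sketch of an alternative, not what was actually used in this thread): reuse a single preallocated float32 buffer so the loop makes one large allocation up front instead of a thousand, which sidesteps heap growth from repeated large malloc/free cycles. The file name and sizes below are invented, small stand-ins for the real dataset:

```python
import os
import numpy as np

# Tiny stand-in for the ~40 GB file from the thread.
data = (np.arange(10000) % 256).astype('uint8')
data.tofile('bigfile_demo.bin')  # hypothetical file name

x = np.memmap('bigfile_demo.bin', mode='r', dtype='uint8')
nchunks = 10
step = x.shape[0] // nchunks

# One float32 buffer, reused for every chunk: the cast happens in
# place via np.copyto, so no fresh large array is allocated per
# iteration.
y = np.empty(step, dtype='float32')
for k in range(nchunks):
    np.copyto(y, x[k*step:(k+1)*step])  # reads the chunk and upcasts

del x  # release the memmap before deleting the file
os.remove('bigfile_demo.bin')
```

Whether this avoids the growing-RES behavior with glibc malloc would have to be tested; it at least removes the per-iteration allocation churn that tcmalloc seems to handle better.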
_______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
