Nick Craig-Wood wrote:
> George Sakkis <[EMAIL PROTECTED]> wrote:
> > I've been trying to track down a memory leak (which I initially
> > attributed erroneously to numpy) and it turns out to be caused by a
> > memory mapped file. It seems that mmap caches without limit the chunks
> > it reads, as the memory usage grows to several hundreds MBs according
> > to the Windows task manager before it dies with a MemoryError. I'm
> > positive that these chunks are not referenced anywhere else; in fact if
> > I change the mmap object to a normal file, memory usage remains
> > constant. The documentation of mmap doesn't mention anything about
> > this. Can the caching strategy be modified at the user level ?
>
> I'm not familiar with mmap() on windows, but assuming it works the
> same way as unix...
>
> The point of mmap() is to map files into memory. It is completely up
> to the OS to bring pages into memory for you to read / write to, and
> completely up to the OS to get rid of them again.
>
> What you would expect is that the file is demand paged into memory as
> you access bits of it. These pages will remain in memory until the OS
> feels some memory pressure when the pages will be written out if dirty
> and then dropped.
>
> The OS will try to keep hold of pages as long as possible just in case
> you need them again. The pages dropped should be the least recently
> used pages.
>
> I wouldn't have expected a MemoryError though...
>
> Did you do mmap.flush() after writing?
The file is written once and then opened read-only, so there's no
flushing involved.

So if caching is completely up to the OS, I take it that my options are
either (1) modify my algorithms so that they work in fixed-size batches
instead of arbitrarily long sequences, or (2) implement my own
memory-mapping scheme to fit my algorithms. I guess (1) would be less
trouble overall. Or is there a way to give the OS a hint about how
large a cache it can use?

George
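
For what it's worth, option (1) might look roughly like the sketch below: rather than mapping the whole file, map one fixed-size window at a time and release it before moving on. This only bounds how much of the file is mapped at any moment; it does not control the OS page cache. The names `iter_windows` and `window_size` are made up for illustration, and the sketch assumes a Python version whose `mmap.mmap()` accepts the `offset` argument and works as a context manager.

```python
import mmap
import os


def iter_windows(path, window_size=16 * 1024 * 1024):
    """Yield (offset, window) pairs, mapping one read-only window at a time.

    Hypothetical helper: each window is a live mmap object that is closed
    as soon as the caller asks for the next one, so copy out anything you
    need to keep (e.g. window[:]) before advancing.
    """
    # The offset passed to mmap must be a multiple of ALLOCATIONGRANULARITY,
    # so round the window size down to such a multiple (at least one unit).
    gran = mmap.ALLOCATIONGRANULARITY
    window_size = max(gran, (window_size // gran) * gran)

    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < file_size:
            # The last window may be shorter than window_size.
            length = min(window_size, file_size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as window:
                yield offset, window
            offset += length
```

Usage would be something like `for offset, window in iter_windows("data.bin"): process(window)`, where `process` stands in for whatever the algorithm does per batch. Sequences that straddle a window boundary would need extra handling, which this sketch leaves out.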