After struggling with NumPy's memmap object, I examined the code and found three severe problems. I suggest that memmap be removed from NumPy, at least on Windows, as its shortcomings are severe and undocumented.
Problem 1: I/O errors are never detected on Win32

On Windows, I/O errors on memory-mapped objects are reported through structured exception handling (SEH). Neither NumPy nor Python uses structured exception handling on Win32. This means that I/O errors (such as network or disk failure) will go undetected and become a source of obscure bugs.

The fix is to wrap every access to a PyArrayObject's "data" pointer in __try and __except blocks, and to build with an MSVC compiler on Windows. GCC/MinGW cannot be used, as it does not support structured exception handling. In other words:

    PyArrayObject *memmap;

    __try {
        /* safe read/write access to memmap->data here */
    }
    __except (GetExceptionCode() == EXCEPTION_IN_PAGE_ERROR
                  ? EXCEPTION_EXECUTE_HANDLER
                  : EXCEPTION_CONTINUE_SEARCH) {
        /* Windows signaled an I/O error, handle the problem here */
    }

Not only must NumPy itself be rewritten, but also any library that gets a data pointer from a NumPy memmap array. Fixing this will be extremely difficult, if not impossible. The only safe way to access file data from NumPy is numpy.fromfile() and numpy.ndarray.tofile().

Problem 2: Mapping always starts from the beginning of the file

Python's standard mmap object always maps from the beginning of the file, regardless of the size. NumPy's memmap object depends on Python's mmap through the buffer protocol. Even though NumPy's memmap object takes an offset parameter, the actual memory mapping starts from the beginning of the file. Thus, an amount of virtual memory equal to the memmap object's offset parameter will be leaked until the memmap object is deleted.

Problem 3: No 64 bit support on Windows or Linux

On Linux, large files must be memory mapped using mmap64 (or mmap2 if 4k boundaries are acceptable). On Windows, CreateFileMapping/MapViewOfFile has 64 bit support, but Python's mmap does not use it (the high offset DWORD is always zero). Only files smaller than 4 GB can be memory mapped.
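To illustrate the numpy.fromfile() / tofile() route mentioned under Problem 1, here is a minimal sketch of reading and writing a block at a byte offset without any memory mapping. The helper names read_block and write_block are my own, not part of NumPy, and the short-read check is an assumption about how one might want to report a truncated read. The point is only that a failure comes back from the file API at the call site, rather than as a fault on a mapped page somewhere else in the program.

```python
import os
import tempfile

import numpy as np


def read_block(path, dtype, offset_bytes, count):
    """Read `count` items of `dtype`, starting `offset_bytes` into the file.

    Hypothetical helper: only the requested block is read, nothing is mapped.
    """
    with open(path, "rb") as f:
        f.seek(offset_bytes)
        out = np.fromfile(f, dtype=dtype, count=count)
    # Treat a short read as an error instead of silently returning less data.
    if out.size != count:
        raise OSError("short read: wanted %d items, got %d" % (count, out.size))
    return out


def write_block(path, arr, offset_bytes):
    """Overwrite part of an existing file with the raw bytes of `arr`."""
    with open(path, "r+b") as f:
        f.seek(offset_bytes)
        arr.tofile(f)


if __name__ == "__main__":
    # Small demonstration on a temporary file of ten float64 values.
    path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    np.arange(10, dtype=np.float64).tofile(path)
    write_block(path, np.array([100.0, 101.0]), 2 * 8)
    print(read_block(path, np.float64, 0, 5))
```

This gives up the convenience of treating the file as an array, so it is a workaround rather than a replacement for memmap.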
Regards,
Sturla Molden