On 08/10/2017 02:24 PM, Sebastian Berg wrote:
> On Thu, 2017-08-10 at 12:27 -0400, Allan Haldane wrote:
>> On 08/07/2017 05:01 PM, Nisoli Isaia wrote:
>>> Dear all,
>>> I have a question about the behaviour of
>>>
>>> y = np.array(x, copy=False, dtype='float32')
>>>
>>> when x is a memmap. If we check the memmap attribute of mmap
>>>
>>> print "mmap attribute", y._mmap
>>>
>>> numpy tells us that y is not a memmap.
>>> But the following code snippet crashes the python interpreter
>>>
>>> # opens the memmap
>>> with open(filename,'r+b') as f:
>>>     mm = mmap.mmap(f.fileno(),0)
>>>     x = np.frombuffer(mm, dtype='float32')
>>>
>>> # builds an array from the memmap, with the option copy=False
>>> y = np.array(x, copy=False, dtype='float32')
>>> print "before", y
>>>
>>> # closes the file
>>> mm.close()
>>> print "after", y
>>>
>>> In my code I use memmaps to share read-only objects when doing
>>> parallel processing and the behaviour of np.array, even if not
>>> consistent, it's desirable.
>>> I share scipy sparse matrices over many processes and if np.array
>>> would make a copy when dealing with memmaps this would force me to
>>> rewrite part of the sparse matrices code.
>>> Would it be possible in the future releases of numpy to have
>>> np.array check, if copy is false, if y is a memmap and in that case
>>> return a full memmap object instead of slicing it?
>>
>> This does appear to be a bug in numpy or mmap.
>>
>
> Frankly on first sight, I do not think it is a bug in either of them.
> Numpy uses view (memmap really is just a name for a memory map backed
> numpy array). The numpy array will hold a reference to the memory map
> object in its `.base` attribute (or the base of the base, etc.).
>
> If you close a mmap object, and then keep using it, you can get
> segfaults of course, I am not sure what you can do about it. Maybe
> python can try to warn you when you exit the context/close a file
> pointer, but I suppose: Python does memory management for you, it makes
> doing IO management easy, but you need to manage the IO correctly. That
> this segfaults and not just errors may be annoying, but seems the
> nature of things on first sight.
>
> - Sebastian
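
To make the `.base` point concrete, here is a rough sketch that walks
the chain from a view back to the mmap. `root_owner` is a hypothetical
helper of mine, not a NumPy function, and the temp file is made up for
illustration. Depending on the NumPy version, I would expect the chain
to end either at the mmap itself or at a memoryview wrapping it, so the
helper handles both cases.

```python
import mmap
import os
import tempfile

import numpy as np


def root_owner(arr):
    """Follow .base until something that is not an ndarray is reached."""
    obj = arr
    while isinstance(obj, np.ndarray) and obj.base is not None:
        obj = obj.base
    # Assumption: some NumPy versions keep a memoryview of the buffer
    # as the base; in that case unwrap it to get the exporting object.
    if isinstance(obj, memoryview):
        obj = obj.obj
    return obj


# Hypothetical scratch file holding four float32 values.
fd, path = tempfile.mkstemp()
os.write(fd, np.arange(4, dtype="float32").tobytes())
os.close(fd)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)

x = np.frombuffer(mm, dtype="float32")
y = np.asarray(x, dtype="float32")  # no copy needed, dtype already matches
z = y[1:]                           # a further view on the same memory

# Every view ultimately leads back to the mmap object.
owner_is_mmap = root_owner(z) is mm
print("view is backed by the mmap:", owner_is_mmap)

# Drop every array before closing the map, then clean up the file.
del x, y, z
mm.close()
os.remove(path)
```

The ordering at the end is the part that matters for the original
question: the arrays go away first, and only then is the mmap closed.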
I admit I have not had time to investigate it thoroughly, but it appears
to me that the intended design of mmap was to make it impossible to
close a mmap if there were still pointers to it.

Consider the following behavior (python3):

    >>> import mmap
    >>> with open('test', 'r+b') as f:
    >>>     mm = mmap.mmap(f.fileno(),0)
    >>>     mv = memoryview(mm)
    >>>     mm.close()
    BufferError: cannot close exported pointers exist

If memoryview behaves this way, why doesn't/can't ndarray? (Both use the
PEP3118 interface, as far as I understand).

You can see in the mmap code that it tries to carefully keep track of
any exported buffers, but numpy manages to bypass this:

https://github.com/python/cpython/blob/b879fe82e7e5c3f7673c9a7fa4aad42bd05445d8/Modules/mmapmodule.c#L727

Allan
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
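
P.S. The memoryview behaviour described above can be reproduced with the
standard library alone. This is only a minimal sketch, assuming a
throwaway temp file; the file contents and variable names are made up
for illustration. It shows that close() is refused while an export is
alive and succeeds once the export has been released.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file with 16 zero bytes.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 16)
os.close(fd)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)

# Taking a memoryview registers an exported buffer on the mmap.
mv = memoryview(mm)

try:
    mm.close()
    close_blocked = False
except BufferError:
    # mmap refuses to close while exported pointers exist.
    close_blocked = True

# Releasing the view drops the export, so close() now goes through.
mv.release()
mm.close()
closed_after_release = mm.closed

os.remove(path)
print("close blocked while exported:", close_blocked)
print("closed after release:", closed_after_release)
```

An ndarray that held its buffer export for its whole lifetime would get
the same protection, which is the behaviour I am asking about.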