On Wed, Dec 9, 2015 at 9:51 AM, Mathieu Dubois
<mathieu.dub...@icm-institute.org> wrote:
> Dear all,
>
> If I am correct, using mmap_mode with Npz files has no effect i.e.:
> f = np.load("data.npz", mmap_mode="r")
> X = f['X']
> will load all the data in memory.
>
> Can somebody confirm that?
>
> If I'm correct, the mmap_mode argument could be passed to the NpzFile class
> which could in turn perform the correct operation. One way to handle that
> would be to use the ZipFile.extract method to write the Npy file on disk and
> then load it with numpy.load with the mmap_mode argument. Note that the user
> will have to remove the file to reclaim disk space (I guess that's OK).
>
> One problem that could arise is that the extracted Npy file can be large
> (it's the purpose of using memory mapping) and therefore it may be useful to
> offer some control on where this file is extracted (for instance /tmp can be
> too small to extract the file here). numpy.load could offer a new option for
> that (passed to ZipFile.extract).
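
For reference, the extract-then-memory-map workaround described in the quoted
message can be sketched in user code roughly as follows. This is only a minimal
sketch, not an existing NumPy feature: the helper name load_npz_member_mmap and
its extract_dir parameter are hypothetical, and it relies on np.savez storing
each array as a member named "<key>.npy" inside the zip archive. The caller
chooses where the file is extracted and must delete it afterwards to reclaim
disk space.

```python
import zipfile

import numpy as np


def load_npz_member_mmap(npz_path, key, extract_dir, mmap_mode="r"):
    """Extract one array from an .npz archive and memory-map the result.

    Hypothetical helper, not part of NumPy. Returns the memory-mapped
    array and the path of the extracted .npy file, which the caller is
    responsible for removing.
    """
    # np.savez writes each array as "<key>.npy" inside the zip archive.
    member = key + ".npy"
    with zipfile.ZipFile(npz_path) as zf:
        # ZipFile.extract decompresses the member if needed and returns
        # the path of the file it wrote under extract_dir.
        npy_path = zf.extract(member, path=extract_dir)
    # On a plain .npy file, mmap_mode is honoured and a np.memmap is returned.
    return np.load(npy_path, mmap_mode=mmap_mode), npy_path
```

Something like X, path = load_npz_member_mmap("data.npz", "X", "/some/large/dir")
would then give a memory-mapped X, at the cost of a full uncompressed copy of
the array on disk.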

I have struggled for a long time with a similar (albeit more obscure) problem
in PyFITS / astropy.io.fits when it comes to supporting memory-mapping of
compressed FITS files. For those unaware, FITS is a file format used primarily
in astronomy.

I have all kinds of wacky ideas for optimizing this, but at the moment, when
you load data from a compressed FITS file with memory-mapping enabled, there is
not much benefit because the contents of the file are decompressed into memory.
(There is a *little* benefit in that the compressed data is mmap'd, but the
compressed data is typically much smaller than the uncompressed data.)

Currently, in this case, I just issue a warning when the user explicitly
requests memmap=True but won't get much benefit from it. Maybe np.load could do
the same, but I don't have a strong opinion about it. (I only added the warning
in PyFITS because a user requested it and was kind enough to provide a patch,
which seemed reasonable.)

Erik
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion