On 27 Jul., 13:21, Dave Angel <da...@ieee.org> wrote:
> (forwarding this message, as the reply was off-list)
>
> Kim Hansen wrote:
> > 2009/7/24 Dave Angel <da...@ieee.org>:
>
> >> It's not a question of how much disk space there is, but how much virtual
> >> space 32 bits can address. 2**32 is about 4 gig, and Windows XP reserves
> >> about half of that for system use. Presumably a 64 bit OS would have a
> >> much larger limit.
>
> >> Years ago I worked on Sun Sparc system which had much more limited shared
> >> memory access, due to hardware limitations. So 2gig seems pretty good to
> >> me.
>
> >> There is supposed to be a way to tell the Windows OS to only use 1 gb of
> >> virtual space, leaving 3gb for application use. But there are some
> >> limitations, and I don't recall what they are. I believe it has to be done
> >> globally (probably in Boot.ini), rather than per process. And some things
> >> didn't work in that configuration.
>
> >> DaveA
>
> > Hi Dave,
>
> > In the related post I did on the numpy discussions:
>
> > http://article.gmane.org/gmane.comp.python.numeric.general/31748
>
> > another user was kind enough to run my test program on both 32 bit and
> > 64 bit machines. On the 64 bit machine, there was no such limit, very
> > much in line with what you wrote. Adding the /3GB option in boot.ini
> > did not increase the available memory as well. Apparently, Python
> > needs to have been compiled in a way, which makes it possible to take
> > advantage of that switch and that is either not the case or I did
> > something else wrong as well.
>
> > I acknowledge the explanation concerning the address space available.
> > Being an ignorant of the inner details of the implementation of mmap,
> > it seems like somewhat an "implementation detail" to me that such an
> > address wall is hit.
> > There may be some good arguments from a
> > programming point of view and it may be a relative high limit as
> > compared to other systems but it is certainly at the low side for my
> > application: I work with data files typically 200 GB in size
> > consisting of datapackets each having a fixed size frame and a
> > variable size payload. To handle these large files, I generate an
> > "index" file consisting of just the frames (which has all the metadata
> > I need for finding the payloads I am interested in) and "pointers" to
> > where in the large data file each payload begins. This index file can
> > be up to 1 GB in size and at times I need to have access to two of
> > those at the same time (and then i hit the address wall). I would
> > really really like to be able to access these index files in a
> > read-only manner as an array of records on a file for which I use
> > numpy.memmap (which wraps mmap.mmap) such that I can pick a single
> > element, extract, e.g., every thousand value of a specific field in
> > the record using the convenient indexing available in Python/numpy.
> > Now it seems like I have to resort to making my own encapsulation
> > layer, which seeks to the relevant place in the file, reads sections
> > as bytestrings into recarrays, etc. Well, I must just get on with
> > it...
>
> > I think it would be worthwhile specifying this 32 bit OS limitation in
> > the documentation of mmap.mmap, as I doubt I am the only one being
> > surprised about this address space limitation.
>
> > Cheers,
> > Kim
>
> I agree that some description of system limitations should be included
> in a system-specific document. There probably is one, I haven't looked
> recently. But I don't think it belongs in mmap documentation.
>
> Perhaps you still don't recognize what the limit is. 32 bits can only
> address 4 gigabytes of things as first-class addresses. So roughly the
> same limit that's on mmap is also on list, dict, bytearray, or anything
> else.
> If you had 20 lists taking 100 meg each, you would fill up
> memory. If you had 10 of them, you might have enough room for a 1gb
> mmap area. And your code takes up some of that space, as well as the
> Python interpreter, the standard library, and all the data structures
> that are normally ignored by the application developer.
>
> BTW, there is one difference between mmap and most of the other
> allocations. Most data is allocated out of the swapfile, while mmap is
> allocated from the specified file (unless you use -1 for fileno).
> Consequently, if the swapfile is already clogged with all the other
> running applications, you can still take your 1.8gb or whatever of your
> virtual space, when much less than that might be available for other
> kinds of allocations.
>
> Executables and dlls are also (mostly) mapped into memory just the same
> as mmap. So they tend not to take up much space from the swapfile. In
> fact, with planning, a DLL needn't take up any swapfile space (well, a
> few K is always needed, realistically). But that's a linking issue for
> compiled languages.
>
> DaveA

I do understand the 2 GB address space limitation. However, I think I
have found a solution to my original numpy.memmap problem (from which
this one spun off): PyTables, where I can address 2^64 bytes of data on
a 32 bit machine using HDF5 files, thus circumventing the
"implementation detail" of the intermediate 2^32 memory address limit in
the numpy.memmap/mmap.mmap implementation.

http://www.pytables.org/moin

I just watched the first tutorial video, and it seems like just what I
am after (if it works as well in practice as it appears to).

http://showmedo.com/videos/video?name=1780000&fromSeriesID=178

Cheers,
Kim
--
http://mail.python.org/mailman/listinfo/python-list
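
For anyone hitting the same wall, here is a minimal sketch of the kind of
"encapsulation layer" mentioned in the thread: it seeks to a record and
reads only that record, so nothing is mapped into the process address
space. It is not PyTables and not part of numpy. The class name
RecordFile and the "<QQI" record layout used in the note below are
hypothetical, since the real frame format is not given in the thread, and
it assumes a Python build with large-file support.

```python
import struct


class RecordFile:
    """Read-only access to a file of fixed-size records, without mmap."""

    def __init__(self, path, fmt):
        # fmt is a struct format string describing one record
        # (hypothetical here; the thread does not give the real layout).
        self._struct = struct.Struct(fmt)
        self._file = open(path, "rb")
        self._file.seek(0, 2)  # jump to the end to learn the file size
        self.count = self._file.tell() // self._struct.size

    def __len__(self):
        return self.count

    def __getitem__(self, index):
        # Accepts a single integer or a slice such as rf[::1000].
        if isinstance(index, slice):
            return [self._read_one(i)
                    for i in range(*index.indices(self.count))]
        if index < 0:
            index += self.count
        if not 0 <= index < self.count:
            raise IndexError(index)
        return self._read_one(index)

    def _read_one(self, i):
        # Seek to record i and unpack just that record into a tuple.
        self._file.seek(i * self._struct.size)
        return self._struct.unpack(self._file.read(self._struct.size))

    def close(self):
        self._file.close()
```

With a layout of two 64-bit integers and one 32-bit integer per record,
RecordFile("index.bin", "<QQI")[::1000] would return every thousandth
record as a tuple, and picking element 2 of each tuple gives one field.
Reading one record per seek is slow compared with numpy.memmap; the
trade-off is that only the records actually requested are held in memory.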