On Nov 6, 2007 2:40 PM, Michael Bacarella <[EMAIL PROTECTED]> wrote:
> > > For various reasons I need to cache about 8GB of data from disk into
> > > core on application startup.
> >
> > Are you sure? On PC hardware, at least, doing this doesn't make any
> > guarantee that accessing it is actually going to be any faster. Is just
> > mmap()ing the file a problem for some reason?
> >
> > I assume you're on a 64 bit machine.
>
> Very sure. If we hit the disk at all, performance drops unacceptably. The
> application has low locality of reference, so on-demand caching isn't an
> option. We get the behavior we want when we pre-cache; the issue is simply
> that it takes so long to build this cache.
>
You're not going to avoid hitting disk just by reading into your memory
space. If your performance needs are really so tight that you can't rely
on the VM system to keep the pages you're using in memory, you're going
to need to do this at a much lower (and system-specific) level. mmap()
with a reasonable VM system shouldn't be any slower than reading it all
into memory.

> > > Building this cache takes nearly 2 hours on modern hardware. I am
> > > surprised to discover that the bottleneck here is CPU.
> > >
> > > The reason this is surprising is because I expect something like this
> > > to be very fast:
> > >
> > > #!python
> > > import array
> > >
> > > a = array.array('L')
> > >
> > > f = open('/dev/zero','r')
> > >
> > > while True:
> > >     a.fromstring(f.read(8))
> >
> > This just creates the same array over and over, forever. Is this
> > really the code you meant to write? I don't know why you'd expect an
> > infinite loop to be "fast"...
>
> Not exactly. fromstring() appends to the array. It's growing the array
> towards infinity.

You're correct, I misread the results of my testing.

> Since infinity never finishes, it's hard to get an idea of how slow this
> looks. Let's do 800MB instead.

That makes this a useless benchmark, though...

> Here's an example of loading 800MB in C:
>
> $ time ./eat800
>
> real    0m44.939s
> user    0m10.620s
> sys     0m34.303s
>
> $ cat eat800.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <fcntl.h>
>
> int main(void)
> {
>         int f = open("/dev/zero", O_RDONLY);
>         int vlen = 8;
>         long *v = malloc((sizeof (long)) * vlen);
>         int i;
>
>         for (i = 0; i < 100000000; i++) {
>                 if (i >= vlen) {
>                         vlen *= 2;
>                         v = (long *)realloc(v, (sizeof (long)) * vlen);
>                 }
>                 read(f, v + i, sizeof (long));
>         }
>         return 0;
> }
>
> Here's the similar operation in Python:
>
> $ time python eat800.py
>
> real    3m8.407s
> user    2m40.189s
> sys     0m27.934s
>
> $ cat eat800.py
> #!/usr/bin/python
>
> import array
> a = array.array('L')
>
> f = open('/dev/zero')
> for i in xrange(100000000):
>     a.fromstring(f.read(8))

Note that you're not doing the same thing at all. You're pre-allocating
the array in the C code, but not in Python (and I don't think you can).
Is there some reason you're growing an 8 gig array 8 bytes at a time?

> They spend about the same amount of time in system, but Python spends
> 4.7x as much CPU in userland as C does.

Python has to grow the array. It's possible that this is tripping a
degenerate case in the gc behavior also (I don't know if array uses
PyObjects for its internal buffer), and if it is you'll see an improvement
by disabling GC.

> And there's no solace in lists either:
>
> $ time python eat800.py
>
> real    4m2.796s
> user    3m57.865s
> sys     0m3.638s
>
> $ cat eat800.py
> #!/usr/bin/python
>
> import struct
>
> d = []
> f = open('/dev/zero')
> for i in xrange(100000000):
>     d.append(struct.unpack('L', f.read(8))[0])
>
> cPickle with protocol 2 has some promise but is more complicated because
> arrays can't be pickled. In a perfect world I could do something like
> this somewhere in the backroom:
>
>     x = lengthy_number_crunching()
>     magic.save_mmap("/important-data")
>
> and in the application do...
>
>     x = magic.mmap("/important-data")
>     magic.mlock("/important-data")
>
> and once the mlock finishes bringing important-data into RAM, at
> the speed of your disk I/O subsystem, all accesses to x will be
> hits against RAM.

You've basically described what mmap does, as far as I can tell. Have you
tried just mmapping the file?

> Any thoughts?
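
For what it's worth, here's a rough sketch of that "magic" pattern using
nothing but the standard array, mmap and struct modules. The /important-data
path and the flat array-of-unsigned-longs layout are just placeholders taken
from the pseudocode above, so treat this as an untested illustration rather
than a drop-in solution:

    import array
    import mmap
    import struct

    PATH = "/important-data"       # placeholder cache file from the pseudocode
    WORD = struct.calcsize('L')    # size of one native unsigned long

    def save_cache(values, path=PATH):
        # backroom step: dump the crunched numbers once, as raw machine words
        f = open(path, 'wb')
        try:
            array.array('L', values).tofile(f)
        finally:
            f.close()

    def load_cache(path=PATH):
        # application step: map the file read-only; the OS pages it in on
        # demand, and touching (or mlock()ing) the pages is what pulls them
        # into RAM at the speed of the disk subsystem
        f = open(path, 'rb')
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def get(m, i):
        # i-th unsigned long, read straight out of the mapping
        return struct.unpack_from('L', m, i * WORD)[0]

Once the pages are resident, get() calls are hits against RAM rather than
disk, which seems to be exactly the behavior described above.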
> > Did you try array.fromfile like I suggested?
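
And for reference, a fromfile() version of the 800MB exercise might look
roughly like this (an untested sketch that replaces the 100 million
read(8)/fromstring() round trips with a single bulk read):

    #!/usr/bin/python

    import array

    N = 100000000             # same 100M unsigned longs as above (~800MB on 64-bit)

    a = array.array('L')
    f = open('/dev/zero', 'rb')
    a.fromfile(f, N)          # one bulk read straight into the array's buffer
    f.close()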