On Tue, 26 Apr 2005 20:54:53 +0000, Robin Becker wrote:
Skip Montanaro wrote: ...
If I mmap() a file, it's not slurped into main memory immediately, though as you pointed out, it's charged to my process's virtual memory. As I access bits of the file's contents, it will page in only what's necessary. If I mmap() a huge file, then print out a few bytes from the middle, only the page containing the interesting bytes is actually copied into physical memory.
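The behaviour Skip describes can be sketched in a few lines (a sketch of mine, not code from the thread; it uses the context-manager form of mmap from modern Python, and the function name is made up for illustration):

```python
# Map a whole file read-only but touch only a few bytes from its midpoint.
# Only the page(s) actually accessed should be faulted into physical memory.
import mmap
import os

def read_middle_bytes(path, count=16):
    """Map the whole file and return `count` bytes from its middle."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            mid = size // 2
            # Slicing here faults in roughly one page, not the whole file.
            return m[mid:mid + count]
```

However large the file, the slice at the end is the only access, so demand paging should bring in only the page or two containing those bytes.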
....
My simple, rather stupid experiment indicates that Windows mmap, at least, will reserve 25 MB of page file for a linear scan through a 25 MB file. I probably only need 4096 bytes at a time to do the scan. That's a lot less than even the page-table requirement. This isn't rocket science, just an old-style observation.
Are you trying to claim Skip is wrong, or what? There's little value in
saying that by mapping a file of 25MB into VM pages, you've increased your
allocated paged file space by 25MB. That's effectively tautological.
If you are trying to claim Skip is wrong, you *do not understand* what you are talking about. Talk less, listen and study more. (This is my best guess: as I said, observing that allocating things increases the number of things allocated isn't worth posting, so my thought is that you think you are proving something. If you really are just posting something tautological, my apologies; disregard this paragraph, though it's certainly not out of line at this point.)
Well, I obviously don't understand, so perhaps you can explain these results.
I implemented a simple scanning algorithm in two ways: first, a buffered scan (tscan0.py); second, an mmapped scan (tscan1.py).
For small file sizes the times are comparable.
C:\code\reportlab\demos\gadflypaper>\tmp\tscan0.py bingo.pdf len=27916653 w=103 time=22.13
C:\code\reportlab\demos\gadflypaper>\tmp\tscan1.py bingo.pdf len=27916653 w=103 time=22.20
For large file sizes, when paging becomes of interest, the buffered scan wins even though it has to execute a lot more Python statements. If this were coded in C the results would be plainer still. As I said, this isn't about right or wrong; it's an observation. If I inspect the performance monitor, tscan0 runs at 100% CPU, but tscan1 is at 80-90% and all of memory gets used up, so paging is important. This may be an effect of the poor design of XP; if so, perhaps it won't hold for other OSes.
C:\code\reportlab\demos\gadflypaper>\tmp\tscan0.py dingo.dat len=139583265 w=103 time=110.91
C:\code\reportlab\demos\gadflypaper>\tmp\tscan1.py dingo.dat len=139583265 w=103 time=140.53
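If the single 25 MB+ mapping really is what drives the paging, one workaround (a sketch of mine, not from the thread) is to scan through a sliding window of fixed size, so only one window's worth of address space is mapped at any moment. Note the `offset` argument to mmap.mmap needs a newer Python than 2.4 and must be a multiple of mmap.ALLOCATIONGRANULARITY:

```python
# XOR-scan a file through a sliding mmap window instead of one big mapping,
# limiting the mapped region to `window` bytes at a time. `window` must be
# a multiple of mmap.ALLOCATIONGRANULARITY for the offset to stay aligned.
import mmap
import os

def xor_scan_windowed(path, window=mmap.ALLOCATIONGRANULARITY):
    """Same XOR checksum as the scripts below, one window at a time."""
    w = 0
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(window, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as m:
                for byte in m[:]:  # bytes copy of the window
                    w ^= byte
            offset += length
    return w
```

Whether this actually changes the paging behaviour on XP is exactly the kind of thing the experiment above would have to measure.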
C:\code\reportlab\demos\gadflypaper>cat \tmp\tscan0.py
import sys, time
fn = sys.argv[1]
f = open(fn, 'rb')
n = 0
w = 0
t0 = time.time()
while 1:
    buf = f.read(4096)
    lb = len(buf)
    if not lb: break
    n += lb
    for i in xrange(lb):
        w ^= ord(buf[i])
t1 = time.time()
print "len=%d w=%d time=%.2f" % (n, w, (t1-t0))
C:\code\reportlab\demos\gadflypaper>cat \tmp\tscan1.py
import sys, time, mmap, os
fn = sys.argv[1]
fh = os.open(fn, os.O_BINARY|os.O_RDONLY)
s = mmap.mmap(fh, 0, access=mmap.ACCESS_READ)
n = len(s)
w = 0
t0 = time.time()
for i in xrange(n):
    w ^= ord(s[i])
t1 = time.time()
print "len=%d w=%d time=%.2f" % (n, w, (t1-t0))
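For what it's worth, much of tscan1.py's time likely goes into the per-byte Python loop rather than into mmap itself. A variant (my sketch, not part of the original post) that pulls the data out in large slices and folds the XOR via functools.reduce should cut the interpreter overhead somewhat; the 64 KB chunk size is an arbitrary choice:

```python
# Same XOR checksum as tscan1.py, but reading the mapping in big slices so
# that each slice is an ordinary bytes object and the inner fold runs in
# functools.reduce rather than one Python bytecode loop iteration per byte.
import mmap
import functools
import operator

def xor_scan_sliced(path, chunk=1 << 16):
    w = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for off in range(0, len(m), chunk):
                # Each slice copies `chunk` bytes out of the mapping.
                w = functools.reduce(operator.xor, m[off:off + chunk], w)
    return w
```

The access pattern against the mapping is the same sequential scan, so any paging difference between the buffered and mapped versions should still show up.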
-- Robin Becker
-- http://mail.python.org/mailman/listinfo/python-list