Fredrik Lundh wrote:
> Paul Watson wrote:
>
>> This is Cygwin on Windows XP.
>
> using cygwin to analyze performance characteristics of portable API:s
> is a really lousy idea.

Ok.  So, I agree.  That is just what I had at hand.  Here are some other
numbers to which due diligence has also not been applied.  Source code
is at the bottom for both file and mmap process.  I would be willing for
someone to tell me what I could improve.

$ python -V
Python 2.4.1

$ uname -a
Linux ruth 2.6.13-1.1532_FC4 #1 Thu Oct 20 01:30:08 EDT 2005 i686

$ cat /proc/meminfo|head -2
MemTotal:       514232 kB
MemFree:         47080 kB

$ time ./scanfile.py
16384

real    0m0.06s
user    0m0.03s
sys     0m0.01s

$ time ./scanfilemmap.py
16384

real    0m0.10s
user    0m0.06s
sys     0m0.00s

Using a ~ 250 MB file, not even half of physical memory.

$ time ./scanfile.py
16777216

real    0m11.19s
user    0m10.98s
sys     0m0.17s

$ time ./scanfilemmap.py
16777216

real    0m55.09s
user    0m43.12s
sys     0m11.92s

==============================

$ cat scanfile.py
#!/usr/bin/env python

import sys

fn = 't.dat'
ss = '\x00\x00\x01\x00'
ss = 'time'

be = len(ss) - 1        # length of overlap to check
blocksize = 64 * 1024   # need to ensure that blocksize > overlap

fp = open(fn, 'rb')
b = fp.read(blocksize)
count = 0
while len(b) > be:
    count += b.count(ss)
    b = b[-be:] + fp.read(blocksize)
fp.close()

print count
sys.exit(0)

===================================

$ cat scanfilemmap.py
#!/usr/bin/env python

import sys
import os
import mmap

fn = 't.dat'
ss = '\x00\x00\x01\x00'
ss='time'

fp = open(fn, 'rb')
b = mmap.mmap(fp.fileno(), os.stat(fp.name).st_size, access=mmap.ACCESS_READ)

count = 0
foundpoint = b.find(ss, 0)
while foundpoint != -1 and (foundpoint + 1) < b.size():
    #print foundpoint
    count = count + 1
    foundpoint = b.find(ss, foundpoint + 1)
b.close()

print count
fp.close()
sys.exit(0)
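
For anyone who wants to rerun the comparison on a current interpreter, here is a minimal sketch of the same two approaches in Python 3. It is not the code that produced the timings above. The function names (count_chunked, count_mmap, make_sample) and the 16-byte sample record layout are assumptions made for illustration only. Both functions count non-overlapping matches, and the chunked version assumes the search string is non-empty and cannot overlap itself (true for 'time').

```python
import mmap
import os
import tempfile


def count_chunked(path, needle, blocksize=64 * 1024):
    """Count matches of needle by reading the file in fixed-size blocks.

    Assumes needle is non-empty and cannot overlap itself.
    """
    overlap = len(needle) - 1  # bytes carried over so boundary matches are seen
    count = 0
    tail = b''
    with open(path, 'rb') as fp:
        while True:
            block = fp.read(blocksize)
            if not block:
                break
            buf = tail + block
            count += buf.count(needle)
            # The tail is shorter than needle, so it never holds a full match.
            tail = buf[-overlap:] if overlap else b''
    return count


def count_mmap(path, needle):
    """Count non-overlapping matches of needle using a read-only mmap."""
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be memory-mapped
    count = 0
    with open(path, 'rb') as fp:
        # Length 0 maps the whole file.
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pos = m.find(needle)
            while pos != -1:
                count += 1
                pos = m.find(needle, pos + len(needle))
    return count


def make_sample(path, records):
    """Write `records` 16-byte records, each starting with b'time' (hypothetical layout)."""
    with open(path, 'wb') as fp:
        for _ in range(records):
            fp.write(b'time' + b'\x00' * 12)


if __name__ == '__main__':
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        make_sample(path, 1000)
        print(count_chunked(path, b'time'), count_mmap(path, b'time'))
    finally:
        os.remove(path)
```

Checking that both functions return the same count on a small generated file, as the main block does, is a cheap sanity test before timing either one on a large file.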