Peter Otten wrote:
Robin Becker wrote:


#sscan1.py thanks to Skip
import sys, time, mmap, os, re
fn = sys.argv[1]
fh=os.open(fn,os.O_BINARY|os.O_RDONLY)
s=mmap.mmap(fh,0,access=mmap.ACCESS_READ)
l=n=0
t0 = time.time()
for mat in re.split("XXXXX", s):


re.split() returns a list, not a generator, and this list may consume a lot
of memory.
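
A minimal sketch of a lazy alternative, assuming re.finditer() is an acceptable
stand-in for re.split() here and that only the match offsets are needed. The
function name match_starts and its parameters are illustrative, not from the
original script, and the code is written for Python 3, where the pattern must be
bytes when searching an mmap.

```python
import mmap
import re


def match_starts(path, pattern=b"XXXXX"):
    """Yield the start offset of each occurrence of pattern in the file.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            # finditer() hands back one match object at a time, so no list
            # of all the pieces is ever built in memory.
            for mat in re.finditer(re.escape(pattern), buf):
                yield mat.start()
```

Something like sum(1 for _ in match_starts(fn)) would then count the markers
without holding the split pieces. The scan still touches every page of the
mapping, so this only removes the cost of the list, not the cost of reading.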


That would certainly be the case, and it may explain why the simple way is so
bad for larger memory. I'll have a go at this experiment as well. My original
intention was to find the start of each match, as a scanner would, and this would
certainly do that. However, my observation with the trivial byte scan seems
to imply that just scanning the file causes VM problems (at least in XP). I
suppose it's hard to explain to the OS that I actually only need the relevant
few pages.
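
One way to sidestep the paging question, sketched here as an assumption rather
than anything tested in this thread, is to skip mmap and read the file in
fixed-size chunks, carrying over just enough bytes to catch a marker that
straddles two chunks. The name scan_in_chunks and the chunk_size default are
made up for illustration. Unlike a regex scan, this reports overlapping
occurrences too.

```python
def scan_in_chunks(path, marker=b"XXXXX", chunk_size=1 << 20):
    """Yield the file offset of each occurrence of marker.

    Hypothetical helper: only about chunk_size + len(marker) bytes are held
    at any one time.
    """
    overlap = len(marker) - 1
    tail = b""
    base = 0  # file offset of the first byte of the current window
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            pos = window.find(marker)
            while pos != -1:
                yield base + pos
                pos = window.find(marker, pos + 1)
            # Keep the last len(marker) - 1 bytes so that a marker split
            # across two reads is still found; a full marker cannot fit in
            # the tail, so nothing is reported twice.
            keep = window[-overlap:] if overlap else b""
            base += len(window) - len(keep)
            tail = keep
```

Whether this actually behaves better than the mmap scan on XP would need
measuring; it only guarantees that the process itself holds a bounded buffer.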
--
Robin Becker
--
http://mail.python.org/mailman/listinfo/python-list
