On Sun, 22 Jul 2007 01:56:32 -0300, Gilles Ganault <[EMAIL PROTECTED]> wrote:

> Incidentally, as far as using re alone is concerned, it appears that
> re.MULTILINE isn't enough to get re to include newlines: re.DOTALL
> must be added.
>
> Problem is, when I add re.DOTALL, the search takes less than a second
> for a 500KB file... and about 1 min 30 s for a file that's 1MB, with both
> files holding similar contents.
>
> Why such a huge difference in performance?
>
> pattern = "<span class=.?defaut.?>(\d+:\d+).*?</span>"

Try to avoid using ".*" and ".+" (even the non-greedy forms). In this
case, I think you want the scan to stop when it reaches the closing
</span> or any other tag, so use [^<]* instead.

BTW, it is better to use a raw string to represent the pattern:

pattern = r"...\d+..."

-- 
Gabriel Genellina
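
For illustration, here is a minimal sketch of the suggestion above, combining the raw string with [^<]* in place of .*?. The sample input and the helper name find_times are made up for this example; the real page may need a different pattern. Note that a negated character class such as [^<] matches newlines as well, so re.DOTALL should not be needed here.

```python
import re

# Raw string, so that \d reaches the regex engine unchanged.
# [^<]* matches any run of characters other than "<" (newlines included),
# so the scan cannot run past the next tag and re.DOTALL is not required.
PATTERN = re.compile(r"<span class=.?defaut.?>(\d+:\d+)[^<]*</span>")

# Hypothetical sample input, invented for this sketch.
SAMPLE = (
    '<span class="defaut">12:34 first\nentry</span>\n'
    "<span class='defaut'>05:06</span>\n"
    '<span class="other">99:99</span>\n'
)


def find_times(html):
    """Return the captured digit:digit groups from the matching spans."""
    return PATTERN.findall(html)


if __name__ == "__main__":
    print(find_times(SAMPLE))
```

One consequence of this choice: a span that contains a nested tag before its closing </span> will no longer match, which is the "stop at any other tag" behaviour described above.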