MRAB wrote:
Hi all,

I've been working on a new implementation of the re module. The details
are at http://bugs.python.org/issue2636, specifically from
http://bugs.python.org/issue2636#msg90954. I've included a .pyd file for
Python 2.6 on Windows if you want to try it out.

I'm interested in how fast it is generally, compared with the current re
module, but especially when faced with those 'pathological' regular
expressions which seem to take a long time to finish, for example:

    re.search(r"^(.+|D)*A$", "x" * 25 + "B")

which on my PC (1.8GHz) takes 18.98 secs with the re module but <0.01 secs with this new implementation.
I tried this on my 3GHz PC and the timings were pretty much the same.
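To see why this pattern counts as 'pathological', the minimal sketch below (written for Python 3 rather than the 2.6 used in the thread) times the stdlib re module on the same pattern for a few small input sizes. The group (.+|D)* can split the run of characters into pieces in exponentially many ways, and a backtracking engine tries them all before concluding that "A$" cannot match. The helper name time_pathological is hypothetical, and the sizes are kept well below 25 so the run finishes quickly; the absolute timings will depend on the machine.

```python
import re
import time

# The pattern from the post. (.+|D)* can carve "x" * n + "B" into groups in
# exponentially many ways, so a backtracking engine does exponential work
# before it reports that the trailing "A$" never matches.
PATHOLOGICAL = re.compile(r"^(.+|D)*A$")


def time_pathological(sizes):
    """Return a list of (n, matched, seconds) for the inputs "x" * n + "B"."""
    rows = []
    for n in sizes:
        text = "x" * n + "B"
        start = time.perf_counter()
        match = PATHOLOGICAL.search(text)
        rows.append((n, match is not None, time.perf_counter() - start))
    return rows


if __name__ == "__main__":
    # Small sizes only: each step of +2 should roughly quadruple the time.
    for n, matched, secs in time_pathological(range(10, 17, 2)):
        print(f"n={n:2d} matched={matched} {secs:.4f}s")
```

The search never succeeds for any n. Only the time it takes to give up grows, which is what makes the n = 25 case so slow with a plain backtracking engine.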

Based on the example at http://bugs.python.org/issue1721518, I knocked up this:

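import time
import re
import regex  # the new implementation from issue2636

s = "Add.1, 2020 and Add.1, 2021-2023, 2025, 2028 and 2029 and Add.1) R"
# Raw string, so the backslashes reach the regex engine untouched.
r = r"(?:\s|,|and|Add\S*?|Parts?|\([^\)]*\)|[IV\-\d]+)*$"

# Time the new regex module first.
t0 = time.clock()
print regex.search(r, s)
t1 = time.clock()
print "time", t1 - t0

# Then the same search with the standard re module.
print "It's going to crash"
t0 = time.clock()
print re.search(r, s)
t1 = time.clock()
print "It hasn't crashed time", t1 - t0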
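The script above is Python 2 and uses time.clock(), which was removed in Python 3.8. A minimal Python 3 sketch of the same comparison is below. It assumes the regex module is the third-party `regex` package from PyPI and simply skips it if it isn't installed. The helper names timed_search and available_modules are hypothetical.

```python
import re
import time

try:
    import regex  # optional third-party package from PyPI
except ImportError:
    regex = None

PATTERN = r"(?:\s|,|and|Add\S*?|Parts?|\([^\)]*\)|[IV\-\d]+)*$"
TEXT = "Add.1, 2020 and Add.1, 2021-2023, 2025, 2028 and 2029 and Add.1) R"


def timed_search(module, pattern, text):
    """Call module.search(pattern, text) once and return (match, seconds).

    time.perf_counter() stands in for time.clock(), removed in Python 3.8.
    """
    start = time.perf_counter()
    match = module.search(pattern, text)
    return match, time.perf_counter() - start


def available_modules():
    """Yield (name, module) for each regex engine that could be imported."""
    if regex is not None:
        yield "regex", regex
    yield "re", re
```

To reproduce the comparison, loop over available_modules() and print timed_search(module, PATTERN, TEXT) for each. Be prepared for the re call on this input to take a long time, as it did in the original run.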

The output shows a slight change in timing :)

<_regex.RE_Match object at 0x0243A1A0>
time 0.00279001940191
It's going to crash
<_sre.SRE_Match object at 0x024396B0>
It hasn't crashed time 98.4238155967


TIA

I also got the files bm_regex_effbot.py and bm_regex_v8.py from
http://code.google.com/p/unladen-swallow/source/browse/#svn/tests/performance and ran them, then reran them with regex substituted for re. The timings were roughly: effbot with re 0.14 secs, effbot with regex 1.16 secs, v8 with re 0.17 secs and v8 with regex 0.67 secs.
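Substituting one module for the other works because regex is meant to be a drop-in replacement for re. A minimal sketch of that kind of comparison is below. It passes the module in as a parameter and takes the best of several timeit repeats to reduce noise. The bench helper and the sample CASES are hypothetical illustrations, not the effbot or v8 benchmark suites. Because search() is called with pattern strings, both modules' internal pattern caches mean compile cost is mostly paid only on the first call.

```python
import re
import timeit


def bench(module, cases, number=1000, repeat=3):
    """Return the best total time (seconds) to run every case `number` times.

    `module` is any object with a re-compatible search() function, so the
    standard re module and the third-party regex module can be swapped in.
    """
    def run():
        for pattern, text in cases:
            module.search(pattern, text)

    # The minimum over repeats is the least noisy estimate.
    return min(timeit.repeat(run, number=number, repeat=repeat))


# Hypothetical sample workload for illustration only.
CASES = [
    (r"\d{4}-\d{2}-\d{2}", "released on 2009-07-29"),
    (r"(?i)python", "Monty PYTHON"),
    (r"^\s*#", "    # a comment"),
]

if __name__ == "__main__":
    print("re   ", bench(re, CASES))
    try:
        import regex  # optional third-party module
    except ImportError:
        pass
    else:
        print("regex", bench(regex, CASES))
```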

HTH.

--
Kindest regards.

Mark Lawrence.

--
http://mail.python.org/mailman/listinfo/python-list
