William McBrine <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm pretty new to Python (a little over a month). I was wondering -- is
> something like this:
>
> s = re.compile('whatever')
>
> def t(whatnot):
>     return s.search(whatnot)
>
> for i in xrange(1000):
>     print t(something[i])
>
> significantly faster than something like this:
>
> def t(whatnot):
>     s = re.compile('whatever')
>     return s.search(whatnot)
>
> for i in xrange(1000):
>     result = t(something[i])
>
> ? Or is Python clever enough to see that the value of s will be the same
> on every call, and thus only compile it once?
>
The best way to answer these questions is always to try it out for yourself. Have a look at 'timeit.py' in the library: you can run it as a script to time simple things, or import it from longer scripts.

C:\Python25>python lib/timeit.py -s "import re;s=re.compile('whatnot')" "s.search('some long string containing a whatnot')"
1000000 loops, best of 3: 1.05 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.compile('whatnot').search('some long string containing a whatnot')"
100000 loops, best of 3: 3.76 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.search('whatnot', 'some long string containing a whatnot')"
100000 loops, best of 3: 3.98 usec per loop

So it looks like you pay a couple of microseconds of overhead per call if you don't pre-compile the regular expression. That could be significant if you have simple matches as above, or irrelevant if the match is complex and slow.

You can also try measuring the compile time separately:

C:\Python25>python lib/timeit.py -s "import re" "re.compile('whatnot')"
100000 loops, best of 3: 2.36 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.compile('<(?:p|div)[^>]*>(?P<pat0>(?:(?P<atag0>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)</(?:p|div)>|(?P<pat1>(?:(?P<atag1>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)')"
100000 loops, best of 3: 2.34 usec per loop

It makes no difference whether you use a trivial regular expression or a complex one. Python itself doesn't notice that s has the same value on every call, but the re module remembers (if I remember correctly) the last 100 expressions it compiled. A repeated re.compile() of the same pattern is therefore a cache lookup rather than a real compile, which is what the timings above are measuring, and so the overhead will be pretty constant.
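If you would rather import timeit from a longer script than run it from the command line, something along these lines should work. It is only a sketch: the function names and the best_usec_per_call helper are made up for illustration, it is written for Python 3 rather than the Python 2.5 used above, and the lambda wrapper adds a little overhead of its own to both measurements, so compare the two figures rather than reading either as an absolute cost.

```python
import re
import timeit

TEXT = 'some long string containing a whatnot'
PATTERN = 'whatnot'

# Compiled once at import time, like the first version in the question.
precompiled = re.compile(PATTERN)


def search_precompiled(text):
    return precompiled.search(text)


def search_compile_each_call(text):
    # re.compile() runs on every call, but the re module's internal cache
    # normally turns repeat calls into a lookup rather than a real compile.
    return re.compile(PATTERN).search(text)


def best_usec_per_call(func, number=100000, repeat=3):
    """Return the best per-call time in microseconds over `repeat` runs."""
    timer = timeit.Timer(lambda: func(TEXT))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


if __name__ == '__main__':
    for func in (search_precompiled, search_compile_each_call):
        print('%-26s %.2f usec per call'
              % (func.__name__, best_usec_per_call(func)))
```

The absolute numbers will depend on your machine and Python version, so don't expect them to match the figures quoted above.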