William McBrine <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> I'm pretty new to Python (a little over a month). I was wondering -- is 
> something like this:
> 
> s = re.compile('whatever')
> 
> def t(whatnot):
>     return s.search(whatnot)
> 
> for i in xrange(1000):
>     print t(something[i])
> 
> significantly faster than something like this:
> 
> def t(whatnot):
>     s = re.compile('whatever')
>     return s.search(whatnot)
> 
> for i in xrange(1000):
>     result = t(something[i])
> 
> ? Or is Python clever enough to see that the value of s will be the same 
> on every call, and thus only compile it once?
> 

The best way to answer these questions is always to try it out for 
yourself. Have a look at 'timeit.py' in the library: you can run 
it as a script to time simple things or import it from longer scripts.

C:\Python25>python lib/timeit.py -s "import re;s=re.compile('whatnot')" 
"s.search('some long string containing a whatnot')"
1000000 loops, best of 3: 1.05 usec per loop

C:\Python25>python lib/timeit.py -s "import re" 
"re.compile('whatnot').search('some long string containing a whatnot')"
100000 loops, best of 3: 3.76 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.search('whatnot', 'some long string containing a whatnot')"
100000 loops, best of 3: 3.98 usec per loop

So it looks like you pay a couple of microseconds of overhead per call 
if you don't pre-compile the regular expression. That could be 
significant if you have simple matches as above, or irrelevant if the 
match is complex and slow.
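
If you would rather run the same comparison from a script than from the 
command line, something like the following should work. It is only a 
rough sketch: the function names and the time_all() helper are made up 
for illustration, and it assumes a Python version whose timeit accepts 
callables (2.6 or later). The numbers will differ from machine to 
machine.

```python
import re
import timeit

TEXT = 'some long string containing a whatnot'
PATTERN = 'whatnot'

# Compiled once, at import time.
precompiled = re.compile(PATTERN)


def use_precompiled():
    # Reuses the pattern object compiled above.
    return precompiled.search(TEXT)


def compile_each_call():
    # Calls re.compile() on every invocation.
    return re.compile(PATTERN).search(TEXT)


def module_level_search():
    # Lets the re module look the pattern up itself.
    return re.search(PATTERN, TEXT)


def time_all(number=100000, repeat=3):
    """Return the best time per call, in microseconds, for each variant."""
    results = {}
    for func in (use_precompiled, compile_each_call, module_level_search):
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        results[func.__name__] = best / number * 1e6
    return results


if __name__ == '__main__':
    for name, usec in time_all().items():
        print('%-22s %.2f usec per loop' % (name, usec))
```

Taking the minimum of the repeats mirrors the "best of 3" figure that 
timeit.py prints.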

You can also try timing the re.compile() call on its own:

C:\Python25>python lib/timeit.py -s "import re" "re.compile('whatnot')"
100000 loops, best of 3: 2.36 usec per loop

C:\Python25>python lib/timeit.py -s "import re" 
"re.compile('<(?:p|div)[^>]*>(?P<pat0>(?:(?P<atag0>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)</(?:p|div)>|(?P<pat1>(?:(?P<atag1>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)')"
100000 loops, best of 3: 2.34 usec per loop

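It makes no difference whether you use a trivial regular expression 
or a complex one, because these timings never actually recompile 
anything: the re module caches compiled expressions (up to 100 of them, 
if I remember correctly), so after the first pass through the loop 
re.compile() is just a cache lookup and the overhead is pretty constant.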
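
To see the cache at work, and to get a feel for what a real compile 
costs, you can empty the cache with re.purge() before each call. This is 
a minimal sketch, not a careful benchmark: the pattern and the helper 
names are invented for the example, the fresh timing includes the small 
cost of re.purge() itself, and getting the very same object back from 
two re.compile() calls is CPython behaviour rather than something the 
documentation promises.

```python
import re
import timeit

# Hypothetical pattern, just complex enough to make compiling noticeable.
PATTERN = r'<(?:p|div)[^>]*>(?P<body>.*?)</(?:p|div)>'


def cached_compile():
    # After the first call this is served from the re module's cache.
    return re.compile(PATTERN)


def fresh_compile():
    # Empty the cache first, so the pattern really is compiled again.
    re.purge()
    return re.compile(PATTERN)


def usec_per_call(func, number=2000, repeat=3):
    """Best time per call in microseconds over `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e6


if __name__ == '__main__':
    # In CPython both calls hand back the same cached object.
    print(cached_compile() is cached_compile())
    print('cached: %.2f usec per call' % usec_per_call(cached_compile))
    print('fresh:  %.2f usec per call' % usec_per_call(fresh_compile))
```

Comparing the two figures on your own machine shows how much of the 
work the cache is hiding.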
-- 
http://mail.python.org/mailman/listinfo/python-list
