Ezio Melotti added the comment:

> They are precompiled because for a program processing lots of email,
> they are hot spots.

OK, I didn't know they were hot spots.  Note that the regexes are not recompiled 
every time: they are compiled the first time and then taken from the cache 
(assuming they don't fall out of the bottom of the cache).  The cache lookup 
still has a small overhead, though.
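
To make the cache point concrete, here is a minimal sketch comparing a 
module-level re.search() call (which goes through the cache lookup on every 
call) with a precompiled pattern object.  The pattern and all the names below 
are made up for illustration; they are not taken from the email package.

```python
import re
import timeit

# Hypothetical pattern, used only for illustration.
PATTERN = r'=\?[^?]*\?[qb]\?[^?]*\?='
FLAGS = re.IGNORECASE

# Compiled once at import time; later calls skip the cache lookup.
PRECOMPILED = re.compile(PATTERN, FLAGS)


def search_via_module(text):
    # re.search() fetches the compiled pattern from the re module's
    # internal cache on every call (compiling it only if it is missing).
    return re.search(PATTERN, text, FLAGS)


def search_precompiled(text):
    # Calls the method on the already compiled pattern object directly.
    return PRECOMPILED.search(text)


def time_both(text, number=100_000):
    """Return (module_seconds, precompiled_seconds) for `number` searches."""
    via_module = timeit.timeit(lambda: search_via_module(text), number=number)
    precompiled = timeit.timeit(lambda: search_precompiled(text), number=number)
    return via_module, precompiled
```

Both functions return the same match; time_both() only measures how much the 
lookup costs on a given machine, so no timings are quoted here.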

> Can you explain your changes to the ecre regex (keeping in mind
> that I don't know much about regex syntax).

-  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
+  (?P<charset>[^?]*)    # up to the next ? is the charset
   \?                    # literal ?
   (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
   \?                    # literal ?
-  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
+  (?P<encoded>[^?]*)    # up to the next ?= is the encoded string
   \?=                   # literal ?=

In the first change, the non-greedy *? is unnecessary because [^?]* already 
stops at the first ? found.
The second change might actually be wrong if <encoded> is allowed to contain 
lone '?'s.  The original regex used '.*?\?=', which means "match everything 
(including lone '?'s) until the first '?='"; mine means "match everything 
until the first '?'", which works fine as long as lone '?'s are not allowed.
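
The difference between the two versions can be checked with a short sketch. 
The leading '=\?' is an assumption (the diff above starts at the charset 
group), and the sample strings and the helper name are invented for the 
example.

```python
import re

FLAGS = re.VERBOSE | re.IGNORECASE

# The leading "=\?" is assumed; the diff starts at the charset group.
ORIGINAL = re.compile(r'''
  =\?                   # literal =? (assumed prefix)
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
''', FLAGS)

MODIFIED = re.compile(r'''
  =\?                   # literal =? (assumed prefix)
  (?P<charset>[^?]*)    # up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>[^?]*)    # up to the next ? is the encoded string
  \?=                   # literal ?=
''', FLAGS)


def groups(pattern, text):
    """Return the named groups of the first match, or None if no match."""
    match = pattern.search(text)
    return match.groupdict() if match else None


plain = '=?utf-8?q?hello?='          # no lone '?' in the encoded part
lone = '=?utf-8?q?what?_now?='       # lone '?' inside the encoded part
```

For `plain`, both patterns give the same groups.  For `lone`, ORIGINAL 
captures 'what?_now' as <encoded>, while MODIFIED does not match at all, 
because [^?]* stops at the lone '?' and the following character is not '='.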

Serhiy's suggestion is semantically different, but it might still be suitable 
if it is OK for _has_surrogate to return True even for surrogates outside the 
range \udc80-\udcff.
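
To illustrate what "semantically different" can mean here, the sketch below 
contrasts a check limited to the range \udc80-\udcff with a check that 
reports any surrogate.  Both functions are hypothetical: neither is claimed 
to be the existing _has_surrogate code or Serhiy's actual suggestion.

```python
import re

# Hypothetical range-limited check: only code points \udc80-\udcff count.
_range_re = re.compile('[\udc80-\udcff]')


def has_surrogate_in_range(s):
    """True if s contains a code point in the range \\udc80-\\udcff."""
    return _range_re.search(s) is not None


def has_any_surrogate(s):
    """True if s contains any surrogate code point.

    Relies on the fact that strict UTF-8 encoding refuses lone surrogates.
    """
    try:
        s.encode('utf-8')
    except UnicodeEncodeError:
        return True
    return False
```

The two agree on 'abc' (False) and on 'abc\udc80' (True), but differ on a 
string such as '\ud800': the range check returns False, while the broader 
check returns True.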

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11454>
_______________________________________