Ezio Melotti added the comment:

> They are precompiled because for a program processing lots of email,
> they are hot spots.

OK, I didn't know they were hot spots. Note that the regexes are not recompiled every time: they are compiled the first time and then taken from the cache (assuming they don't fall out of the bottom of the cache). This still has a small overhead, though.

> Can you explain your changes to the ecre regex (keeping in mind
> that I don't know much about regex syntax).

-  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
+  (?P<charset>[^?]*)    # up to the next ? is the charset
   \?                    # literal ?
   (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
   \?                    # literal ?
-  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
+  (?P<encoded>[^?]*)    # up to the next ?= is the encoded string
   \?=                   # literal ?=

In the first change, the non-greedy *? is unnecessary because [^?]* already stops at the first ? found.

The second change might actually be wrong if <encoded> is allowed to contain lone '?'s. The original regex used '.*?\?=', which means "match everything (including lone '?'s) until the first '?='"; mine means "match everything until the first '?'", which works fine as long as lone '?'s are not allowed.

Serhiy's suggestion is semantically different, but it might still be suitable if having _has_surrogate return True even for surrogates outside the range \udc80-\udcff is OK.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11454>
_______________________________________
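
To make the difference between the two variants of the <encoded> group concrete, here is a minimal sketch. It is not the email package's actual code: the leading `=\?` and the flags are assumptions (the diff above shows only the changed lines and their immediate context), and the names ORIGINAL, PROPOSED, and encoded_part are made up for illustration.

```python
import re

# Assumed flags; the diff does not show how the pattern is compiled.
FLAGS = re.VERBOSE | re.IGNORECASE

# Variant with the lazy quantifiers (the "-" lines of the diff).
ORIGINAL = re.compile(r'''
    =\?                     # literal =? (assumed, not shown in the diff)
    (?P<charset>[^?]*?)     # non-greedy up to the next ? is the charset
    \?                      # literal ?
    (?P<encoding>[qb])      # either a "q" or a "b", case insensitive
    \?                      # literal ?
    (?P<encoded>.*?)        # non-greedy up to the next ?= is the encoded string
    \?=                     # literal ?=
''', FLAGS)

# Variant with the negated character classes (the "+" lines of the diff).
PROPOSED = re.compile(r'''
    =\?                     # literal =? (assumed, not shown in the diff)
    (?P<charset>[^?]*)      # up to the next ? is the charset
    \?                      # literal ?
    (?P<encoding>[qb])      # either a "q" or a "b", case insensitive
    \?                      # literal ?
    (?P<encoded>[^?]*)      # everything up to the next ?, which must start ?=
    \?=                     # literal ?=
''', FLAGS)


def encoded_part(pattern, text):
    """Return the <encoded> group of the first match, or None if no match."""
    match = pattern.search(text)
    return match.group('encoded') if match else None


# No lone '?' in the payload: both variants agree.
print(encoded_part(ORIGINAL, '=?utf-8?q?hello?='))   # hello
print(encoded_part(PROPOSED, '=?utf-8?q?hello?='))   # hello

# A lone '?' in the payload: only the lazy variant matches.
print(encoded_part(ORIGINAL, '=?utf-8?q?a?b?='))     # a?b
print(encoded_part(PROPOSED, '=?utf-8?q?a?b?='))     # None
```

So the two forms are interchangeable only if a lone '?' can never appear inside the encoded text.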