New submission from Jonas H. <jo...@lophus.org>:

re.match(p, ...) with a pre-compiled pattern p = re.compile(...) can be much 
slower than calling p.match(...) directly. The difference is probably largest 
for "easy" patterns and/or short strings.

The culprit is that re.match -> re._compile can spend a lot of time looking up 
p in its internal _cache, where it will never find p:

def _compile(pattern, flags):
    ...
    try:
        return _cache[type(pattern), pattern, flags]
    except KeyError:
        pass
    if isinstance(pattern, Pattern):
        ...
        return pattern
    ...
        _cache[type(pattern), pattern, flags] = p
    ...

_compile will always return before the _cache is set if given a Pattern object.

By simply reordering the isinstance(..., Pattern) check we can save a lot of 
time.

I've seen speedups in the range of 2x-5x on some of my data. As an example:

Raw speed of re.compile(p, ...).match():
    time ./python.exe -c 'import re'\n'pat = re.compile(".").match'\n'for _ in range(1_000_000): pat("asdf")'
    Executed in  190.59 millis

Speed with this optimization:
    time ./python.exe -c 'import re'\n'pat = re.compile(".")'\n'for _ in range(1_000_000): re.match(pat, "asdf")'
    Executed in  291.39 millis

Speed without this optimization:
    time ./python.exe -c 'import re'\n'pat = re.compile(".")'\n'for _ in range(1_000_000): re.match(pat, "asdf")'
    Executed in  554.42 millis
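
The same comparison can also be run from a single script with timeit, which 
avoids measuring interpreter startup. This is only a sketch: the bench helper 
and its default iteration count are my own choices for illustration, and the 
absolute numbers will depend on the machine and the build.

```python
import re
import timeit

pat = re.compile(".")
bound_match = pat.match  # bound method, as in the "raw speed" case above


def bench(n=1_000_000):
    """Return (seconds for pat.match, seconds for re.match(pat, ...))."""
    direct = timeit.timeit(lambda: bound_match("asdf"), number=n)
    via_module = timeit.timeit(lambda: re.match(pat, "asdf"), number=n)
    return direct, via_module


if __name__ == "__main__":
    direct, via_module = bench()
    print(f"pat.match(...):      {direct * 1000:.2f} ms")
    print(f"re.match(pat, ...):  {via_module * 1000:.2f} ms")
```

Running it on builds with and without the change should show the gap between 
the two timings shrink, as in the figures above.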

----------
components: Regular Expressions
messages: 403851
nosy: ezio.melotti, jonash, mrabarnett
priority: normal
severity: normal
status: open
title: Speed up re.match with pre-compiled patterns
type: performance
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45462>
_______________________________________