New submission from Jonas H. <jo...@lophus.org>:
re.match(p, ...) with a pre-compiled pattern p = re.compile(...) can be much slower than calling p.match(...) directly. The difference is probably most noticeable with "easy" patterns and/or short strings, where the matching itself is cheap.

The culprit is that re.match -> re._compile can spend a lot of time looking up p in its internal _cache, where it will never find p:

    def _compile(pattern, flags):
        ...
        try:
            return _cache[type(pattern), pattern, flags]
        except KeyError:
            pass
        if isinstance(pattern, Pattern):
            ...
            return pattern
        ...
        _cache[type(pattern), pattern, flags] = p
        ...

When given a Pattern object, _compile always returns before the _cache is set, so the lookup can never succeed. By simply moving the isinstance(..., Pattern) check ahead of the cache lookup we can save a lot of time. I've seen speedups in the range of 2x-5x on some of my data. As an example:

Raw speed of re.compile(p, ...).match():

    time ./python.exe -c 'import re'\n'pat = re.compile(".").match'\n'for _ in range(1_000_000): pat("asdf")'
    Executed in 190.59 millis

Speed with this optimization:

    time ./python.exe -c 'import re'\n'pat = re.compile(".")'\n'for _ in range(1_000_000): re.match(pat, "asdf")'
    Executed in 291.39 millis

Speed without this optimization:

    time ./python.exe -c 'import re'\n'pat = re.compile(".")'\n'for _ in range(1_000_000): re.match(pat, "asdf")'
    Executed in 554.42 millis

----------
components: Regular Expressions
messages: 403851
nosy: ezio.melotti, jonash, mrabarnett
priority: normal
severity: normal
status: open
title: Speed up re.match with pre-compiled patterns
type: performance
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45462>
_______________________________________