[issue42885] Optimize re.search() for \A (and maybe ^)

Serhiy Storchaka Sat, 16 Jan 2021 00:59:39 -0800


Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:


^ does not match only the beginning of the string. In MULTILINE mode it also 
matches the beginning of each line, i.e. the position just after '\n'. If the 
input string contains '\n', the result cannot be found in less than linear 
time. If you want to check whether the beginning of the string matches a 
regular expression, it is better to use match(). If you want to check whether 
the whole string matches it, it is better to use fullmatch().
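
A small sketch of the difference between the three methods, for illustration 
only. The sample text and variable names are made up. Note that in Python's re 
module ^ matches after '\n' only when re.MULTILINE is given.

```python
import re

text = "first line\nsecond line"  # hypothetical sample input

# Without flags, ^ anchors only at the start of the string.
plain_hit = re.search(r"^second", text)

# With re.MULTILINE, ^ also matches just after each '\n'.
multiline_hit = re.search(r"^second", text, re.MULTILINE)

# match() tries the pattern at position 0 only.
match_second = re.match(r"second", text)
match_first = re.match(r"first", text)

# fullmatch() succeeds only if the pattern covers the whole string.
full_partial = re.fullmatch(r"first", text)
full_whole = re.fullmatch(r"first line\nsecond line", text)
```

Here only multiline_hit, match_first and full_whole are match objects; the 
other three results are None.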

But in some cases you cannot choose which method to use. If you have a set of 
patterns, and only some of them should be anchored to the start of the string, 
you have to use search(). And while linear complexity for ^ is expected, 
search() is not optimized for \A.
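
The situation described above might look like the following sketch. The 
pattern list and the helper matching_patterns() are hypothetical names chosen 
for illustration; they do not come from the report.

```python
import re

# Hypothetical pattern set: only some entries are anchored with \A.
patterns = [
    re.compile(r"\Aerror"),   # must occur at the very start of the string
    re.compile(r"warning"),   # may occur anywhere in the string
]

def matching_patterns(text):
    """Return the source of every pattern that search() finds in text."""
    # search() is the only method that suits both kinds of pattern:
    # match() would wrongly anchor the second pattern to position 0.
    return [p.pattern for p in patterns if p.search(text)]

hits_start = matching_patterns("error: disk full\nwarning: retrying")
hits_later = matching_patterns("note\nerror after a newline")
```

With these inputs hits_start contains both patterns, while hits_later is 
empty, because \A never matches after a newline.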

So the original report is rejected: the behavior is expected and cannot be 
changed. It is not a bug. But some optimization can be added for \A, and 
perhaps the constant multiplier for ^ can be reduced too.

----------
title: Regex performance problem with ^ aka AT_BEGINNING -> Optimize 
re.search() for \A (and maybe ^)
versions:  -Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42885>
_______________________________________