[issue35915] re.search extreme slowness (looks like hang/livelock), searching for patterns containing .* in a large string

Tim Peters Wed, 06 Feb 2019 09:15:50 -0800


Tim Peters <t...@python.org> added the comment:


Yes, it's quadratic time.  If the string being searched has N characters, first 
it fails to find "x" in all N of 'em, then `.*` advances by one and it fails to 
find "x" in the trailing N-1 characters, then again in the trailing N-2, and so 
on.  N + N-1 + N-2 + ... + 1 is quadratic in N.

That's how this kind of regexp engine works.  And it's mild, as such things go: 
 you can also create poor regexps that take time _exponential_ in N that fail 
to match certain strings.

It's unlikely this will change without replacing Python's regexp implementation 
entirely.  For why, see Jeffrey Friedl's book "Mastering Regular Expressions" 
(published by O'Reilly).  That book also spells out techniques for crafting 
regexps that don't suck ;-)  It's not a small topic, alas.

----------
nosy: +tim.peters

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35915>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35915] re.search extreme slowness (looks like hang/livelock), searching for patterns containing .* in a large string

Reply via email to