Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:

Suppose you had a pattern:

    .*

It would advance one character on each iteration of the * until the . failed to 
match. The text is finite, so it would stop matching eventually.

Now suppose you had a pattern:

    (?:)*

On each iteration of the * it wouldn't advance, so it would keep matching 
forever.

A way to avoid that is to stop the * if it hasn't advanced.

The example pattern shows that there's still a problem. It advances if a group 
has matched, but that group doens't match until the first iteration, after the 
test, and does not, itself, advance. The * stops because it hasn't advanced, 
but, in this instance, that doesn't mean it never will.

The solution is for the * to check not only whether it has advanced, but also 
whether a group has changed. (Strictly speaking, the latter check is needed 
only if the repeated part tests whether a group also in the repeated part has 
changed, but it's probably not worth "optimising" for that possibility.)

In the regex module, it increments a "capture changed" counter whenever any 
group is changed (a group's first match or a change to a group's span). That 
makes it easier for the * to check. The code needs to save that counter for 
backtracking and restore it when backtracking.

I've mentioned only the *, but the same remarks apply to + and {...}, except 
that the {...} should keep repeating until it has reached its prescribed 
minimum.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue23692>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to