On Mon, Sep 16, 2002 at 10:32:17AM +0300, Markus Laire wrote: > On 15 Sep 2002 at 22:41, Steve Fink wrote: > > Your code seems to backtrack to the beginning at every failure. First > code only backtracks one char at time. > > Huh? What implementation is that? I think my naive implementation > > gives > > > > 111 > > 112 > > 11 > > 121 > > 122 > > 12 > > 1 > > 211 > > 212 > > 21 > > 221 > > 222 > > 22 > > 2 > > >
Backtracking needs to be done on the state machine, not on the input string. For any pattern that can be reduced to a DFA, they will have equivalent output, but consider something like (unoptimized) "aax" =~ /(a|aa)x/ You try to match 'a' and get it. You try to match 'x' and fail. You are at "a.ax" in your input string. Backing up in the input string doesn't help. You need to backtrack to the next possibility of the previous match operation, in this case to the aa of the a|aa. My code is not backtracking to the beginning. It is backtracking to the previous possibility in the NFA. So /(a|a)*x/ matches (a|a) as many times as possible, then moves on to match the x. On failure, it undoes one match of (a|a), tries to match the (a|a) a different way, and if it succeeds matches as many times as possible again before trying the x. If (a|a) cannot match a different way, it tries to continue on to the x without matching that last (a|a) at all. Finally, if that fails, it undoes another match of the (a|a) and repeats. Programmatically, the implementation of R* looks something like: loop: match R or goto next goto loop back: if we have at least one R match left on the stack, goto R.back otherwise, fail (goto last backtrack point) next: ...(in this case, match x or goto back) where R.back is the backtrack point for a successful (a|a) match (it behaves like a continuation, so it is as if the "match R" call returns another result.) R is assumed to push its backtrack information on the stack if it succeeds, or to leave the stack untouched if it fails. R.back is assumed to clean up any previous backtrack state of R and change it to either a new backtrack state or clean it up, depending on whether it can succeed in another way.