On Mon, Sep 16, 2002 at 10:32:17AM +0300, Markus Laire wrote:
> On 15 Sep 2002 at 22:41, Steve Fink wrote:
> 
> Your code seems to backtrack to the beginning at every failure. First 
> code only backtracks one char at time.
> > Huh? What implementation is that? I think my naive implementation
> > gives
> > 
> > 111
> > 112
> > 11
> > 121
> > 122
> > 12
> > 1
> > 211
> > 212
> > 21
> > 221
> > 222
> > 22
> > 2
> > 
> 

Backtracking needs to be done on the state machine, not on the input
string. For any pattern that can be reduced to a DFA, they will have
equivalent output, but consider something like (unoptimized)

 "aax" =~ /(a|aa)x/

You try to match 'a' and get it. You try to match 'x' and fail. You
are at "a.ax" in your input string. Backing up in the input string
doesn't help. You need to backtrack to the next possibility of the
previous match operation, in this case to the aa of the a|aa.

My code is not backtracking to the beginning. It is backtracking to
the previous possibility in the NFA. So /(a|a)*x/ matches (a|a) as
many times as possible, then moves on to match the x. On failure, it
undoes one match of (a|a), tries to match the (a|a) a different way,
and if it succeeds matches as many times as possible again before
trying the x. If (a|a) cannot match a different way, it tries to
continue on to the x without matching that last (a|a) at all. Finally,
if that fails, it undoes another match of the (a|a) and repeats.

Programmatically, the implementation of R* looks something like:

  loop: match R or goto next
        goto loop

  back: if we have at least one R match left on the stack, goto R.back
        otherwise, fail (goto last backtrack point)

  next: ...(in this case, match x or goto back)

where R.back is the backtrack point for a successful (a|a) match (it
behaves like a continuation, so it is as if the "match R" call returns
another result.) R is assumed to push its backtrack information on the
stack if it succeeds, or to leave the stack untouched if it fails.
R.back is assumed to clean up any previous backtrack state of R and
change it to either a new backtrack state or clean it up, depending on
whether it can succeed in another way.

Reply via email to