On Sat, 24 Aug 2002, Steve Fink wrote:

> In what order should matches be attempted when backtracking through
> '*'?
> 
> For example, consider (R|S)* where there's only enough stuff in the
> input for two matches of R or S. I expect it to try the possibilities
> in the order
> 
>  RR
>  RS
>  SR
>  SS
>  R
>  S

That's what I would think.  Also I like to think of multiple matches just 
failing at the end of the regex and trying again.  So it would backtrack 
and try S, then backtrack two and try SR, et cetera.

> because I would expect the greediness of the * to be considered before
> the order of alternation between R and S. Perhaps my expectation is
> dumb. But the straightforward implementation of this that I have now
> tries them in the order
> 
>  RR
>  RS
>  R
>  SR
>  SS
>  S

I think the former would be more intuitive, but it doesn't really matter. 
Better to allow optimization and say the order is undefined.

> I tried to figure out what order perl5 tries them in with an
> expression like
> 
>  /( a (?{ print "a" })
>     |
>     a (?{ print "b" })
>   )*
>   (?{ print "\n" })
>   (?!)
>  /x
> 
> matching against "aaa". But that mostly showed that the optimizer
> messes with things a lot. Changing the a's to [aX] and [aY] and the *
> to {0,5} improved that problem, but it still spits out a bizarre
> ordering. I guess I ought to learn to read the output of use re
> 'debug' better, but the question remains: is the order defined? If so,
> what should it be?


Luke

Reply via email to