On Sat, Dec 18, 2004 at 08:47:42AM -0800, Larry Wall wrote: > : This test seems to cause an infinite loop > : (with parrot_2004-12-16_160001) > : > : p6rule_isnt('a--', '^[a?b?]*$', 're_tests 387 (#438)'); # infinite loop > > Detecting failure to progress can be quite tricky, actually. It's easy > enough to detect that it *might* be an infinite loop. But that pattern > would succeed the string were all a's and b's. It's not enough to figure > out that you're at the same position or the same state. You have to figure > out that you're at the same position and the same state, and you may well > have visited different positions in this state, or different states in > this position. So a naive solution requires N**2 in time or space.
In PGE I've been thinking this won't be *too* difficult (and I'll fully admit to the possibility of being naive here). Our "states" are actually encoded into the pattern's subroutine code, with our "current state" being held by Parrot's execution pointer and the various stacks, and we're already keeping track of the starting and current position of each "substring" being matched by a repeating group. (Or, if we're not, we certainly can keep track of it in a stack of some sort. So, when we get to the end of the bracketed group, we look to see if the current position has changed at all since we started the group, and if not we refuse to repeat the group again but just go on to whatever other checks need to be made. If going on to the remaining checks causes a match failure, we're just going to backtrack into the group's subexpression anyway, which would then start doing things that change the current pointer we'll be seeing at the end of the group. This may not always be the most efficient mechanism -- i.e., we could find ourselves repeating later matches we've already tried, but at least it shouldn't infinite loop. But rather than trying to explain it all and debate here whether it'll work or not, it's probably quicker for me to just implement that section of code and let our tests tell the tale. I'll do that today/tomorrow and report back on the results. But if anyone knows of a case where what I've discussed isn't likely to work or has caused problems in the past, let me know so we can code up a test and/or workaround for it and see what happens. Pm