On Fri, Jul 01, 2005 at 08:38:01AM +0100, Nicholas Clark wrote: > On Thu, Jun 30, 2005 at 10:44:03AM -0500, Patrick R. Michaud wrote: > > If it would help for me to give more details about the bsr/ret scheme > > I'm using, I'll be glad to post it. I could certainly give a Perl 6 > > equivalent of the rule we're looking at. But essentially the key is > > that a "bsr" always represents a backtracking point, a "ret" always > > indicates fail, a "goto R*" is a tailcall, and the named registers > > are variables in the scope of all of the R* subroutines. > > Does this mean that you're using the same recursive approach that the perl 5 > regular expression engine uses? (Not that I understand much of the perl 5 > engine, except that uses recursion to maintain parts of state)
Well, I don't know if I'm using the "same" recursive approach that perl 5 uses, but yes PGE uses a subroutine call chain to maintain certain parts of its backtracking status. The only recursive call is for quantified group expressions of various sorts -- subpatterns and subrules. Outside of that, it's just a linear chain of subroutine calls (no recursion), and even then subroutine calls are only invoked where some sort of range quantification is needed to keep backtracking points. And the subroutine calls used are of the bsr/ret variety, as opposed to using the continuation passing style calls. For example, an expression like: /^ab c**{3} \d+ xyz$/ will have a maximum bsr/ret call depth of 1, to handle the \d+. Something like: /^(abc)**{3} xyz$/ will have a maximum bsr/ret call depth of 4, to handle the quantification of the group and match object. (Groups require an extra initial bsr call to initialize the group.) And something like /^( [abc | def]**{3} )* xyz$/ can end up with a lot of bsr/rets because of the nested and quantified groups. I certainly won't claim this is the best way to do it, but for the time being it seemed easier and more straightforward to just use bsr/ret to maintain state than to build a separate chain to avoid using the execution stack. Each pattern gets its own Coroutine and its own copy of the stacks anyway, and the option to take an entirely different approach is still available to us if/when we need it -- we'll just change the codegen. (We can even have/use multiple codegen algorithms and let an optimizer pick the best one.) Pm