On Fri, Jul 01, 2005 at 08:38:01AM +0100, Nicholas Clark wrote:
> On Thu, Jun 30, 2005 at 10:44:03AM -0500, Patrick R. Michaud wrote:
> > If it would help for me to give more details about the bsr/ret scheme
> > I'm using, I'll be glad to post it.  I could certainly give a Perl 6
> > equivalent of the rule we're looking at.  But essentially the key is 
> > that a "bsr" always represents a backtracking point, a "ret" always
> > indicates fail, a "goto R*" is a tailcall, and the named registers 
> > are variables in the scope of all of the R* subroutines.
> 
> Does this mean that you're using the same recursive approach that the perl 5
> regular expression engine uses? (Not that I understand much of the perl 5
> engine, except that uses recursion to maintain parts of state)

Well, I don't know if I'm using the "same" recursive approach that
perl 5 uses, but yes PGE uses a subroutine call chain to maintain
certain parts of its backtracking status.  The only recursive call is for
quantified group expressions of various sorts -- subpatterns and
subrules.  Outside of that, it's just a linear chain of subroutine
calls (no recursion), and even then subroutine calls are only invoked
where some sort of range quantification is needed to keep backtracking
points.  And the subroutine calls used are of the bsr/ret variety,
as opposed to using the continuation passing style calls.

For example, an expression like:

    /^ab c**{3} \d+ xyz$/

will have a maximum bsr/ret call depth of 1, to handle the \d+.

Something like:

    /^(abc)**{3} xyz$/

will have a maximum bsr/ret call depth of 4, to handle the
quantification of the group and match object.  (Groups require an
extra initial bsr call to initialize the group.)

And something like

    /^( [abc | def]**{3} )* xyz$/

can end up with a lot of bsr/rets because of the nested and quantified
groups.  

I certainly won't claim this is the best way to do it, but for the
time being it seemed easier and more straightforward to just use bsr/ret
to maintain state than to build a separate chain to avoid using the
execution stack.  Each pattern gets its own Coroutine and its own
copy of the stacks anyway, and the option to take an entirely different
approach is still available to us if/when we need it -- we'll just
change the codegen.  (We can even have/use multiple codegen algorithms 
and let an optimizer pick the best one.)

Pm

Reply via email to