On Fri, Jul 01, 2005 at 05:46:30PM +0100, Nicholas Clark wrote:
> On Fri, Jul 01, 2005 at 11:11:01AM -0500, Patrick R. Michaud wrote:
> > On Fri, Jul 01, 2005 at 08:38:01AM +0100, Nicholas Clark wrote:
> 
> > > Does this mean that you're using the same recursive approach that the 
> > > perl 5
> > > regular expression engine uses? (Not that I understand much of the perl 5
> > > engine, except that uses recursion to maintain parts of state)
> > 
> > Well, I don't know if I'm using the "same" recursive approach that
> > perl 5 uses, but yes PGE uses a subroutine call chain to maintain
> > certain parts of its backtracking status.  The only recursive call is for
> > quantified group expressions of various sorts -- subpatterns and
> > subrules.  Outside of that, it's just a linear chain of subroutine
> 
> Does that include simple quantifiers such as * and + ?

The thing that causes recursion is quantified groups; a quantifier
on a simple atom won't do it on its own.  For example, the
pattern  / a* \d+ /  will never cause recursion, regardless of the
number of a's or digits encountered.  There will be at most two bsr's.

However, the pattern / (a)* (\d)+ / will have some recursion in it
depending on the input string, because of the need to generate
and destroy the intervening match objects for each captured 'a' and
digit.  We may be able to eventually optimize simple cases like 
this to be a bit smarter about how it manages the call stack, but 
in the general case (with potentially nested and quantified 
subpatterns and rules) I think we pretty much need to have the stack
around.

> It's just that I'm wondering whether you're going the same way as the Perl 5
> regexp engine and hence the current implementation will share its biggest
> weakness - repetitive matches against long strings blowing the stack.

Well, since each rule invocation ends up with its own stack 
(it's a Coroutine), I'm hoping this won't be a big issue.  But if
it does turn out to be one, I think we'll find a way to deal with
it then.  :-)

Pm

Reply via email to