On Fri, Jul 01, 2005 at 05:46:30PM +0100, Nicholas Clark wrote: > On Fri, Jul 01, 2005 at 11:11:01AM -0500, Patrick R. Michaud wrote: > > On Fri, Jul 01, 2005 at 08:38:01AM +0100, Nicholas Clark wrote: > > > > Does this mean that you're using the same recursive approach that the > > > perl 5 > > > regular expression engine uses? (Not that I understand much of the perl 5 > > > engine, except that uses recursion to maintain parts of state) > > > > Well, I don't know if I'm using the "same" recursive approach that > > perl 5 uses, but yes PGE uses a subroutine call chain to maintain > > certain parts of its backtracking status. The only recursive call is for > > quantified group expressions of various sorts -- subpatterns and > > subrules. Outside of that, it's just a linear chain of subroutine > > Does that include simple quantifiers such as * and + ?
The thing that causes recursion is quantified groups; a quantifier on a simple atom won't do it on its own. For example, the pattern / a* \d+ / will never cause recursion, regardless of the number of a's or digits encountered. There will be at most two bsr's. However, the pattern / (a)* (\d)+ / will have some recursion in it depending on the input string, because of the need to generate and destroy the intervening match objects for each captured 'a' and digit. We may be able to eventually optimize simple cases like this to be a bit smarter about how it manages the call stack, but in the general case (with potentially nested and quantified subpatterns and rules) I think we pretty much need to have the stack around. > It's just that I'm wondering whether you're going the same way as the Perl 5 > regexp engine and hence the current implementation will share its biggest > weakness - repetitive matches against long strings blowing the stack. Well, since each rule invocation ends up with its own stack (it's a Coroutine), I'm hoping this won't be a big issue. But if it does turn out to be one, I think we'll find a way to deal with it then. :-) Pm