Hi,

Is P1 means two literals? 'P' and '1'? PCRE has clever optimizations for 
quantifiers used on single characters: it can count them, and go back when 
recursion needed. The story is more difficult for other expessions, since a 
recursion may happened inside the subexpression.

Anyway, back references also optimized in the same way as characters, so 
perhaps:

"(P1)*" could be rewritten to "(?:(P1)\1*)?"

Regards,
Zoltan

Alan Lehotsky <[email protected]> írta:
>I have a string that's 150,002 characters consisting of>
>
        P1P2......P4....P9.....>
>
>
(Thats 'P1', followed by 50k each of 'P2', 'P4', 'P9')>
>
and a simpleminded regex of>
>
        (P1(P2)*(P4)*(P9)*)?>
>
With PCRE 7.9, this crashes in match() on the 5672'd recursive call (on a linux 
box with a moderately large stack)>
>
With the old Spencer regex, we match successfully.>
>
I tried the obvious things to improve this:>
>
1/ replaced capturing parentheses with (?: )>
>
2/ Added explicit ^$ anchors>
>
3/ possessive quantifiers.>
>
4/ atomic grouping>
>
In all cases I still stomp the stack via recursion in match().>
>
I assume that if I configured PCRE to not use the stack that I could make this 
run (although probably really slowly due to malloc/free overhead).  >
>
Any advice?>
-- >
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev >


-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 

Reply via email to