On Fri, Dec 16, 2005 at 10:56:21PM -0600, Joshua Isom wrote:
> Anyway, I've got it working using all the regexes. I stuck to the p6
> rules, and kept the hash to print out the regex they want to see. It's
> been running for an hour now and it hasn't even reached the main
> matching yet for the full benchmark. This is my first use of PGE, so if
> anyone has any ideas of how I can improve removing/replacing sections
> of text, I'm welcome to hear it.
I don't know all of the details and restrictions of the benchmark, and
I'll be the first to claim that PGE can be slow at times (it has very
few optimizations built in). But we may have a few tricks available to
try.

First, note that <gt> is a subrule, and subrules involve extra
subroutine-call overhead (with a lot of setup and teardown). Using
C<< \> >> should be much, much faster, as it's a simple string
comparison.

Second, instead of repeatedly calling the pattern via "next", I'd just
use a quantified capture and get all of the things to be stripped all
at once. Thus, perhaps something like:

    pattern = '[ ( [ \> \N*: ] \n ) | \N*: (\n) ]*'
    rulesub = p6rule_compile(pattern)
    match   = rulesub(seq)

This gives us a single match object, with match[0] as an array of the
captured portions. We can then just walk through the captured portions
(in reverse order) and remove the substrings -- something like:

      .local pmc capt
      capt = match[0]              # capt is an array of Match
    stripfind:
      unless capt goto endstripfind
      $P0 = pop capt               # remove last capture
      $I0 = $P0."from"()           # get starting pos
      $I1 = $P0."to"()             # get ending pos
      $I1 -= $I0                   # convert to length
      substr seq, $I0, $I1, ''     # remove unwanted portion
      goto stripfind
    endstripfind:

Hope this helps at least a little bit. It's still likely to be somewhat
slow. We may also be able to get some improvements by implementing the
:g modifier for repeated captures, and by being able to compile (or
use) whole substitutions as opposed to just rules.

Pm
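P.S. For anyone following along without a Parrot build handy, here is
the same idea sketched in Python rather than PIR: match once, collect
the captured spans, then delete them in reverse order so that earlier
offsets stay valid while the string shrinks. The sample input and the
regex (header lines starting with '>' plus bare newlines, roughly what
the rule above captures) are my own illustration, not the benchmark's
actual data.

```python
import re

# Hypothetical FASTA-like input, stand-in for the benchmark's seq.
seq = ">h1\nACGT\nACGT\n>h2\nTTTT\n"

# One pass over the string: capture each header line (with its
# newline) or each bare newline, recording (start, end) offsets.
spans = [m.span() for m in re.finditer(r">[^\n]*\n|\n", seq)]

# Walk the spans in reverse, as in the PIR loop, so removing one
# span never shifts the offsets of spans not yet processed.
for start, end in reversed(spans):
    seq = seq[:start] + seq[end:]

print(seq)  # -> ACGTACGTTTTT
```

The reverse walk is the whole trick: deleting front-to-back would
invalidate every later offset after the first removal.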