On Fri, Dec 16, 2005 at 10:56:21PM -0600, Joshua Isom wrote:
> Anyway, I've got it working uses all the regexes.  I stuck to the p6 
> rules, and kept the hash to print out the regex they want to see.  It's 
> been running now for an hour now and it hasn't even reached the main 
> matching yet for the full benchmark.  This is my first use of PGE so if 
> anyone has any ideas of how I can improve removing/replacing sections 
> of text I'm welcome to hear it.  

I don't know all of the details and restrictions of the benchmark, 
and I'll be the first to claim that PGE can be slow at times (it has very
few optimizations built-in).  But we may have a few tricks available
to try.

First, note that <gt> is a subrule and subrules involve extra
subroutine call overhead (with a lot of setup and take-down).  
Using C<< \> >> should be much much much faster, as it's a simple
string comparison.

Instead of repeatedly calling the pattern via "next", I'd
just use a quantified capture and get everything that needs to be
stripped in a single match.  Thus perhaps something like:

    pattern = '[ ( [ \> \N*: ] \n ) | \N*: (\n) ]*'
    rulesub = p6rule_compile(pattern)
    match = rulesub(seq)

This gives us a single match object, with match[0] as an array
of the captured portions.  We can then just walk through the 
captured portions (in reverse order, so that removing one substring
doesn't shift the offsets of the captures still to be processed)
and remove the substrings--something like:

    .local pmc capt
    capt = match[0]            # capt is an array of Match
  stripfind:
    unless capt goto endstripfind
    $P0 = pop capt             # remove last capture
    $I0 = $P0."from"()         # get starting pos
    $I1 = $P0."to"()           # get ending pos
    $I1 -= $I0                 # convert to length
    substr seq, $I0, $I1, ''   # remove unwanted portion
    goto stripfind
  endstripfind:
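
As a quick illustration of the reverse-order walk, here is the same
idea sketched in Python rather than PIR.  This is only a sketch: the
strip_spans helper, the sample input, and the Python regex (a rough
stand-in for the rule above) are all made up for the example and
aren't part of PGE.

```python
import re

def strip_spans(seq, spans):
    """Remove each (start, end) span from seq, last span first.

    Going from the end toward the start means the offsets of the
    spans not yet removed still point at the right characters.
    """
    for start, end in sorted(spans, reverse=True):
        seq = seq[:start] + seq[end:]
    return seq

# Hypothetical FASTA-like input: one header line and two sequence lines.
seq = ">ONE header\nacgt\nttga\n"

# Rough stand-in for the rule: a whole ">" header line, or a bare newline.
pattern = re.compile(r">[^\n]*\n|\n")

# Collect every (from, to) pair first, then strip them all in one pass.
spans = [m.span() for m in pattern.finditer(seq)]
cleaned = strip_spans(seq, spans)
```

The point is the same as in the PIR loop: gather all of the positions
from one match pass, then delete from the back so nothing has to be
re-matched or re-offset.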

Hope this helps at least a little bit.  It's still likely
to be somewhat slow.  We may also be able to get some improvements 
by implementing the :g modifier for the repeated captures, and 
being able to compile (or use) whole substitutions as opposed to 
just rules.

Pm
