Alan Campbell wrote:

> I'm slurping in a large file and seeing a nice speedup
> versus line by line processing...

I would go back to line-by-line again if the regexp should only match
within one line.


>    # look for potentially problematic code of the following form: -
>    #  STW b0, *SP--[3]
>    # The reg exp tries to match: -
>    # - anything up until 'ST' (so that we match STH, STW, STDW etc) followed by
>    # - 1+ non-whitespace chars followed by
>    # - 0+ whitespace chars followed by
>    # - 0+ non-whitespace chars followed by
>    # the string 'B15--' followed by
>    # anything up until an odd single-digit number followed by
>    # the ']' character
>    # Matches all occurrences
>    #
>    my @match_sp = $all_lines =~ /.*ST\S+\s*\S*B15--.*[^02468]]/mg;

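The start isn't specific; anchor it and make that "^\ *ST".
The next, \S+, could also be [A-Z]+.
The next, \s*, I would make "\ *", because \s also matches a newline
when the whole file is in one string.
The next, \S*, is ok. You remain in the same line.
The ".*" also remains in the same line (without /s, "." does not match
a newline), so I don't understand the /m-modifier: it only changes
where ^ and $ match, and your pattern uses neither.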
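
A minimal sketch of the point about /m, using two made-up lines (the
sample data and the names @with_m and @without_m are mine, not from
your code). It runs your pattern over a slurped string with and without
/m and prints what each finds:

```perl
use strict;
use warnings;

# Two made-up assembly lines; only the first has B15-- and an odd index.
my $all_lines = "  STW B0,*B15--[3]\n  STW B0,*B14--[4]\n";

# Same pattern as in the quoted code, with and without /m.
my @with_m    = $all_lines =~ /.*ST\S+\s*\S*B15--.*[^02468]]/mg;
my @without_m = $all_lines =~ /.*ST\S+\s*\S*B15--.*[^02468]]/g;

# /m only changes where ^ and $ match; this pattern uses neither.
print "with /m:    @with_m\n";
print "without /m: @without_m\n";
```

Both lists should come out identical, which is why the modifier looks
unnecessary here.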

Is this meant as multi-line, or should the B15 be on the same line as
the STW?
Why is the b0 in lowercase, and the B15 in uppercase?
Is the "]" supposed to be part of the character class? I assume not.
Otherwise use "[^]02468]".

   my @match_sp;
   while ( <> )
   {
     # /x ignores whitespace in the pattern, so a literal space is "\ "
     / ^ \ * ST[A-Z]+ \ * \S*,\S* B15-- .* [^02468] \] /xi
     and push @match_sp, $_;
   }
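
To see what that pattern accepts, here is a small sketch that runs it
over a few sample lines. The lines are invented for illustration and are
not taken from your file, so treat the outcome as a check of the regexp
only:

```perl
use strict;
use warnings;

# Hypothetical sample lines, invented for this illustration.
my @lines = (
    "  STW B0,*B15--[3]",    # odd index, no space after the comma
    "  STW B0,*B15--[4]",    # even index
    "  stw b0,*b15--[1]",    # lowercase, accepted because of /i
    "  STW B0, *B15--[3]",   # space after the comma
    "  LDW *B15--[3],B0",    # not a store
);

my @match_sp;
for (@lines) {
    # Same pattern as above; under /x a literal space is written "\ ".
    / ^ \ * ST[A-Z]+ \ * \S*,\S* B15-- .* [^02468] \] /xi
        and push @match_sp, $_;
}

print "$_\n" for @match_sp;
```

Only the first and third lines are kept. The line with a space after the
comma is rejected, because \S* cannot step over that space. If your
source has such lines, one option (my assumption, not something you
asked for) is to write ",\ *" instead of the bare comma. Note also that
[^02468] accepts any character that is not an even digit; [13579] would
restrict it to odd digits.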


You could also run a grep on the large file first:

  $ grep -i "ST.*B15--" infile > tmpfile

and then run your Perl program with tmpfile.
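
If grep is not available, the same cheap pre-filter can be sketched in
Perl. This is only an illustration: the input is made up and read
through an in-memory filehandle so the example stands alone, and the
name @candidates is mine.

```perl
use strict;
use warnings;

# Made-up input, read through an in-memory filehandle so the sketch is
# self-contained; a real run would open the large file instead.
my $input = "  STW B0,*B15--[3]\n"
          . "  ADD A1,A2,A3\n"
          . "  LDW *B15--[2],B0\n"
          . "  sth a4,*b15--[5]\n";
open my $in, '<', \$input or die "cannot open in-memory input: $!";

my @candidates;
while ( my $line = <$in> ) {
    # Same cheap test as: grep -i "ST.*B15--"
    push @candidates, $line if $line =~ /ST.*B15--/i;
}
close $in;

print @candidates;
```

The stricter regexp then only has to look at the lines in @candidates.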

-- 
Affijn, Ruud

"Gewoon is een tijger."


