Re: How to find regex at specific location on line

Jay Mon, 24 Jan 2005 08:44:16 -0800

On Mon, 24 Jan 2005 10:46:38 -0500, Dave Gray <[EMAIL PROTECTED]> wrote:
> >     'plain_regex'    => sub { if ( $string =~ /^.{38}\|[BNPG]\|/ ) { my $a = $_ } },
> >     'plain_regex'    => sub { if ( $string =~ /^.{38}\|N\|/ ) { my $a = $_ } },
> >
> > What was interesting to me was that although, predictably, the
> > substring/regex combo was consistently the best performer for the
> > original match, regexing the whole line was consistently the best
> > performer when looking for "|N|".  This seems to fly in the face of
> > the conventional wisdom that substr is faster than m// when you know
> > what you're looking for and where you're looking for it.
> 
> I believe the conventional (faster?) way to do a regex search when you
> want to start matching a fixed number of characters in is along the
> lines of:
> 
> pos($string) = 38;
> print "found!\n" if $string =~ /\G\|[BNPG]\|/;
>


That's an interesting point, although to my mind it was neither a
substr() solution nor a pure regex solution, so I just kind of skipped
over it.  Performance-wise, I can't see any advantage, unless you're
trying to modify the behavior of m//g.  Slightly revised benchmarks
are below.  Today they're from the PII/166 running SuSE 9.1 that I use
as a test server at work, and strangely, substr() is considerably
faster, as generally predicted.  This is very different from last
night's results on the G4/750 under OS X.  The regex engine must
optimize very well for the PPC--either that, or Benchmark.pm compiles
with errors on one of the hardware/software combinations.  Interesting.
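
For anyone comparing the two styles side by side, here is a minimal
sketch of the pos()/\G approach next to the substr() approach.  The
helper names (flag_via_pos, flag_via_substr) and the padded sample
record are made up for illustration; only the offset (38) and the
[BNPG] flag set come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in record: 38 filler characters, then the flag field.
my $record = ( 'x' x 38 ) . '|N|rest of line';

# Anchor the regex at offset 38 with pos() and \G (no /g needed).
sub flag_via_pos {
    my ($line) = @_;
    pos($line) = 38;    # \G matches only at this offset
    return $line =~ /\G\|([BNPG])\|/ ? $1 : undef;
}

# Cut out the three-character field first, then test just that piece.
sub flag_via_substr {
    my ($line) = @_;
    return undef if length($line) < 41;    # avoid "substr outside of string"
    my $field = substr( $line, 38, 3 );
    return $field =~ /\A\|([BNPG])\|\z/ ? $1 : undef;
}

print "pos/\\G: ", flag_via_pos($record),    "\n";
print "substr: ", flag_via_substr($record), "\n";
```

Both helpers should agree on any line; which one is faster is exactly
what the benchmarks are trying to settle.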

Benchmark: timing 1000000 iterations of plain_regex, plain_substr, pos_regex, regex_line, substr_regex_1, substr_regex_3...
   plain_regex: 41 wallclock secs (40.16 usr +  0.01 sys = 40.17 CPU) @ 24894.20/s (n=1000000)
  plain_substr: 20 wallclock secs (20.20 usr +  0.01 sys = 20.21 CPU) @ 49480.46/s (n=1000000)
     pos_regex: 32 wallclock secs (32.24 usr +  0.01 sys = 32.25 CPU) @ 31007.75/s (n=1000000)
    regex_line: 40 wallclock secs (39.85 usr +  0.01 sys = 39.86 CPU) @ 25087.81/s (n=1000000)
substr_regex_1: 27 wallclock secs (27.69 usr +  0.00 sys = 27.69 CPU) @ 36114.12/s (n=1000000)
substr_regex_3: 39 wallclock secs (39.06 usr +  0.01 sys = 39.07 CPU) @ 25595.09/s (n=1000000)
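
The rate column is just n divided by CPU seconds.  As a rough sketch
for reading the numbers, the following recomputes the rates from the
CPU totals above and shows each test's speed relative to the slowest
one.  The rate() helper and the hash layout are mine, not part of
Benchmark.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Iteration count and CPU seconds copied from the results above.
my $n   = 1_000_000;
my %cpu = (
    plain_regex    => 40.17,
    plain_substr   => 20.21,
    pos_regex      => 32.25,
    regex_line     => 39.86,
    substr_regex_1 => 27.69,
    substr_regex_3 => 39.07,
);

# Hypothetical helper: iterations per CPU second.
sub rate { my ( $count, $secs ) = @_; return $count / $secs }

# The test with the most CPU seconds is the baseline.
my $slowest = ( sort { $cpu{$b} <=> $cpu{$a} } keys %cpu )[0];

# List fastest first, with speed relative to the baseline.
for my $name ( sort { $cpu{$a} <=> $cpu{$b} } keys %cpu ) {
    printf "%-15s %10.2f/s  %.2fx %s\n",
        $name, rate( $n, $cpu{$name} ), $cpu{$slowest} / $cpu{$name}, $slowest;
}
```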

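#!/usr/bin/perl

use strict ;
use warnings ;
use Benchmark ;

# Sample record; the flag field "|N|" starts at offset 38.
my $string = '| B  B  B  B |13145551212 B  B  B  B  |N| B  B  B  B  B B | B  0|001001|001001| 100|10|B|A|' ;

print "\n\nMultiple targets:\n\n" ;

# "my $a = $_" is only a placeholder so each matching branch does some work.
timethese(1000000,{
    # regex against the single character at offset 39
    'substr_regex_1' => sub { if ( substr( $string, 39, 1 ) =~ /[BNPG]/ ) { my $a = $_ } } ,
    # regex against the three-character field at offset 38
    'substr_regex_3' => sub { if ( substr( $string, 38, 3 ) =~ /\|[BNPG]\|/ ) { my $a = $_ } } ,
    # substr plus plain string comparisons, no regex at all
    'plain_substr'   => sub {
        my $sub = substr( $string, 39, 1 );
        if ($sub eq "B" || $sub eq "N" || $sub eq "P" || $sub eq "G") { my $a = $sub }
    },
    # regex over the whole line, skipping 38 characters first (no ^ anchor here)
    'plain_regex'    => sub { if ( $string =~ /.{38}\|[BNPG]\|/ ) { my $a = $_ } },
    # regex over the whole line with no position constraint
    'regex_line'     => sub { if ( $string =~ /\|[BNPG]\|/ ) { my $a = $_ } },
    # set the match position, then anchor the regex there with \G
    'pos_regex'      => sub {
        pos($string) = 38 ;
        if ( $string =~ /\G\|[BNPG]\|/ ) { my $a = $_ }
    }
}) ;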
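
If you want the relative numbers without doing the arithmetic by hand,
Benchmark also exports cmpthese(), which prints a comparison chart.
This is only a sketch, not what produced the results above: it uses a
made-up padded record in $line, just two of the six tests, and a
negative count (run each test for at least 1 CPU second) so that it
finishes quickly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical record: flag field "|N|" at offset 38, padded on both sides.
my $line = ( 'x' x 38 ) . '|N|' . ( 'x' x 40 );

# -1 means "run each test for at least 1 CPU second".
# cmpthese prints the chart and also returns its rows as an array ref.
my $rows = cmpthese( -1, {
    plain_substr => sub {
        my $c   = substr( $line, 39, 1 );
        my $hit = ( $c eq 'B' || $c eq 'N' || $c eq 'P' || $c eq 'G' );
    },
    pos_regex => sub {
        pos($line) = 38;
        my $hit = $line =~ /\G\|[BNPG]\|/;
    },
} );
```

The chart lists each test's rate and the percentage difference against
every other test, which makes cross-machine comparisons easier to
eyeball.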

--jay
