Re: How to find regex at specific location on line

Jay Sun, 23 Jan 2005 16:41:08 -0800

On Sat, 22 Jan 2005 13:34:54 -0800, John W. Krahn <[EMAIL PROTECTED]> wrote:


> 
> Yes, two ways that I can think of:
> 
>          if ( substr( $_, 30, 3 ) =~ /\|[BNPG]\|/ ) {
> 
>          if ( /^.{30}\|[BNPG]\|/ ) {
> 
> John
> --


For the sake of comparison, here is a set of benchmarks for a few
variations on the theme: regexing the entire line for the pattern,
regexing for the pattern at a known location, and some substring and
substring/regex combos.  For the heck of it, I also tried looking for
just one character rather than a class:

CODE:

#!/usr/bin/perl

use strict ;
use warnings ;
use Benchmark ;


my $string = '| B  B  B  B |13145551212 B  B  B  B  |N| B  B  B  B  B  B | B  0|001001|001001| 100|10|B|A|' ;

print "\n\nMultiple targets:\n\n" ;

timethese(1000000,{
    'substr_regex_1' => sub {
        if ( substr( $string, 39, 1 ) =~ /[BNPG]/ ) { my $a = $_ }
    } ,
    'substr_regex_3' => sub {
        if ( substr( $string, 38, 3 ) =~ /\|[BNPG]\|/ ) { my $a = $_ }
    } ,
    'plain_substr'   => sub {
        my $sub = substr( $string, 39, 1 );
        if (   $sub eq "B" || $sub eq "N"
            || $sub eq "P" || $sub eq "G" ) { my $a = $sub }
    } ,
    'plain_regex'    => sub {
        if ( $string =~ /^.{38}\|[BNPG]\|/ ) { my $a = $_ }
    } ,
    'regex_line'     => sub {
        if ( $string =~ /\|[BNPG]\|/ ) { my $a = $_ }
    }
    }) ;

print "\n\nSingle target:\n\n" ;

timethese(1000000,{
    'substr_regex_1' => sub {
        if ( substr( $string, 39, 1 ) =~ /N/ ) { my $a = $_ }
    } ,
    'substr_regex_3' => sub {
        if ( substr( $string, 38, 3 ) =~ /\|N\|/ ) { my $a = $_ }
    } ,
    'plain_substr'   => sub {
        if ( substr( $string, 39, 1 ) eq "N" ) { my $a = $_ }
    } ,
    'plain_regex'    => sub {
        if ( $string =~ /^.{38}\|N\|/ ) { my $a = $_ }
    } ,
    'regex_line'     => sub {
        if ( $string =~ /\|N\|/ ) { my $a = $_ }
    }
    }) ;

RESULTS (representative):

Multiple targets:

Benchmark: timing 1000000 iterations of plain_regex, plain_substr, regex_line, substr_regex_1, substr_regex_3...
plain_regex:  2 wallclock secs ( 2.69 usr +  0.03 sys =  2.72 CPU) @ 367647.06/s (n=1000000)
plain_substr:  3 wallclock secs ( 3.16 usr +  0.02 sys =  3.18 CPU) @ 314465.41/s (n=1000000)
regex_line:  3 wallclock secs ( 3.32 usr +  0.00 sys =  3.32 CPU) @ 301204.82/s (n=1000000)
substr_regex_1:  3 wallclock secs ( 2.30 usr +  0.05 sys =  2.35 CPU) @ 425531.91/s (n=1000000)
substr_regex_3:  3 wallclock secs ( 2.81 usr +  0.07 sys =  2.88 CPU) @ 347222.22/s (n=1000000)


Single target:

Benchmark: timing 1000000 iterations of plain_regex, plain_substr, regex_line, substr_regex_1, substr_regex_3...
plain_regex:  6 wallclock secs ( 3.02 usr +  0.04 sys =  3.06 CPU) @ 326797.39/s (n=1000000)
plain_substr:  3 wallclock secs ( 1.78 usr +  0.02 sys =  1.80 CPU) @ 555555.56/s (n=1000000)
regex_line:  1 wallclock secs ( 1.54 usr +  0.03 sys =  1.57 CPU) @ 636942.68/s (n=1000000)
substr_regex_1:  2 wallclock secs ( 2.20 usr +  0.02 sys =  2.22 CPU) @ 450450.45/s (n=1000000)
substr_regex_3:  3 wallclock secs ( 2.19 usr +  0.00 sys =  2.19 CPU) @ 456621.00/s (n=1000000)
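
The raw timings above are easier to read as relative rates.  Here is
a minimal sketch of the same kind of comparison using Benchmark's
cmpthese, which prints a table of rates and percentage differences.
The sample line and the variants() helper are made up for this
illustration (they are not part of the script above), and each sub
simply returns 1 or 0 rather than assigning to a variable:

```perl
#!/usr/bin/perl

use strict ;
use warnings ;
use Benchmark qw( cmpthese ) ;

# Hypothetical sample line (an assumption, not the original data):
# '|N|' sits at offsets 38..40, the same position the script above tests.
my $string = ( '.' x 38 ) . '|N|' . ( '.' x 20 ) ;

# Hypothetical helper: returns the variants as a hash of code refs so
# they can be benchmarked and also sanity-checked individually.
sub variants {
    return {
        'substr_regex_3' => sub { substr( $string, 38, 3 ) =~ /\|N\|/ ? 1 : 0 } ,
        'plain_substr'   => sub { substr( $string, 39, 1 ) eq 'N'     ? 1 : 0 } ,
        'plain_regex'    => sub { $string =~ /^.{38}\|N\|/            ? 1 : 0 } ,
        'regex_line'     => sub { $string =~ /\|N\|/                  ? 1 : 0 } ,
    } ;
}

# A negative count asks Benchmark to run each sub for at least that
# many CPU seconds instead of a fixed number of iterations.
cmpthese( -1, variants() ) ;
```

The numbers will differ between machines and perl versions, so the
table is only meaningful for comparing variants within a single run.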



What was interesting to me was that although, predictably, the
substring/regex combo (substr_regex_1, about 425,000/s) was
consistently the best performer for the original character-class
match, regexing the whole line (regex_line, about 637,000/s) was
consistently the best performer when looking for "|N|", ahead of
plain_substr at about 556,000/s.  This seems to fly in the face of
the conventional wisdom that substr is faster than m// when you know
what you're looking for and where you're looking for it.
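
One thing to keep in mind when comparing speeds is that the five
variants are not strictly equivalent tests: regex_line is unanchored
and will match "|N|" anywhere on the line, and the single-character
checks do not look at the surrounding pipes.  A minimal sketch of the
difference, using made-up lines (the %line data and the verdicts()
helper are hypothetical, for illustration only):

```perl
#!/usr/bin/perl

use strict ;
use warnings ;

# Each check mirrors one "Single target" variant, but takes the line
# as an argument and returns 1 or 0.
my %check = (
    'substr_regex_1' => sub { substr( $_[0], 39, 1 ) =~ /N/     ? 1 : 0 } ,
    'substr_regex_3' => sub { substr( $_[0], 38, 3 ) =~ /\|N\|/ ? 1 : 0 } ,
    'plain_substr'   => sub { substr( $_[0], 39, 1 ) eq 'N'     ? 1 : 0 } ,
    'plain_regex'    => sub { $_[0] =~ /^.{38}\|N\|/            ? 1 : 0 } ,
    'regex_line'     => sub { $_[0] =~ /\|N\|/                  ? 1 : 0 } ,
) ;

# Hypothetical test lines (assumptions, not taken from the real data).
my %line = (
    'at_38'     => ( 'x' x 38 ) . '|N|' . ( 'x' x 10 ) ,  # '|N|' at offsets 38..40
    'elsewhere' => '|N|' . ( 'x' x 48 ) ,                 # '|N|' only at offset 0
    'bare_N'    => ( 'x' x 39 ) . 'N' . ( 'x' x 11 ) ,    # 'N' at 39, no pipes
) ;

# Hypothetical helper: one "name=result" pair per check, sorted by name.
sub verdicts {
    my ($name) = @_ ;
    return join ' ',
        map { sprintf '%s=%d', $_, $check{$_}->( $line{$name} ) }
        sort keys %check ;
}

print "$_: ", verdicts($_), "\n" for sort keys %line ;
```

On these lines only substr_regex_3 and plain_regex accept exactly the
"|N|" at offset 38 case; regex_line also accepts the 'elsewhere' line,
and the two single-character checks also accept 'bare_N'.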

--jay

