On Mon, 24 Jan 2005 10:46:38 -0500, Dave Gray <[EMAIL PROTECTED]> wrote:
> > 'plain_regex' => sub { if ( $string =~ /^.{38}\|[BNPG]\|/ ) {
> > my $a = $_ } },
> > 'plain_regex' => sub { if ( $string =~ /^.{38}\|N\|/ ) { my $a = $_
> > } },
> >
> > What was interesting to me was that although, predictably, the
> > substring/regex combo was consistently the best performer for the
> > original match, regexing the whole line was consistently the best
> > performer when looking for "|N|". This seems to fly in the face of
> > the conventional wisdom that substr is faster than m// when you know
> > what you're looking for and where you're looking for it.
>
> I believe the conventional (faster?) way to do a regex search when you
> want to start matching a fixed number of characters in is along the
> lines of:
>
> pos($string) = 38;
> print "found!\n" if $string =~ /\G\|[BNPG]\|/;
>
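Dave's pos()/\G suggestion can be checked in isolation. Here is a minimal sketch, assuming a hypothetical record built so that offsets 38-40 hold "|N|" (the names $record, $shifted, $found and so on are illustrative and don't come from the thread). It shows that \G pins the match to the position stored by pos() and agrees with a substr() check of the same three characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical fixed-width record: 38 filler characters, then the
# field "|N|" at offsets 38-40, then the rest of the line.
my $record = ( 'x' x 38 ) . '|N|' . 'rest of line';

# pos() sets where the next match on $record starts; \G anchors the
# pattern to that position instead of letting it float along the string.
pos($record) = 38;
my $found = $record =~ /\G\|[BNPG]\|/ ? 1 : 0;
print "found!\n" if $found;

# The substr() equivalent: test the same three characters directly.
my $by_substr = substr( $record, 38, 3 ) =~ /\A\|[BNPG]\|\z/ ? 1 : 0;

# Shift the field one column left: \G still pins the match to offset 38,
# where the character is now 'N' rather than '|', so nothing matches.
my $shifted = ( 'x' x 37 ) . '|N|' . 'rest of line';
pos($shifted) = 38;
my $found_shifted = $shifted =~ /\G\|[BNPG]\|/ ? 1 : 0;
print "shifted: ", ( $found_shifted ? "found" : "not found" ), "\n";
```

The shifted record is the interesting case: unlike an unanchored whole-line regex, the \G version cannot drift to a nearby column.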
That's an interesting point, although in my mind it was neither a substr() solution nor a pure regex solution, so I just kind of skipped over it. Performance-wise, I can't see any advantage, unless you're trying to modify the behavior of m//g.

Slightly revised benchmarks are below. Today they're from the PII/166 running SuSE 9.1 that I use as a test server at work, and strangely, substr() is considerably faster, as generally predicted. This is very different from last night's results on the G4/750 under OS X. Either the regex engine optimizes very well for the PPC, or Benchmark.pm isn't working correctly on one of the two hardware/software combinations. Interesting.

Benchmark: timing 1000000 iterations of plain_regex, plain_substr, pos_regex, regex_line, substr_regex_1, substr_regex_3...
   plain_regex: 41 wallclock secs (40.16 usr +  0.01 sys = 40.17 CPU) @ 24894.20/s (n=1000000)
  plain_substr: 20 wallclock secs (20.20 usr +  0.01 sys = 20.21 CPU) @ 49480.46/s (n=1000000)
     pos_regex: 32 wallclock secs (32.24 usr +  0.01 sys = 32.25 CPU) @ 31007.75/s (n=1000000)
    regex_line: 40 wallclock secs (39.85 usr +  0.01 sys = 39.86 CPU) @ 25087.81/s (n=1000000)
substr_regex_1: 27 wallclock secs (27.69 usr +  0.00 sys = 27.69 CPU) @ 36114.12/s (n=1000000)
substr_regex_3: 39 wallclock secs (39.06 usr +  0.01 sys = 39.07 CPU) @ 25595.09/s (n=1000000)

#!/usr/bin/perl

use strict ;
use warnings ;
use Benchmark ;

my $string = '| B B B B |13145551212 B B B B |N| B B B B B B | B 0|001001|001001| 100|10|B|A|' ;

print "\n\nMultiple targets:\n\n" ;

# In every sub, 'my $a = ...' is only a placeholder body for a match;
# none of the matches sets $_.
timethese(1000000,{
    # regex on the single character at offset 39
    'substr_regex_1' => sub {
        if ( substr( $string, 39, 1 ) =~ /[BNPG]/ ) { my $a = $_ }
    } ,
    # regex on the three characters at offsets 38-40
    'substr_regex_3' => sub {
        if ( substr( $string, 38, 3 ) =~ /\|[BNPG]\|/ ) { my $a = $_ }
    } ,
    # string comparisons only, no regex
    'plain_substr' => sub {
        my $sub = substr( $string, 39, 1 );
        if ($sub eq "B" || $sub eq "N" || $sub eq "P" || $sub eq "G") { my $a = $sub }
    },
    # whole-string regex skipping 38 characters (not anchored with ^ here)
    'plain_regex' => sub {
        if ( $string =~ /.{38}\|[BNPG]\|/ ) { my $a = $_ }
    },
    # whole-string regex for the field anywhere in the line
    'regex_line' => sub {
        if ( $string =~ /\|[BNPG]\|/ ) { my $a = $_ }
    },
    # regex anchored with \G at pos() = 38
    'pos_regex' => sub {
        pos($string) = 38 ;
        if ( $string =~ /\G\|[BNPG]\|/ ) { my $a = $_ }
    }
}) ;

--jay
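The six subs in the script don't all test the same condition. regex_line accepts the field anywhere in the line, plain_regex (unanchored here) accepts it at any offset from 38 on, and the two one-character checks never look at the pipes. Here is a minimal sketch of an agreement check that makes those differences visible before comparing timings. It assumes hypothetical test records and a hypothetical %check table of predicates that mirror the benchmarked subs; neither comes from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical predicates mirroring the six benchmarked subs; each takes
# a record in $_[0] and returns true if its test matches.
my %check = (
    substr_regex_1 => sub { substr( $_[0], 39, 1 ) =~ /[BNPG]/ },
    substr_regex_3 => sub { substr( $_[0], 38, 3 ) =~ /\|[BNPG]\|/ },
    plain_substr   => sub {
        my $sub = substr( $_[0], 39, 1 );
        $sub eq 'B' || $sub eq 'N' || $sub eq 'P' || $sub eq 'G';
    },
    plain_regex    => sub { $_[0] =~ /.{38}\|[BNPG]\|/ },
    regex_line     => sub { $_[0] =~ /\|[BNPG]\|/ },
    pos_regex      => sub {
        my $s = $_[0];    # work on a copy so the caller's pos() is untouched
        pos($s) = 38;
        $s =~ /\G\|[BNPG]\|/;
    },
);

# Hypothetical records, all long enough that substr() stays in range:
#   hit  - "|N|" exactly at offsets 38-40
#   late - "|X|" at 38-40, but "|B|" appears later, at offset 43
#   bare - a lone "B" at offset 39 with no surrounding pipes
my %record = (
    hit  => ( 'x' x 38 ) . '|N|' . ( 'x' x 10 ),
    late => ( 'x' x 38 ) . '|X|xx|B|' . ( 'x' x 5 ),
    bare => ( 'x' x 39 ) . 'B' . ( 'x' x 10 ),
);

# Run every predicate on every record and print a 1/0 table.
my %result;
for my $name ( sort keys %record ) {
    for my $variant ( sort keys %check ) {
        $result{$name}{$variant} = $check{$variant}->( $record{$name} ) ? 1 : 0;
    }
    print "$name: ",
        join( ' ', map { "$_=$result{$name}{$_}" } sort keys %check ), "\n";
}
```

All six agree on the "hit" record. On "late", only regex_line and plain_regex match. On "bare", only the two one-character checks do. Once the predicates agree on the records you care about, closures over a fixed record can be handed to timethese() as in the script above, or to Benchmark's cmpthese(), which prints a relative-speed chart.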