Hello everybody,

I sometimes have the task of analysing a string starting from a given position, where this position changes after each iteration (much as index() takes a start position).
As this is perl there are MTOWTDIIP, but I'd like to know the fastest. So I used Benchmark.pm to find out (script attached).

Excerpt from script:

    "from_start" => sub { m/\S*\s+(\S+)/; },
    "re_dyn"     => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/; },
    "re_once"    => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/o; },
    "substr"     => sub { substr($_,$pos) =~ m/\S*\s+(\S+)/; },
    "substr_set" => sub { $tmp=substr($_,$pos); $tmp =~ m/\S*\s+(\S+)/; },

from_start is there for comparison only: it matches from the beginning of the string and shows how fast things should be. re_once is for comparison too, since with /o the pattern is compiled only once and the index can't be adjusted afterwards. (And dynamically recompiling via eval() for changing indexes can't be fast enough.)

Results:

    2505792 bytes to do ...
    Benchmark: timing 1000000 iterations of from_start, re_dyn, re_once, substr, substr_set...
    from_start:  1 wallclock secs ( 1.26 usr + -0.01 sys =  1.25 CPU) @ 800000.00/s (n=1000000)
        re_dyn:  9 wallclock secs ( 6.52 usr +  0.00 sys =  6.52 CPU) @ 153374.23/s (n=1000000)
       re_once:  1 wallclock secs ( 1.26 usr +  0.01 sys =  1.27 CPU) @ 787401.57/s (n=1000000)
        substr:  4 wallclock secs ( 2.36 usr +  0.02 sys =  2.38 CPU) @ 420168.07/s (n=1000000)
    substr_set:  5 wallclock secs ( 3.23 usr +  0.00 sys =  3.23 CPU) @ 309597.52/s (n=1000000)

                   Rate     re_dyn substr_set     substr    re_once from_start
    re_dyn     153374/s         --       -50%       -63%       -81%       -81%
    substr_set 309598/s       102%         --       -26%       -61%       -61%
    substr     420168/s       174%        36%         --       -47%       -47%
    re_once    787402/s       413%       154%        87%         --        -2%
    from_start 800000/s       422%       158%        90%         2%         --

So: every possibility that honours the position is *much* slower than necessary! Even the best of them, substr, reaches only about half the rate of from_start.

So I propose (I know that I'm a bit late, but who cares ... :-) a new option for regexes (like each, case-insensitive, and match-multiple-times) which allows one to specify a position to start matching. That should add *no* overhead! E.g.:

    $text.m:from500:i /\s*(\S+)/;

Currently substr() is the fastest available option, unless somebody has more imagination than me (which I take as given).

So, is there a faster possibility, is that no problem for perl6, or will something like this be implemented?
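One perl5 candidate that the benchmark above does not time is setting pos() on the string and anchoring the pattern with \G under //g. The sketch below only shows that the technique matches from a chosen offset; it has not been measured, so whether it beats substr() is an open question. The helper name word_after and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the attached script): return the first
# word after the token found at offset $pos, or undef if there is none.
# The string is passed by reference so that it is not copied on each call.
sub word_after {
    my ($ref, $pos) = @_;
    pos($$ref) = $pos;    # where the next //g match on this string starts
    # \G anchors the match at pos(); the ?: condition is scalar context,
    # so //g matches once from that position instead of from offset 0.
    return $$ref =~ m/\G\S*\s+(\S+)/g ? $1 : undef;
}

my $text = "alpha beta gamma delta";
print word_after(\$text, 0), "\n";    # prints "beta"
print word_after(\$text, 6), "\n";    # prints "gamma"
```

Because \G pins the match to pos(), a failed attempt does not slide forward through the rest of the string the way an unanchored pattern would.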
Regards, Phil
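#!/usr/bin/perl

use Benchmark qw(cmpthese);

$pos=500;        # offset at which matching should begin
$runs=1000000;   # iterations per variant

# Test data: everything readable under /etc, concatenated into $_.
$_=`cat /etc/* 2> /dev/null`;
study $_;
print length($_), " bytes to do ...\n";

cmpthese($runs,
  {
    # Baseline: match from the start of the string, ignoring $pos.
    "from_start" => sub { m/\S*\s+(\S+)/; },
    # Skip $pos characters inside the pattern; $pos is interpolated on each call.
    "re_dyn" => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/; },
    # Same pattern, compiled only once (/o), so $pos can no longer change.
    "re_once" => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/o; },
    # Match directly against the substring starting at $pos.
    "substr" => sub { substr($_,$pos) =~ m/\S*\s+(\S+)/; },
    # Copy the substring into a variable first, then match.
    "substr_set" => sub { $tmp=substr($_,$pos); $tmp =~ m/\S*\s+(\S+)/; },
  } );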