Steve Fink:
# On Mon, Jan 14, 2002 at 01:49:44AM -0800, Brent Dax wrote:
# > I wrote a _very_ simple benchmark program to compare Perl 5 and Parrot.
# > Here's the result of a test run on my machine:
# >
# > C:\brent\Visual Studio Projects\Perl 6\parrot\parrot>..\benchmark
# > Benchmarking "bbcdefg" =~ /b[cde]*.f/...
# > perl: 0.03000 seconds for 10_000 iters
# > parrot: 0.24100 seconds for 10_000 iters
# > Best: perl, worst: parrot. Spread of 0.21100.
# >
# > The program is attached; it requires my latest regex patch to work. You
# > may need to tr{\\}{/} in a few places to get it to work on Unix systems.
#
# Are you compiling with optimization? I have my own implementation I've
# been toying with, and the first time I benchmarked it, it was pretty
# much identical to yours (a little surprising, considering I was
# benchmarking a totally different expression!) Then I noticed that I
# had compiled it without optimization and tried again with -O3, and the
# gap narrowed significantly.
I tried it once and did see the gap narrow some, but I keep forgetting to
re-enable it as I modify things and rebuild. BTW, it's probably better to
use -O, which will let the compiler choose the best optimization level.
-O3 forces it to optimize to level 3 or give up completely.

# With mine, I am currently seeing:
#
# Benchmarking "xxabbbxx" =~ /ab+a*b/...
# perl: 1.20323 seconds for 500_000 iters
# parrot: 2.87138 seconds for 500_000 iters
#
# Mine doesn't yet handle character classes, so I can't do a direct
# comparison.

What a shame. Character classes are the funnest part of it! ;^) (To be
fair, rx_oneof sat empty for a very long time. It's the hardest matching
op to implement.)

# If you want, you can send me an rx.ops implementation of
# /ab+a*b/ and I'll report the timings of all three. (This isn't a very
# fair benchmark, though, because perl5's optimizations come into play
# with this one, and neither of our engines has a "scan for exact
# string" op that would let us emulate the optimized expression.)

Once Parrot gets an index() op based on a fast string search algorithm,
that will become a non-issue. Also, I seem to remember that somebody was
at least trying to figure out what would be necessary to disable regex
optimizations in Perl 5.

Untested implementation of
{"xxabbbxx" =~ /ab+a*b/ for (I0 = 500_000; I0; I0--)}:

    set I0, 500000
    set S0, "xxabbbxx"
    rx_allocinfo P0, S0
    time N0
    print N0
    print "\n"
$top:
    bsr RX_0
    rx_clearinfo P0, S0
    dec I0
    if I0, $top
    time N0
    print N0
    print "\n"
    rx_freeinfo P0
    end

RX_0:
    rx_setprops "", 3
    branch $start
$advance:
    rx_advance P0, $fail
$start:
    rx_literal P0, "ab", $advance
    rx_pushmark P0
$top1:
    rx_literal P0, "b", $next1
    rx_pushindex P0
    branch $top1
$back1:
    rx_popindex P0, $advance
$next1:
    rx_pushmark P0
$top2:
    rx_literal P0, "a", $next2
    rx_pushindex P0
    branch $top2
$back2:
    rx_popindex P0, $back1
$next2:
    rx_literal P0, "b", $back2
    rx_success P0
    ret
$fail:
    rx_fail P0
    ret

# I notice that string_ord() is taking up a pretty big chunk of time.
# Which isn't too surprising, considering that string_index() is
#
#     return
#         s->encoding->decode(s->encoding->skip_forward(s->bufstart, idx));

It could also be that we're calling it so damn much. Even a function
that's just {return a+b;} will take a lot of time if it's called
eleventy jillion times.

# which is more levels of indirection than you can shake a stick at. And
# that makes me wonder if we can ever compete fairly with perl5 without
# implementing a binary buffer matching mode. Seems like we're always
# paying a penalty for doing "proper" string matching by going through
# all these levels of encoding.

I really don't want to start mucking around in string internals. OTOH,
I'm planning on forcing everything to UTF-32 Normalization Form KC, so it
may not be too big of a problem.

# My RE engine is still pretty rudimentary, but I'll mail a patch to
# anyone who wants to take a look at it. The core really isn't much
# different from Brent's rx stuff; I think his is slightly more
# explicit. The internal wiring is likely to be rather different,
# though.

Send me a copy. There's sure to be at least a few things in it that are
better implemented, if not the whole thing.

--Brent Dax
[EMAIL PROTECTED]
Parrot Configure pumpking and regex hacker

<obra> mmmm. hawt sysadmin chx0rs
<lathos> This is sad. I know of *a* hawt sysamin chx0r.
<obra> I know more than a few.
<lathos> obra: There are two? Are you sure it's not the same one?