Steve Fink:
# On Mon, Jan 14, 2002 at 01:49:44AM -0800, Brent Dax wrote:
# > I wrote a _very_ simple benchmark program to compare Perl 5 and
# > Parrot.
# > Here's the result of a test run on my machine:
# >
# > C:\brent\Visual Studio Projects\Perl 6\parrot\parrot>..\benchmark
# > Benchmarking "bbcdefg" =~ /b[cde]*.f/...
# >      perl: 0.03000 seconds for 10_000 iters
# >    parrot: 0.24100 seconds for 10_000 iters
# > Best: perl, worst: parrot. Spread of 0.21100.
# >
# > The program is attached; it requires my latest regex patch to work.
# > You may need to tr{\\}{/} in a few places to get it to work on Unix
# > systems.
#
# Are you compiling with optimization? I have my own implementation I've
# been toying with, and the first time I benchmarked it, it was pretty
# much identical to yours (a little surprising, considering I was
# benchmarking a totally different expression!) Then I noticed that I
# had compiled it without optimization and tried again with -O3, and the
# gap narrowed significantly.

I tried it once and did see the gap narrow some, but I keep forgetting
to re-enable it as I modify things and rebuild.  BTW, if that's gcc, a
plain -O is the same as -O1, so it optimizes less than -O3 rather than
picking a "best" level for you.  -O2 is the usual middle ground if -O3
misbehaves.

# With mine, I am currently seeing:
#
# Benchmarking "xxabbbxx" =~ /ab+a*b/...
#      perl: 1.20323 seconds for 500_000 iters
#    parrot: 2.87138 seconds for 500_000 iters
#
# Mine doesn't yet handle character classes, so I can't do a direct
# comparison.

What a shame.  Character classes are the funnest part of it!  ;^)

(To be fair, rx_oneof sat empty for a very long time.  It's the hardest
matching op to implement.)

#             If you want, you can send me an rx.ops implementation of
# /ab+a*b/ and I'll report the timings of all three. (This isn't a very
# fair benchmark, though, because perl5's optimizations come into play
# with this one, and neither of our engines has a "scan for exact
# string" op that would let us emulate the optimized expression.)

Once Parrot gets an index() op based on a fast string search algorithm,
that will become a non-issue.  Also, I seem to remember that somebody
was at least trying to figure out what would be necessary to disable
regex optimizations in Perl 5.
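
To make the "scan for exact string" idea concrete, here is a rough C
sketch of the kind of fast literal search such an op could sit on top
of.  It uses Boyer-Moore-Horspool over raw bytes.  find_literal is a
made-up name, not anything in Parrot, and it ignores encodings
entirely.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a Parrot API: Boyer-Moore-Horspool search
 * over raw bytes.  Returns the offset of the first occurrence of
 * needle in hay, or -1 if there is none. */
static long find_literal(const char *hay, size_t hay_len,
                         const char *needle, size_t needle_len)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    if (needle_len == 0)
        return 0;
    if (needle_len > hay_len)
        return -1;

    /* Default: a byte that is not in the needle lets us skip past it. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = needle_len;
    /* Bytes in the needle (other than its last byte) shift less. */
    for (i = 0; i + 1 < needle_len; i++)
        shift[(unsigned char)needle[i]] = needle_len - 1 - i;

    pos = 0;
    while (pos + needle_len <= hay_len) {
        if (memcmp(hay + pos, needle, needle_len) == 0)
            return (long)pos;
        /* Shift by the byte aligned with the needle's last position. */
        pos += shift[(unsigned char)hay[pos + needle_len - 1]];
    }
    return -1;
}
```

For the expression above, the engine could look for the literal "ab"
this way and only start the real matcher at offsets where it is found.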

Untested implementation of {"xxabbbxx"=~/ab+a*b/ for(I0=500_000; I0;
I0--)}:

                set I0, 500000
                set S0, "xxabbbxx"
                rx_allocinfo P0, S0
                time N0
                print N0
        $top:
                bsr RX_0
                rx_clearinfo P0, S0
                dec I0
                if I0, $top

                time N0
                print N0
                rx_freeinfo P0
                end

        RX_0:
                rx_setprops "", 3
                branch $start
        $advance:
                rx_advance P0, $fail
        $start:
                rx_literal P0, "ab", $advance
                rx_pushmark P0
        $top1:
                rx_literal P0, "b", $next1
                rx_pushindex P0
                branch $top1
        $back1:
                rx_popindex P0, $advance
        $next1:
                rx_pushmark P0
        $top2:
                rx_literal P0, "a", $next2
                rx_pushindex P0
                branch $top2
        $back2:
                rx_popindex P0, $back1
        $next2:
                rx_literal P0, "b", $back2
                rx_success P0
                ret
        $fail:
                rx_fail P0
                ret
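
If the op sequence is hard to follow, the same backtracking strategy
can be written out by hand in C.  This is only a sketch of the logic
(greedy b*, then greedy a*, giving characters back one at a time when
the final "b" fails), not how rx.ops is implemented.  The function name
is made up and it works on plain bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the ops above: returns the start offset of
 * the first match of /ab+a*b/ in s, or -1 if there is no match. */
static long match_ab_plus_a_star_b(const char *s)
{
    size_t len = strlen(s);
    size_t start;

    for (start = 0; start < len; start++) {   /* like rx_advance */
        size_t p = start;
        size_t b_min, b_end, a_end;

        /* Literal "ab": the 'a' plus the one mandatory 'b' of b+. */
        if (len - p < 2 || s[p] != 'a' || s[p + 1] != 'b')
            continue;
        p += 2;
        b_min = p;

        /* Greedy b*: take as many extra 'b's as possible. */
        while (p < len && s[p] == 'b')
            p++;

        /* Backtrack over the b* choices, longest first. */
        for (b_end = p; ; b_end--) {
            /* Greedy a* starting where the b's ended. */
            a_end = b_end;
            while (a_end < len && s[a_end] == 'a')
                a_end++;

            /* Backtrack over the a* choices, longest first. */
            for (;; a_end--) {
                if (a_end < len && s[a_end] == 'b')
                    return (long)start;       /* final literal "b" */
                if (a_end == b_end)
                    break;
            }
            if (b_end == b_min)
                break;
        }
    }
    return -1;
}
```

On "xxabbbxx" this grabs all three b's, fails to find the trailing "b",
gives one back, and then succeeds at offset 2.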

# I notice that string_ord() is taking up a pretty big chunk of time.
# Which isn't too surprising, considering that string_index() is
#
# return
# s->encoding->decode(s->encoding->skip_forward(s->bufstart, idx))

It could also be that we're calling it so damn much.  Even a function
that's just {return a+b;} will take a lot of time if it's called
eleventy jillion times.

# which is more levels of indirection than you can shake a stick at. And
# that makes me wonder if we can ever compete fairly with perl5 without
# implementing a binary buffer matching mode. Seems like we're always
# paying a penalty for doing "proper" string matching by going through
# all these levels of encoding.

I really don't want to start mucking around in string internals.  OTOH,
I'm planning on forcing everything to UTF-32 in Normalization Form KC, so it
may not be too big of a problem.
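
The reason a fixed-width encoding helps: fetching character N is a
plain array index, whereas a variable-width encoding has to skip
forward from the start of the buffer every time.  A rough C
illustration follows.  Both helpers are made up for this example (they
are not Parrot's string API), and the UTF-8 one assumes valid,
NUL-terminated input with idx in range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: in UTF-32, character idx is element idx. */
static uint32_t utf32_char_at(const uint32_t *buf, size_t idx)
{
    return buf[idx];
}

/* Hypothetical helper: in UTF-8 we must walk past idx characters
 * first, then decode the one we land on. */
static uint32_t utf8_char_at(const unsigned char *buf, size_t idx)
{
    const unsigned char *p = buf;
    uint32_t c;
    size_t extra;

    /* Skip forward: one lead byte plus any continuation bytes. */
    while (idx-- > 0) {
        p++;
        while ((*p & 0xC0) == 0x80)
            p++;
    }

    /* Decode: the lead byte tells us how many bytes follow. */
    if (*p < 0x80)
        return *p;
    if ((*p & 0xE0) == 0xC0) {
        c = *p & 0x1F;
        extra = 1;
    } else if ((*p & 0xF0) == 0xE0) {
        c = *p & 0x0F;
        extra = 2;
    } else {
        c = *p & 0x07;
        extra = 3;
    }
    while (extra-- > 0)
        c = (c << 6) | (uint32_t)(*++p & 0x3F);
    return c;
}
```

A matcher that asks for one character at a time pays the skip-forward
cost on every call in the second case, which is the sort of overhead
that would show up in a profile.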

# My RE engine is still pretty rudimentary, but I'll mail a patch to
# anyone who wants to take a look at it. The core really isn't much
# different from Brent's rx stuff; I think his is slightly more
# explicit. The internal wiring is likely to be rather different,
# though.

Send me a copy.  There are sure to be at least a few things in it that
are better implemented, if not the whole thing.

--Brent Dax
[EMAIL PROTECTED]
Parrot Configure pumpking and regex hacker

<obra> mmmm. hawt sysadmin chx0rs
<lathos> This is sad. I know of *a* hawt sysamin chx0r.
<obra> I know more than a few.
<lathos> obra: There are two? Are you sure it's not the same one?
