At 06:12 PM 9/6/2001 +0200, Paolo Molaro wrote:
>On 09/05/01 Dan Sugalski wrote:
> > >It's easier to generate code for a stack machine
> >
> > So? Take a look at all the stack-based interpreters. I can name a bunch,
> > including perl. They're all slow. Some slower than others, and perl tends
> > to be the fastest of the bunch, but they're all slow.
>
>Have a look at the shootout benchmarks. Yes, we all know that
>benchmarks lie, but...
>The original mono interpreter (that didn't implement all the semantics
>required by IL code that slow down interpretation) ran about 4 times
>faster than perl/python on benchmarks dominated by branches, function calls,
>integer ops or fp ops.

Right, but Mono's not an interpreter, unless I'm misunderstanding. It's an 
implementation of .NET, so it compiles its code before executing it. And 
the IL it compiles is darned close to x86 assembly, so the conversion's 
close to trivial.

Given that, it doesn't surprise me that Mono would wipe the floor with 
perl, since you don't have the interpreter loop and opcode dispatch 
overhead to deal with. Heck, if we *still* beat you even with Mono 
compiling, I'd have to take the T over and give Miguel a hard time. :)

> > >That said, I haven't seen any evidence a register based machine is
> > >going to be (significantly?) faster than a stack based one.
> > >I'm genuinely interested in finding data about that.
> >
> > At the moment a simple mix of ops takes around 26 cycles per opcode on an
> > Alpha EV6. (This is an even mix of branch, test, and integer addition
> > opcodes)  That's with everything sticking in cache, barring task switches.
> > It runs around 110 cycles/op on the reasonably antique machine I have at
> > home. (A 300MHz Celeron (the original, with no cache))
>
>Subliminal message: post the code... :-)

Anon CVS and surrounding tools (bug tracking system and such) being set up 
even as I type. (Though not by me, I don't have that many hands... :) 
Expect code to check out and build sometime early next week.

> > You're also putting far too much emphasis on registers in general. Most of
> > the work the interpreter will be doing will be elsewhere, either in the
> > opcode functions or in the variable vtable functions. The registers are
>
>That is true when executing high-level opcodes and a register or stack
>machine doesn't make any difference for that. It's not true for
>the low-level opcodes that parrot is supposed to handle according to
>the overview posted by Simon.

Sure, but there'll be a mix of high and low level code. Yes, we're going to 
get hosed on the low-level ops because of the interpreter loop overhead. No 
way around that as long as we're an interpreter. (FWIW, the benchmark I 
posted was all low-level ops, and given that I'm not really unhappy with a 
26 cycle/op number. I'd like it smaller, and I think we have some ways 
around it (we can cut out the function call overhead with sufficient 
cleverness), but it doesn't worry me yet.) So just because we're going to 
be able to add integers doesn't mean we're not also going to be adding 
full-blown variables, or executing a single map or grep opcode.
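
To make the loop and call overhead concrete, here's a minimal sketch of
function-call dispatch in C. It's illustrative only (the names and layout
are hypothetical, not Parrot's actual code), but it shows where the per-op
cost comes from: an indirect call, a return, and a loop branch on top of
each op's real work.

    #include <stddef.h>

    typedef struct interp Interp;
    typedef int *(*op_func)(int *pc, Interp *interp);

    struct interp {
        long reg[32];        /* hypothetical integer register file */
    };

    /* opcode 0: add rD, rS1, rS2 */
    static int *op_add(int *pc, Interp *interp) {
        interp->reg[pc[1]] = interp->reg[pc[2]] + interp->reg[pc[3]];
        return pc + 4;       /* skip the opcode and three operands */
    }

    /* opcode 1: halt the run loop */
    static int *op_end(int *pc, Interp *interp) {
        (void)pc; (void)interp;
        return NULL;
    }

    static op_func op_table[] = { op_add, op_end };

    /* The dispatch loop: one indirect call, one return, and one
     * branch per op, no matter how trivial the op's own work is. */
    static void run(Interp *interp, int *pc) {
        while (pc)
            pc = op_table[*pc](pc, interp);
    }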

The low-level ops are the places where we'll win the most when going either 
the TIL (threaded code) or straight compile route, since the loop and 
function call overhead will be cut out entirely.
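
As a sketch of what "cut out entirely" can look like short of full
compilation, here's the same toy op set direct-threaded with GCC's
computed-goto extension (GCC-specific, and again illustrative rather than
a committed design): each op jumps straight to the next one, so the call,
the return, and the loop test all disappear.

    /* Assumes the reg/pc layout from the sketch above. */
    static void run_threaded(long *reg, int *pc) {
        static void *labels[] = { &&do_add, &&do_end };

        goto *labels[*pc];                    /* initial dispatch */

    do_add:                                   /* add rD, rS1, rS2 */
        reg[pc[1]] = reg[pc[2]] + reg[pc[3]];
        pc += 4;
        goto *labels[*pc];                    /* straight to the next op */

    do_end:
        return;
    }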

> > It'll be faster than perl for low-level stuff because we'll have the
> > option to not carry the overhead of full variables if we don't need it.
> > It should be faster than perl 5 with variables too, which will put us
> > at the top of the performance heap, possibly duking it out with Java.
> > (Though I think perl 5's faster than java now, but it's tough to get a
> > good equivalence there)
>
>Rewriting perl will leave behind all the cruft that accumulated over
>the years, so it should not be difficult for parrot to run faster;-)

Boy I hope so. (Try benchmarking perl 5.004_04 against 5.6.1. I did; the 
results were embarrassing)

>Java is way faster than perl currently in many tasks:

Only when JITed, in which case you're comparing apples to oranges. A better 
comparison is against Java without the JIT. (Yes, I know, Java *has* a JIT, 
and yes, I realize most folks don't care about the distinction (they just 
want to know which runs faster), but for meaningful numbers at a technical 
level you need to compare like things)

>it will be difficult
>to beat it starting from a dynamic language like perl, we'll all pay
>the price to have a useful language like perl.

Unfortunately (and you made reference to this in an older mail I haven't 
answered yet) dynamic languages don't lend themselves to on-the-fly 
compilation quite the way that static languages do. Heck, they don't lend 
themselves to compilation, period (certainly not to optimization, and don't 
get me started), as much as static languages do. Since a variable's type 
can change out from under you at runtime, the compiler can rarely pin an 
operation down to a single machine instruction. That's OK, it just means 
our technical challenges are similar to, but not the same as, the ones for 
Java/C/C++/C#/Whatever.
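
To illustrate (with hypothetical names, not a committed design for our
variable internals): because the type behind a variable can change at
runtime, even a plain add has to go through the value's vtable instead of
compiling down to a single instruction.

    typedef struct pmc PMC;

    typedef struct vtable {
        PMC *(*add)(PMC *self, PMC *other);  /* type-specific addition */
        /* ...get_integer, get_string, assign, and so on... */
    } Vtable;

    struct pmc {
        const Vtable *vt;   /* swapped out if the value changes type */
        void         *data;
    };

    /* The compiler can't know whether this will be integer addition,
     * string-to-number conversion plus addition, or an overloaded op,
     * so it has to emit the indirect call every time. */
    PMC *pmc_add(PMC *a, PMC *b) {
        return a->vt->add(a, b);
    }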

> > >The only difference in the execution engine is that you need to update
> > >the stack pointer. The problem is when you need to generate code
> > >for the virtual machine.
> >
> > Codegen for register architectures is a long-solved problem. We can reach
> > back 30 or more years for it if we want. (We don't, the old stuff has been
>
>... when starting from a suitable intermediate representation (i.e., not
>machine code for another register machine).

Couple of points:

1) We're not throwing away our intermediate representations
2) Perl bytecode (and parrot bytecode will probably follow suit) lives 
solidly at the MIR level, so going to a low-level representation's not a 
big deal. (We will be mostly medium to high-level ops)
3) It's a perfectly fine representation when translating to another 
register architecture, at least according to the Digital folks (one of the 
advanced compiler books was written by one of the Alpha compiler guys)
4) It's an OK representation for stack machines if you treat the registers 
as named temporaries (see the sketch below)

It really isn't that bad.
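
Here's point 4 as a toy translator, to show that the mapping is mechanical.
(The op names are made up for illustration; they're not from any real
bytecode set.)

    #include <stdio.h>

    /* Turn a three-address register op ("add dest, src1, src2") into
     * stack-machine code by treating each register as a named local. */
    static void emit_add_as_stack(int dest, int src1, int src2) {
        printf("load_local %d\n", src1);   /* push temporary src1      */
        printf("load_local %d\n", src2);   /* push temporary src2      */
        printf("add\n");                   /* pop both, push their sum */
        printf("store_local %d\n", dest);  /* pop result into dest     */
    }

    int main(void) {
        emit_add_as_stack(3, 1, 2);        /* add I3, I1, I2 */
        return 0;
    }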

>Well, it would be up to us to design the bytecode, so I'd say it's likely 7.

Reading it again, I think it'll end up being 9, but that's not a biggie.

> > Do you really think the stack-based way will be faster?
>
>The speed of the above loop depends a lot on the actual implementation
>(the need to do a function call in the current parrot code would blow
>away any advantage gained skipping stack updates, for example).

A stack interpreter would still have the function calls. If we were 
compiling to machine code, we'd skip the function calls for both.

>As I said in another mail, I think the stack-based approach will not
>be necessarily faster, but it will allow more optimizations down the path.

I think we're going to have to disagree on this. It's been my experience 
that a stack-based system limits your optimization possibilities. For 
example, two identical adds on the same named registers are a textbook 
common subexpression, while in stack code the operands are anonymous and 
transient, so the optimizer has to reconstruct the data flow before it can 
even see the redundancy. That limit is irrelevant on register-starved 
architectures (x86 being the big one) but the whole world, and our target 
audience, isn't living on limited machines.

>It may well be 20 % slower in some cases when interpreted, but if it allows
>me to easily JIT it and get 400 % faster, it's a non issue.

I'll have to work out a compiled version of my benchmark for comparable 
numbers. (This is sounding familiar--at TPC Miguel tried to convince me 
that .Net was the best back-end architecture to generate bytecode for) 
Until then I think comparisons against a JITted version aren't relevant.

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
