My weekly perusal of the parrot lists...

On 03/12/04 Dan Sugalski wrote:
> For example, if you look you'll see we have 28  binary "add" ops. 
> .NET, on the other hand, only has one, and most hardware CPUs have a 

Actually, there are three opcodes: add, add.ovf, add.ovf.un (the last
two throw an exception on overflow for signed and unsigned addition,
respectively: does parrot have any way to detect overflow?).
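To make the overflow question concrete, here is roughly what an
interpreter has to check to get add.ovf/add.ovf.un semantics (a C
sketch; the function names are invented, not anything from the CLR or
parrot sources):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of checked addition as an interpreter might implement it.
 * Returns 0 on success, nonzero on overflow (where a real VM would
 * throw an OverflowException). */
static int add_ovf_i32(int32_t a, int32_t b, int32_t *out) {
    /* Signed overflow: the sum would leave the int32 range. */
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        return 1;
    *out = a + b;
    return 0;
}

static int add_ovf_u32(uint32_t a, uint32_t b, uint32_t *out) {
    /* Unsigned overflow: wraparound makes the sum smaller than an operand. */
    *out = a + b;
    return *out < a;
}
```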

> few, two or three. However... for us each of those add ops has a very 
> specific, fixed, and invariant parameter list. The .NET version, on 
> the other hand, is specified to be fully general, and has to take the 
> two parameters off the stack and do whatever the right thing is, 
> regardless of whether they're platform ints, floats, objects, or a 
> mix of these. With most hardware CPUs you'll find that several bits 

Well, not really: add is specified for fp numbers, 32-bit ints, 64-bit
ints and pointer-sized ints. Addition of objects or structs is handled
by the compiler (by calling the op_Addition static method if it exists;
otherwise the operation is not defined for the types). Also, no mixing
is allowed, except between 32-bit ints and pointer-sized ints; any
conversions that are needed must be inserted by the compiler.
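In C terms, the compiler's job is analogous to inserting an explicit
widening cast (conv.i8 in IL) so the VM's add only ever sees two
operands of the same width. A toy illustration (the helper names are
mine, not IL):

```c
#include <assert.h>
#include <stdint.h>

/* The VM-level add: both operands must already be the same width. */
static int64_t add_i64(int64_t a, int64_t b) { return a + b; }

/* Adding an int32 to an int64: the compiler emits the conversion
 * first, then the plain same-width add. */
static int64_t add_mixed(int32_t a, int64_t b) {
    int64_t wide_a = (int64_t)a;   /* the inserted conversion */
    return add_i64(wide_a, b);
}
```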

> in each parameter are dedicated to identifying the type of the 
> parameter (int constant, register number, indirect offset from a 
> register). In both cases (.NET and hardware) the engine needs to 
> figure out *at runtime* what kind of parameters its been given. 

Well, on hardware the opcodes are really different, even if it may look
like they have a major opcode and a sub-opcode specifying the type.

> the decoded form. .NET does essentially the same thing, decoding the 
> parameter types and getting specific, when it JITs the code. (And 
> runs pretty darned slowly when running without a JIT, though .NET was 
> designed to have a JIT always available)

Yes, so it doesn't matter:-) It's like saying that x86 code runs slowly
if you run it in an emulator:-) It's true, but almost nobody cares
(especially since IL code can now be run with a jit on x86, ppc, sparc
and itanium - s390, arm, amd64 are in the works).

> Parrot doesn't have massive parallelism, nor are we counting on 
> having a JIT everywhere or in all circumstances. We could waste a 
> bunch of bits encoding type information in the parameters and figure 
> it all out at runtime, but... why bother? Since we *know* with 
> certainty at compile (or assemble) time what the parameter types are, 
> there's no reason to not take advantage of it. So we do.

Sure, doing things as Java does, with different opcodes for different
types, is entirely reasonable if you design a VM for interpretation
(though arguably there should be a limit to the combinatorial explosion
of different type arguments). There is only a marginal issue with
generic code, which the IL way of doing opcodes allows and the Java
style does not, but it doesn't matter much.

> real penalty to doing it our way. It actually simplifies the JIT some 
> (no need to puzzle out the parameter types), so in that we get a win 
> over other platforms since JIT expenses are paid by the user every 
> run, while our form of decoding's only paid when you compile.

This overhead is negligible (and is completely avoided by using the
ahead of time compilation feature of mono).

> Finally, there's the big "does it matter, and to whom?" question. As 
> someone actually writing parrot assembly, it looks like parrot only 
> has one "add" op--when emitting pasm or pir you use the "add" 
> mnemonic. That it gets qualified and assembles down to one variant or 

Well, as you mention, someone has to do it, and parrot needs to do it
anyway for runtime-generated parrot asm (if parrot doesn't do it
already, I guess it will have to in order to support features like
eval etc.).
Anyway, if you're going to JIT it doesn't matter if you use one opcode
for add or one opcode for each different kind of addition. If you're
going to interpret the bytecode, having specific opcodes makes sense.
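The point about interpretation is easy to see in a toy dispatch loop:
with a per-type opcode (Java/parrot style) the switch arm does the work
directly, with no runtime type test; a single generic add would have to
inspect the operand types first. Everything here is invented for
illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Toy per-type opcodes: the interpreter never asks "what type are
 * the operands?", because the opcode itself says so. (The float arm
 * is declared but omitted to keep the sketch short.) */
enum { OP_ADD_I32, OP_ADD_F64, OP_HALT };

/* Run a tiny program over two "registers" a and b. */
static int32_t run_i32(const uint8_t *code, int32_t a, int32_t b) {
    for (;;) {
        switch (*code++) {
        case OP_ADD_I32: a = a + b; break;  /* type known statically */
        case OP_HALT:    return a;
        default:         return 0;          /* unreachable in this toy */
        }
    }
}
```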

> For things like the JVM or .NET, opcodes are also bit-limited (though 
> there's much less of a real reason to do so) since they only allocate 
> a byte for their opcode number. Whether that's a good idea or not 

Don't know about the JVM, but the CLR doesn't have a single-byte limit
for opcodes: two-byte opcodes are already specified (and if you consider
prefix opcodes you could say there are 3- and 4-byte opcodes already:
unaligned.volatile.cpblk is such an opcode). Also, the design allows for
any number of bytes per opcode, though I don't think that will ever be
needed: the CLR is designed to provide a fast implementation of the
low-level opcodes and to provide fast method calls: combining the two
you can implement rich semantics in a fast way without needing to change
the VM. There are still a few rough areas that could use a speedup
with specialized opcodes, but there are very few of them and 2-3
additional opcodes will fix them.

> Parrot, on the other hand, *isn't* bit-limited, since our ops are 32 
> bits. (A more efficient design on RISC systems where byte-access is 
> expensive) That opens things up a bunch.

Note that it also uses much more data cache (and disk space): this may
become relevant especially if parrot is to target embedded systems.
Has anyone done measurements on real-life code to see how much disk
space is used? (Data cache effects could be measured with CPU counters,
but that's much more difficult.)
For example adding two regs and storing them in a third requires 16
bytes of bytecode in parrot. The same expression takes 4 bytes in IL
code in the best case, 7 in more complex but probably more common methods. 
The maximum is 13 bytes (in the CLR, operations happen on the eval
stack, so a single byte is enough for the add itself, but I added the
opcodes needed to load two local vars and to store the result: you can
consider the CLR a mixed stack and register machine, but, unlike parrot,
there can be as many as 65535 registers, each with its own type).
Anyway, please consider this issue: I'd suggest at least using a single
opcode_t to store the indexes of the argument and result registers for
an opcode. This would cut down the space required to 8 bytes, still
bigger than IL code, but much more comparable (unless, of course,
opcode_t is changed to be 8 bytes on some platforms...).
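The packing I'm suggesting could look like this (a sketch assuming a
32-bit opcode_t; the 10-bit field width is an arbitrary choice of mine,
leaving room for up to 1024 registers per field):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t opcode_t;

/* Pack three register indexes (dest, src1, src2) into one opcode_t,
 * so "add Rd, Rs1, Rs2" takes opcode + operands = 8 bytes of
 * bytecode instead of parrot's current 16. */
static opcode_t pack_regs(unsigned d, unsigned s1, unsigned s2) {
    return (opcode_t)(d | (s1 << 10) | (s2 << 20));
}

static unsigned reg_d(opcode_t w)  { return w & 0x3FF; }
static unsigned reg_s1(opcode_t w) { return (w >> 10) & 0x3FF; }
static unsigned reg_s2(opcode_t w) { return (w >> 20) & 0x3FF; }
```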

> functions also have a very fixed parameter list. Fixed parameter 
> list, guaranteed availability... looks like an opcode function to me. 
> So they are. We could make them library functions instead, but all 
> that'd mean would be that they'd be more expensive to call (our 
> sub/method call is a bit heavyweight) and that you'd have to do more 
> work to find and call the functions. Seemed silly.

Well, a different solution is to speed up function calls: I imagine
nobody would be against that:-)

> So, there ya go. We've either got two, a reasonable number the same 
> as pretty much everyone else, an insane number of them, or the 
> question itself is meaningless. Take your pick, they're all true. :)

An issue I think you should consider as well with the current parrot
design is this: the last time I built parrot there were 180 vtable slots
(in vtable.dump: not sure this is the actual number, but it seems
reasonable). Four of them are due to add, for example. This means that
for each type, on a 32 bit system, at least 180*4 bytes are spent on the
vtable. How likely is it that the vtable will grow when parrot
starts getting some real use with compilers starting to target it?
For a moderately complex app that uses 500 different types that amounts
to more than 350 KB of memory just for the vtables. Or are you
going to discourage the definition of new PMC types and do vtable
dispatching in a different, language-specific way?
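The arithmetic behind that figure, using exactly the numbers quoted
above (180 slots, 4-byte pointers on a 32-bit system, 500 types):

```c
#include <assert.h>
#include <stddef.h>

/* Per-type vtable cost: one pointer per slot, one vtable per type.
 * 180 * 4 * 500 = 360000 bytes, i.e. a bit over 350 KB. */
static size_t vtable_bytes(size_t slots, size_t ptr_size, size_t types) {
    return slots * ptr_size * types;
}
```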

Thanks.
lupus

-- 
-----------------------------------------------------------------
[EMAIL PROTECTED]                                     debian/rules
[EMAIL PROTECTED]                             Monkeys do it better
