Well, I'm beginning to feel it's habitual for me to periodically pop my
head in and waffle on Parrot's core sizes.  But waffle I shall. 

- opcode_t

This has already been discussed, so I'll sum up.  To remain compatible
(and efficient) across the spectrum of 32-bit and 64-bit platforms, the
value of opcode_t is limited to 32 bits.  (Or, more accurately, 31
bits.)  Although you could go larger on a 64-bit platform, the use of
opcode_t as an array index and memory offset limits it to the size of
the addressable memory anyway.  (So the value would be downcast by the
end, if not before.  I can't find a reference to what integer type an
array index is.)

Not to mention all the *other* problems we'll have if we've got more
than 2^31 different opcodes.  (Although that's why there are UUIDs now,
isn't it?)

Although Parrot needs to be able to convert between 32-bit and 64-bit
wide opcodes, there's no reason to process them at anything other than
native (size_t-ish) size, since a good 90% or more of the uses will be
cast to that size anyway.
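To make that conversion concrete, here is a minimal C sketch of decoding one opcode word from either a 32-bit or a 64-bit bytecode file into a native-width opcode, refusing anything past the 31-bit limit. The names `native_opcode_t`, `OPCODE_VALUE_MAX`, and `read_opcode()` are hypothetical, not Parrot's, and the stream is assumed to be little-endian; a real loader would take the byte order from the bytecode header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native opcode type: the machine's natural ("size_t-ish") width. */
typedef size_t native_opcode_t;

/* The effective spec limit: opcodes and constants fit in 31 bits. */
#define OPCODE_VALUE_MAX ((uint64_t)INT32_MAX)

/* Decode one opcode word of `width` bytes (4 or 8) from a bytecode stream,
 * assumed little-endian here for simplicity, into the native type.
 * Returns 0 on success, or -1 if the value exceeds the 31-bit limit. */
static int read_opcode(const unsigned char *src, size_t width,
                       native_opcode_t *out)
{
    uint64_t v = 0;

    assert(width == 4 || width == 8);      /* only 32- and 64-bit files */
    for (size_t i = 0; i < width; i++)
        v |= (uint64_t)src[i] << (8 * i);  /* assemble little-endian bytes */

    if (v > OPCODE_VALUE_MAX)
        return -1;                         /* would not survive the downcast */
    *out = (native_opcode_t)v;
    return 0;
}
```

Whatever the file's word size, everything downstream of the loader then sees a single native type.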

- INTVAL

Early on, I was a big fan of making INTVALs as big as you could.  Having
been bitten by integer rollover, and having watched the struggle for
complete 64-bit integer support in Perl 5, I considered huge INTVALs
important.

As Parrot has evolved, I've come to realize that what I *really* want is
to be able to program with huge INTVALs.  Which isn't the same thing.

                       -----------
     ------------------| Opcodes | <---  Program
S    | Interpreter     -----------
Y    |                    ^  |
S <- |  G      R    <-----|  |
T -> |  u  <-  e          v  |
E    |  t  ->  g       --------
M    |  s      s       | PMCs |
     -------------------------- 

So when I write a program, there are going to be two types of numbers,
user and system.  (For lack of imagination.)

User numbers, of course, are the numbers that exist for their own
purpose, and for the user's benefit.

    $a = 5;
    $b = $a * 2 + 6;

System numbers are those marked "internal use only".  File numbers,
array indices, counters, the language infrastructure.  These bubble down
to the guts of the interpreter, and eventually to the system.  If INTVAL
is wider than the natural system width, a conversion is in order.
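A sketch of what that conversion looks like when it's done carefully: narrowing a system number, such as a file number, from a wide INTVAL to the `int` a system call expects, with a range check instead of a silent cast. `INTVAL` here is a local stand-in typedef'd to `int64_t` under the 32/64 assumption used below, and `intval_to_int()` is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in: a 64-bit INTVAL on a system whose natural interfaces
 * (file numbers, counters, etc.) take a plain int. */
typedef int64_t INTVAL;
static_assert(sizeof(INTVAL) == 8, "this sketch assumes a 64-bit INTVAL");

/* Narrow an INTVAL to the system's int, refusing values that do not fit
 * rather than silently truncating them. Returns 0 on success or ERANGE,
 * in which case *out is left untouched. */
static int intval_to_int(INTVAL v, int *out)
{
    assert(out != NULL);
    if (v < INT_MIN || v > INT_MAX)
        return ERANGE;
    *out = (int)v;
    return 0;
}
```

Every system number pays for this check on the way down, which is the cost the rest of this note is weighing.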

(For the sake of using real numbers, I'll assume a 32-bit system and
64-bit INTVALs.)

Currently, the flow is, in variable sizes (bits):

    Opcodes: 32 (constants are limited by the spec)
    PMCs   : 64
    Regs   : 64
    Guts   : 64/32 mix
    System : 32

What's troublesome is the rash of conversions between the system and
some guts, between those guts and other guts, and between those guts
and the registers.  (Besides the extra cost of schlepping around the
extra data, the size differentials between INTVALs and pointers (which
are problematic to begin with), unchecked truncation, and the added
burden on the JIT, it's not really a problem.)
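The "unchecked truncation" case, sketched in C under the same 32/64 assumption (`INTVAL` is a local stand-in for `int64_t`, and `truncate_unchecked()` is a hypothetical name). Converting to an unsigned 32-bit type is well defined in C, so the result is predictable; the trouble is that the high bits vanish without a word.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in: a 64-bit INTVAL headed for a 32-bit slot somewhere in
 * the guts or the system. */
typedef int64_t INTVAL;
static_assert(sizeof(INTVAL) == 8, "this sketch assumes a 64-bit INTVAL");

/* An unchecked cast: C keeps the value modulo 2^32 (the low 32 bits),
 * and nothing reports that the high bits were thrown away. */
static uint32_t truncate_unchecked(INTVAL v)
{
    return (uint32_t)v;
}
```

So 2^32 + 5 goes in and 5 comes out, and -1 comes out as 4294967295.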

And for what?  To be able to add large numbers?

Numbers, as a type in a language that rides upon Parrot, never really
reach beyond the boundaries of the PMCs themselves.  The majority of
numerics passed down through the registers are destined for conversion
anyway.

The flow *really* is, in value sizes (bits):

    Opcodes: 32 (constants are limited by the spec)
    PMCs   : 64
    Regs   : 32 
    Guts   : 32
    System : 32
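One way to pin down "value size" as opposed to "variable size": count the bits a value actually needs in two's complement, regardless of the width of the variable carrying it. `value_bits()` is a hypothetical helper written only to illustrate the distinction. The `$b` from the earlier snippet ends up as 16, which needs 6 bits; anything up to `INT32_MAX` needs at most 32.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest two's-complement width, in bits, that can hold v. This is the
 * "value size" of a number, independent of the (possibly 64-bit)
 * variable that happens to carry it. */
static unsigned value_bits(int64_t v)
{
    unsigned bits = 1;  /* the sign bit */
    /* For negatives, ~v has the same significant bits as the magnitude
     * minus one, e.g. -1 -> 0 and -128 -> 127. */
    uint64_t mag = v < 0 ? ~(uint64_t)v : (uint64_t)v;

    while (mag) {       /* count the remaining significant bits */
        bits++;
        mag >>= 1;
    }
    return bits;
}
```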

Certainly, much like the physical machine it runs on, the virtual
machine needs to support, or at least not preclude, wider numeric types
for access by languages.  But given that the bulk of the virtual machine
maps onto the physical one, that should probably be relegated to just
that: support.

For example, take Perl 5's struggle for maximal bitness.  Perl 6 will
continue in that direction (and go further, if you consider
auto-promotion to arbitrarily sized numbers), and the language will
provide all of that functionality within its PMCs.  So why does it need
the interpreter to do any more than not get in its way?  (Consider, for a
moment, that bytecode strives to be portable across all Parrot virtual
machines, which implies that nothing in the bytecode, nor in the
supporting languages, should be dependent on Parrot being configured
with extended integers in the first place.)

On the off chance that a language with extended numerics wants to use
registers, how feasible would it be (for the JIT, the compiler, etc.)
to borrow a page from the physical hardware and simply join two smaller
registers together?  (That's the advantage of contiguous memory regions.)
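A minimal sketch of the register-pairing idea, assuming a hypothetical register file of 32-bit integer registers where a 64-bit value lives in two adjacent registers, low half first. `regfile`, `get_pair()`, `set_pair()`, and `add_pair()` are made-up names, not Parrot's. The add propagates a carry from the low halves into the high halves, the way narrow hardware does wide arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: 32-bit integer registers, where a 64-bit
 * value occupies I[n] (low half) and I[n + 1] (high half). */
#define NUM_REGS 32
typedef struct { uint32_t I[NUM_REGS]; } regfile;

static uint64_t get_pair(const regfile *r, int n)
{
    assert(n >= 0 && n + 1 < NUM_REGS);
    return ((uint64_t)r->I[n + 1] << 32) | r->I[n];
}

static void set_pair(regfile *r, int n, uint64_t v)
{
    assert(n >= 0 && n + 1 < NUM_REGS);
    r->I[n] = (uint32_t)v;               /* low 32 bits  */
    r->I[n + 1] = (uint32_t)(v >> 32);   /* high 32 bits */
}

/* 64-bit add on register pairs: add the low halves, then add the high
 * halves plus the carry out of the low add. All reads happen before any
 * write, so dst may safely alias a or b. */
static void add_pair(regfile *r, int dst, int a, int b)
{
    assert(dst >= 0 && a >= 0 && b >= 0);
    assert(dst + 1 < NUM_REGS && a + 1 < NUM_REGS && b + 1 < NUM_REGS);
    uint32_t lo = r->I[a] + r->I[b];
    uint32_t carry = lo < r->I[a];       /* unsigned wraparound => carry */
    uint32_t hi = r->I[a + 1] + r->I[b + 1] + carry;
    r->I[dst] = lo;
    r->I[dst + 1] = hi;
}
```

For example, adding 1 to a pair holding 0xFFFFFFFF carries into the high register and yields 0x100000000.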

- FLOATVAL

The same principle, with a twist.  Like most operating systems, the
interpreter doesn't really need floating point in and of itself.
Floating-point values exist almost entirely for the end calculations.
So there's much less internal flow of floats, and far fewer needless
conversions.  But there's also much less need for the interpreter
itself to have configurably sized floats.  Then again, there's little
reason *not* to have them.  The JIT, I guess.
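If FLOATVAL is made configurable anyway, the cost can be kept to a typedef chosen at build time, with conversions happening only at the edges. This sketch uses a made-up macro, `SKETCH_USE_LONG_DOUBLE`, and a hypothetical `floatval_to_double()` helper; neither is taken from Parrot's actual configuration system.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical build-time switch (macro name invented for this sketch).
 * The interpreter never needs the wider type itself; languages simply
 * get whatever the build chose. */
#ifdef SKETCH_USE_LONG_DOUBLE
typedef long double FLOATVAL;
#  define FLOATVAL_DIG LDBL_DIG   /* decimal digits of precision */
#else
typedef double FLOATVAL;
#  define FLOATVAL_DIG DBL_DIG
#endif

static_assert(sizeof(FLOATVAL) >= sizeof(double),
              "FLOATVAL is at least as wide as double");

/* Conversions happen only at the edges, e.g. when a result leaves a PMC
 * for a C library call that takes a plain double. */
static double floatval_to_double(FLOATVAL f)
{
    return (double)f;
}
```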

- Problems

Well, Parrot's had problems from the beginning with configurations other
than "long, double, long".  By keeping INTVAL and FLOATVAL at the
maximum size supported (basically either "long" or "long long", and
either "double" or "long double"), we let languages take advantage of
whatever facilities are available to them, if they so choose.
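Roughly what "INTVAL as the maximum size supported" could look like: a build-time choice between `long` and `long long`, plus an overflow test a language's PMC might use to decide when to promote to its own bignum type. `SKETCH_HAS_LONG_LONG`, the `INTVAL_MIN`/`INTVAL_MAX` macros as defined here, and `needs_promotion()` are all hypothetical names for this sketch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical build-time switch (macro name invented for this sketch):
 * INTVAL is whichever of "long" or "long long" the build chose. */
#ifdef SKETCH_HAS_LONG_LONG
typedef long long INTVAL;
#  define INTVAL_MIN LLONG_MIN
#  define INTVAL_MAX LLONG_MAX
#else
typedef long INTVAL;
#  define INTVAL_MIN LONG_MIN
#  define INTVAL_MAX LONG_MAX
#endif

static_assert(sizeof(INTVAL) >= sizeof(long), "INTVAL is at least a long");

/* Would a + b overflow INTVAL? A language's PMC could use a test like
 * this to promote to its own arbitrary-precision type, rather than
 * relying on the interpreter's integers being wide. */
static int needs_promotion(INTVAL a, INTVAL b)
{
    return (b > 0 && a > INTVAL_MAX - b) ||
           (b < 0 && a < INTVAL_MIN - b);
}
```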

But what of inter-language operability?  Will the registers become the
crossroads for data conversions between PMCs from different languages?
It doesn't look that way, from the direction that PMCs have gone.

Can we simplify interpreter types this much, while still providing
extended numerics to hosted languages?

-- 
Bryan C. Warnock
bwarnock@(gtemail.net|raba.com)
