Dan Sugalski <[EMAIL PROTECTED]> wrote:

> Anyway, so much for the 'outside world' view of objects as black box
> things that have properties and methods.

[ ... ]

> Almost everything we do here is going to be with method calls.
> There's very little that I can see that requires any faster access
> than that, so as long as we have a proper protocol that everyone can
> conform to, we should be OK.

The distinction between object and class still gets in the way a bit when
it comes to method calls.

$ python
...
>>> i = 4
>>> i.__sub__(1)
3

>>> help(int)
...
 |  __sub__(...)
 |      x.__sub__(y) <==> x-y

Given that a PyInt is just a PMC it currently needs a lot of ugly hacks
to implement the full set of operator methods.  dynclasses/py*.pmc is
mostly just duplicating existing code from standard PMCs.

If the assembler just emits a method call for the opcode:

  sub P0, P1, P2

we have that equivalence too, with proper inheritance and static and dynamic
operator overloading.
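
To illustrate the equivalence on the Python side, here is a minimal sketch.
The classes Int and SaturatingInt are hypothetical stand-ins invented for
this example, not Parrot PMCs. It shows the operator and the explicit method
call giving the same result, a subclass overriding the operator (static
overloading), and the method being rebound at runtime (dynamic overloading).

```python
class Int:
    """Hypothetical stand-in for an integer PMC; not Parrot's real class."""

    def __init__(self, value):
        self.value = value

    def __sub__(self, other):
        # Reached both by `a - b` and by the explicit call a.__sub__(b).
        return type(self)(self.value - other.value)


class SaturatingInt(Int):
    """Static overloading: a subclass overrides the operator method."""

    def __sub__(self, other):
        return SaturatingInt(max(0, self.value - other.value))


a, b = Int(4), Int(1)
via_operator = (a - b).value       # operator syntax
via_method = a.__sub__(b).value    # explicit method call, same code path

# The subclass method is picked up through normal inheritance rules.
clamped = (SaturatingInt(1) - Int(4)).value

# Dynamic overloading: rebinding the method on the class at runtime
# changes what the operator does for already existing instances.
Int.__sub__ = lambda self, other: Int(other.value - self.value)
reversed_result = (a - b).value
```

If the opcode is emitted as a method call, the same lookup rules would
presumably apply to "sub P0, P1, P2" as to any other method.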

I've duplicated FixedPMCArray.get_pmc_keyed_int() as a method, starting
with this line:

    METHOD PMC* __getitem__(INTVAL key) {

and measured 5 M array lookups:

$ parrot -C vt-bench.pasm

Vtable get_pmc_keyed_int 0.32 s
__getitem__              0.36 s

The actual code in the loop is:

lp2:
        callmethodcc "__getitem__"
        inc I16
        lt I16, 5000000, lp2

Our internal name for this method is "__get_pmc_keyed_int" but that
doesn't really matter. Either the Python translator emits a proper name
or the method gets aliased at runtime.
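
The runtime aliasing option could look roughly like this on the Python
side. This is only a sketch: FixedArray is a hypothetical stand-in for
FixedPMCArray, and the internal name is written without the leading double
underscore because Python would otherwise mangle it inside a class body.

```python
class FixedArray:
    """Hypothetical stand-in for FixedPMCArray."""

    def __init__(self, items):
        self._items = list(items)

    def get_pmc_keyed_int(self, key):
        # The "internal" name of the lookup method.
        return self._items[key]


# Alias at runtime: bind the same function under the Python-facing name.
FixedArray.__getitem__ = FixedArray.get_pmc_keyed_int

arr = FixedArray([10, 20, 30])
by_internal_name = arr.get_pmc_keyed_int(1)
by_alias = arr[1]   # indexing syntax now reaches the same function
```

Both names refer to one function object, so there is no extra indirection
once the method has been found.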

The timing from above is again done with the CGP core and the method is
cached in the PIC (polymorphic inline cache).
This is the relevant part of the executed opcode:

        typedef PMC* (*func_pi)(Interp*, PMC*, INTVAL);
        func_pi f;

        pic = (Parrot_PIC *) cur_opcode[1];
        idx = REG_INT(5);
        self = REG_PMC(2);
        if (self->vtable->base_type == pic->lru[0].lr_type) {
            f = (func_pi)pic->lru[0].f.real_function;
            REG_PMC(5) = (f)(interpreter, self, idx);
            goto *((void*)*(cur_opcode += 2));
        }

The PIC has three cache slots. The literature states that the fast path is
hit in 94% of cases. If the type of the object of that method call at that
bytecode location changes, another 5% is handled in a second C<if> block.
If the type match fails, or on the first execution, VTABLE_find_method is
called to get the function pointer.
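
The lookup logic can be sketched in Python as follows. This is a model of
the idea only, not the Parrot implementation: the class InlineCache, its
move-to-front policy, and the miss counter are assumptions made for the
example, and find_method merely stands in for VTABLE_find_method.

```python
class InlineCache:
    """Per-call-site cache of (type, function) pairs, most recent first."""

    SLOTS = 3  # the PIC described above has three cache slots

    def __init__(self, method_name):
        self.method_name = method_name
        self.lru = []      # list of (type, function) tuples
        self.misses = 0    # counts slow-path lookups, for illustration

    def find_method(self, obj):
        # Slow path, standing in for VTABLE_find_method.
        self.misses += 1
        return getattr(type(obj), self.method_name)

    def call(self, obj, *args):
        obj_type = type(obj)
        for i, (cached_type, func) in enumerate(self.lru):
            if cached_type is obj_type:
                if i:
                    # Assumed policy: move a hit in a later slot to the front.
                    self.lru.insert(0, self.lru.pop(i))
                return func(obj, *args)
        # First call or type mismatch in every slot: look up and cache.
        func = self.find_method(obj)
        self.lru.insert(0, (obj_type, func))
        del self.lru[self.SLOTS:]   # keep at most SLOTS entries
        return func(obj, *args)


site = InlineCache("__getitem__")
data = [10, 20, 30]
results = [site.call(data, i) for i in range(3)]  # one miss, then two hits
results.append(site.call((7, 8, 9), 0))           # new type: second miss
results.append(site.call(data, 2))                # list is still cached
```

The first slot corresponds to the type check shown in the opcode above; the
other slots play the role of the secondary checks.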

When find_method returns a subroutine object, a different opcode is
executed that calls the overloaded method. Overloaded methods are
*not* executed in a secondary runloop anymore. This improves performance
by about 50%.

I've already shown that this scheme also fits nicely for function-like
opcodes (math.sin() and such). I think it's worth some more
consideration.

leo
