On Aug 24, 2005, at 19:45, Sam Ruby wrote:
Leopold Toetsch wrote:
Sam Ruby wrote:
The return value is a callable sub.
More precisely: a curried function call. This is an important
distinction; to see why, see below.
A callable sub may be of course a curried one - yes.
The interesting optimizations are:
- cache the find_method in the global method cache.
This happens already, if the method string is constant.
Note that you would then be caching the results of a curried function
call. This result depends not only on the method string, but also on
the particular object upon which it was invoked.
No, the "inner" Parrot_find_method_with_cache just caches the method for
a specific class (obeying the C3 method resolution order, as in Python).
There is no curried function at that stage.
- use a PIC [1] (which is tied to the opcode) and cache
the final result of find_method (or any similar lookup)
Again, the results will depend on the object.
Yes. The nice thing with a PIC is that it is per bytecode location. You
have a different PIC and a different cache for every call site. The
prologue of a PIC opcode is basically:
if cache.version == interpreter.version:
(cache.function)(args)
else:
# do dynamic lookup
# update cache then repeat
The 'version' compare depends on the cached thingy, and is made more
explicit in individual implementations. But the principle is always the
same: you create a unique id that depends on the variables of the
lookup and remember it. Before invoking the cached result you compare
the actual id with the cached one. On a cache miss, there are 2 or 3
more cache slots to consult before falling back to the original dynamic
scheme (and possibly rewriting the opcode back to the plain dynamic one
if there are too many cache misses).
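The prologue above can be sketched in Python. This is a minimal illustration of a per-call-site polymorphic inline cache with a few slots, not Parrot's actual code; the `dynamic_lookup` slow path and `CallSiteCache` name are assumptions for the example.

```python
# Hypothetical sketch of a per-call-site polymorphic inline cache (PIC).
# One instance corresponds to one bytecode location.

class CallSiteCache:
    SLOTS = 4  # a handful of slots per call site

    def __init__(self, dynamic_lookup):
        self.dynamic_lookup = dynamic_lookup   # the original dynamic scheme
        self.slots = []                        # [(cached_id, function), ...]

    def call(self, obj, *args):
        key = type(obj)                        # the id the lookup depends on
        for cached_id, func in self.slots:
            if cached_id is key:               # compare actual with cached id
                return func(obj, *args)        # hit: run the cached result
        func = self.dynamic_lookup(obj)        # miss: do the dynamic lookup
        self.slots.insert(0, (key, func))      # remember it, most recent first
        del self.slots[self.SLOTS:]            # keep at most SLOTS entries
        return func(obj, *args)
```

Because each call site gets its own `CallSiteCache`, a site that only ever sees one invocant type (the common monomorphic case) hits the first slot every time after the initial miss.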
src/pic.c and ops/pic.ops have an implementation for sub Px, Py - just
for fun and profit to show the performance in the MOPS benchmark ;-)
The important part in ops/pic.ops is:
lr_types = (left->vtable->base_type << 16) |
right->vtable->base_type;
if (lru->lr_type == lr_types) {
runit_v_pp:
((mmd_f_v_pp)lru->f.real_function)(interpreter, left, right);
(the lru is part of the cache structure; the above code is only run
when both types are <= 0xffff)
MMD depends on the two involved types. This is compared before calling
the cached function directly. As the cache is per bytecode location,
there is a fair chance (>95 %) that the involved types for this very
code location are matching.
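The type-pair key from the ops/pic.ops fragment can be sketched as follows. Both base types are packed into one word so the cache check is a single integer compare; the `mmd_call` helper and dict-based cache are illustrative assumptions, not Parrot's real structures.

```python
# Sketch of the MMD type-pair cache key: pack both 16-bit base types
# into one word, then compare it against the cached key in one step.

def pack_lr_types(left_type: int, right_type: int) -> int:
    # only valid when both type ids fit in 16 bits (<= 0xffff)
    assert left_type <= 0xFFFF and right_type <= 0xFFFF
    return (left_type << 16) | right_type

def mmd_call(cache, left_type, right_type, left, right, dynamic_lookup):
    key = pack_lr_types(left_type, right_type)
    if cache.get("lr_type") == key:
        return cache["function"](left, right)      # hit: call directly
    func = dynamic_lookup(left_type, right_type)   # miss: full MMD lookup
    cache["lr_type"], cache["function"] = key, func
    return func(left, right)
```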
The same is true for plain method calls. The callee depends on the
invocant and the method name, which is usually a constant. Therefore
you can compare the cached version with the actual invocant type and
normally, with a match, just run the cached function immediately.
Currying is only important for placing the 'self' into the arguments -
the actual lookup was already done earlier and doesn't influence the
called function. Actually, a curried subroutine is a 'Sub' object
already and is invoked directly without any further method lookup.
There is no caching involved in the call of a curried sub.
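Python's own bound methods illustrate the point (an illustration, not Parrot code): the lookup happens once, and the resulting curried callable just remembers the invocant and inserts it as 'self' on each call.

```python
class Greeter:
    def greet(self, name):
        return "hello, " + name

o = Greeter()
bound = o.greet   # the method lookup happens here, exactly once

# 'bound' is a directly callable object: invoking it performs no further
# method lookup, it only inserts the remembered invocant as 'self'.
result = bound("world")                  # "hello, world"
assert bound.__func__ is Greeter.greet   # the underlying function
assert bound.__self__ is o               # the curried-in invocant
```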
- run the invoked Sub inside the same run loop - specifically
not within Parrot_run_meth_fromc_args
I don't understand this. (Note: it may not be important to this
discussion that I do understand this - all that is important to me is
that it works, and somehow I doubt that Parrot_run_meth_fromc_args
cares whether a given function is curried or not).
There are two points to be considered:
- currying: the effect is that some call arguments (in this special
case the object) are already fixed. The argument passing code has
therefore the duty to insert these known (and remembered) arguments
into the params for the callee. For the BoundMethod this of course
means: shift all arguments up by one and make the object 'self' the
first param of the sub.
- the second point is only related to call speed, aka optimization.
It's just faster to run a PIR sub in the same run loop than to create a
new run loop.
The above works because PyInt is a constant. It probably can be
extended to handle things that seem unlikely to change very rapidly.
Yes. That's the 'trick' behind PIC. It works best the more constant the
items are. But as said above, method names and invocants usually don't
vary *per bytecode location*. The literature states that ~95% of method
calls are monomorphic (one type of invocant), and 99.5% are cached
within 4 cache slots. Look at some typical code:
a.'foo'(x)
...
b.'bar'(y)
Both method calls have a distinct cache. The method names are constant.
Therefore the callee depends only on the invocant. The types of 'a' or
'b' are typically always the same (except maybe inside a compiler's AST
visit methods or some such). The same scheme applies to plain method or
attribute lookups.
But the combination of decisions on how to handle the passing of the
"self" parameter to a method, keeping find_method and invoke separated
at the VTABLE level, and the semantics of Python make the notion of
caching the results of find_method problematic.
Hmm. Ad 'self': this exactly matches Python's or Perl's calling
conventions. As you have shown in your example, you can call a Python
"method" (which actually is just a function) with either method or
function call syntax.
o.foo(x)
foo(o, x)
are basically the same when it comes to argument passing (I'm not
talking about locating 'foo' here).
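In Python itself the equivalence is directly observable. A small illustration (not Parrot code), using a hypothetical class `C`:

```python
class C:
    def foo(self, x):
        return ("foo", self, x)

o = C()
# Method-call syntax and function-call syntax pass identical arguments;
# only the way 'foo' is located differs between the two forms.
assert o.foo(42) == C.foo(o, 42)
```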
This is exactly what is implemented now in Parrot. I think this is an
improvement over the old state.
A distinct find_method_then_invoke vtable would be a replica of the
callmethodcc opcode and is more a matter of optimization than anything
else, because you can already change the behavior of both parts now.
- Sam Ruby
leo