On Aug 24, 2005, at 19:45, Sam Ruby wrote:


> Leopold Toetsch wrote:
>> Sam Ruby wrote:


>>> The return value is a callable sub.

> More precisely: a curried function call.  This is an important
> distinction; to see why, see below.

A callable sub may of course be a curried one - yes.

>> The interesting optimizations are:
>> - cache the find_method in the global method cache.
>>   This happens already, if the method string is constant.

> Note that you would then be caching the results of a curried function
> call.  This result depends not only on the method string, but also on
> the particular object upon which it was invoked.

No the "inner" Parrot_find_method_with_cache just caches the method for a specific class (obeying C3 method resolution order as in Python). There is no curried function at that stage.


>> - use a PIC [1] (which is tied to the opcode) and cache
>>   the final result of find_method (or any similar lookup)

> Again, the results will depend on the object.

Yes. The nice thing with a PIC is that it is per bytecode location. You have a different PIC and a different cache for every call site. The prologue of a PIC opcode is basically:

  if cache.version == interpreter.version:
      (cache.function)(args)    # hit: invoke the cached result directly
  else:
      # miss: do the dynamic lookup,
      # then update the cache and repeat

The 'version' comparison depends on the cached thingy and is more explicit in individual implementations. But the principle is always the same: you create a unique id that depends on the variables of the lookup and remember it. Before invoking the cached result you compare the actual id with the cached one. If there is a cache miss, there are 2 or 3 more cache slots to consult before falling back to the original dynamic scheme (and maybe the opcode is rewritten back to the plain dynamic one in case of too many cache misses).
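
A sketch of that slot handling, again in illustrative Python (the class
and its fields are made up; a real PIC lives in the rewritten opcode,
not in a heap object):

    class CallSitePIC:
        SLOTS = 4                   # ~99.5 % of sites fit in 4 slots

        def __init__(self):
            self.entries = []       # (unique id, cached function) pairs

        def call(self, unique_id, args, dynamic_lookup):
            for cached_id, fn in self.entries:  # actual id vs. cached ids
                if cached_id == unique_id:
                    return fn(*args)            # hit: run the cached result
            fn = dynamic_lookup()               # miss: original dynamic scheme
            if len(self.entries) < self.SLOTS:
                self.entries.append((unique_id, fn))
            # (too many misses would rewrite the opcode back to the plain
            # dynamic one)
            return fn(*args)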

src/pic.c and ops/pic.ops have an implementation for sub Px, Py - just for fun and profit to show the performance in the MOPS benchmark ;-)

The important part in ops/pic.ops is:

    /* pack both operand types into one comparable key */
    lr_types = (left->vtable->base_type << 16) | right->vtable->base_type;
    if (lru->lr_type == lr_types) {
runit_v_pp:
        /* hit: call the cached MMD function directly */
        ((mmd_f_v_pp)lru->f.real_function)(interpreter, left, right);

(the lru is part of the cache structure; the above code is only run when both types are <= 0xffff)

MMD depends on the two involved types. This type pair is compared before calling the cached function directly. As the cache is per bytecode location, there is a fair chance (>95 %) that the involved types for this very code location match.

The same is true for plain method calls. The callee depends on the invocant and the method name, which is usually a constant. Therefore you can compare the cached entry with the actual invocant type and normally, on a match, just run the cached function immediately.
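
As a sketch (hypothetical cache fields again), the fast path of one such
call site boils down to:

    def call_method(cache, invocant, args):
        # guard: the name is constant, only the invocant type can vary
        if cache.type is type(invocant):
            return cache.function(invocant, *args)  # hit: run immediately
        # miss: dynamic lookup (e.g. find_method_with_cache above), refill
        cache.type = type(invocant)
        cache.function = find_method_with_cache(type(invocant), cache.name)
        return cache.function(invocant, *args)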

Currying is only important for placing 'self' into the arguments - the actual lookup was already done earlier and doesn't influence the called function. Actually a curried subroutine is a 'Sub' object already and is invoked directly without any further method lookup. There is no caching involved in the call of a curried sub.

>> - run the invoked Sub inside the same run loop - specifically
>>   not within Parrot_run_meth_fromc_args

> I don't understand this.  (Note: it may not be important to this
> discussion that I do understand this - all that is important to me is
> that it works, and somehow I doubt that Parrot_run_meth_fromc_args cares
> whether a given function is curried or not).

There are two points to be considered:
- currying: the effect is that some call arguments (in this special case
  the object) are already fixed. The argument-passing code therefore has
  the duty to insert these known (and remembered) arguments into the
  params for the callee. For the BoundMethod this means, of course: shift
  all arguments up by one and make the object 'self' the first param of
  the sub (see the sketch below).
- the second point is only related to call speed, i.e. optimization: it's
  just faster to run a PIR sub in the same run loop than to create a new
  run loop.
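
A minimal sketch of such a BoundMethod in Python terms (the class here is
made up for illustration; Parrot's actual BoundMethod is a PMC):

    class BoundMethod:
        def __init__(self, obj, sub):
            self.obj = obj        # the remembered, curried argument
            self.sub = sub        # the already-looked-up Sub

        def __call__(self, *args):
            # no further lookup: shift args up by one, 'self' goes first
            return self.sub(self.obj, *args)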

> The above works because PyInt is a constant.  It probably can be
> extended to handle things that seem unlikely to change very rapidly.

Yes. That's the 'trick' behind PIC. It works best the more constant the items are. But as said above, method names and invocants usually don't vary *per bytecode location*. The literature states that ~95 % of method calls are monomorphic (one type of invocant) and that 99.5 % are cached within 4 cache slots. Look at some typical code:

   a.'foo'(x)
   ...
   b.'bar'(y)

Both method calls have a distinct cache. The method names are constant. Therefore the callee depends only on the invocant. The types of 'a' and 'b' are typically always the same (except maybe inside a compiler's AST visit methods or some such). The same scheme applies to plain method or attribute lookups.

> But the combination of decisions on how to handle the passing of the
> "self" parameter to a method, keeping find_method and invoke separated
> at the VTABLE level, and the semantics of Python make the notion of
> caching the results of find_method problematic.

Hmm. Regarding 'self': this exactly matches the Python or Perl calling conventions. As you have shown in your example, you can call a Python "method" (which actually is just a function) with either method or function call syntax:

  o.foo(x)
  foo(o, x)

are basically the same when it comes to argument passing (I'm not talking about locating 'foo' here).
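
In plain Python the equivalence is directly observable (a toy example):

    class O(object):
        def foo(self, x):
            return x + 1

    o = O()
    assert o.foo(41) == O.foo(o, 41)   # method syntax == function syntax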

This is exactly what is implemented now in Parrot. I think this is an improvement over the old state.

A distinct find_method_then_invoke vtable would be a replica of the callmethodcc opcode and is more a matter of optimization than anything else, because you can already change the behavior of both parts now.

> - Sam Ruby

leo
