A cache, well, caches things and speeds something up, so it can be considered an optimization. OTOH, knowing that things can be reasonably fast can open the mind to possible solutions that would otherwise be silently discarded as known to be slow.

I'm not proposing to use inline caching for all possible things right now; I'd just like to present the idea.

1) What is an inline cache

"Inline" means that the cache becomes part of the code. This scheme is usable in Parrot for prederefed runcores (Switch, CGP) and the JIT. The possibly costy lookup of some resource is done at runtime once and then the result is kept in the cache. Subsequent execution of that opcode just returns the result. Sometimes the opcode is replaced by a different one e.g. one for calling a C (NCI Sub) method or a PASM method.

All uses of inline caching involve some constant key.

2) Monomorphic inline cache (MIC)

a) Spilling

we have now, e.g.:

    set Px, P31[2]    # fetch spilled var #2
    set P31[0], Iy    # store spilled var #0

The constant key here is the index into the spill array. The bytecode of the above is:

   set_p_p_ki x, 31, 2

During predereferencing this can be replaced by

   spill_fetch_p  x, <addr_of_elem_2_in_array_at_P31>

This case doesn't even need an additional cache structure[¹]. The address of the location can just be inlined in the opcode stream (as long as the spill array used isn't one that moves during GC). The opcode can already be in the bytecode, but it doesn't have to be, as long as it's known that the access to P31 is a spill.

As with all caching, there are operations that can affect the original data: a GC, if a movable object is cached, or running "eval"ed code. In these cases the cache is invalidated (possibly partially) by restoring the original opcode, which again does the lookup and caches the result.

b) Lexical fetch and store

That's probably just the same as above.

3) Polymorphic inline cache (PIC)

I didn't really explain "monomorphic" above, so first: what is polymorphic? We have e.g. a method call:

  obj."foo"()

which translates to bytecode

  callmethodcc "foo"

with the object in P2. For any bytecode location in the source this translates to the inline cache version

  callmethodcc <cache>

and <cache> is a distinct PIC structure. The opcode actually does something like:

  obj = P2
  cache = $1                  # the PIC structure
  S0 = cache->method          # "foo"
  if (obj->type == cache->type)
    (cache->func)()

The object at that bytecode location can change (e.g. if an array of objects is processed). So the first operation is to compare the object's type with the cached type. If the types match, the cached function pointer is called. If not, a lookup is done:

  else {
    cache->func = find_method(obj, meth)
    cache->type = obj->type
    ...

The fast path should be taken in ~95% of cases, states the literature. A second "if" with a second (type, func) pair accounts for another ~4%.

For the monomorphic case, the underlying object doesn't change (or there is none), so the type comparison isn't needed.

More cachable opcodes:
- get- and setattribute
- isa, does, can
- global lookups
- dynamic subroutines, especially multi subs
- array and hash lookups with constants

leo

[¹] modulo multi-threading issues
