As part of working on making the call protocol a little more explicit, I
started making some notes about the call paths and figuring out what can
be optimized, what cannot be, which component is responsible, how much
the interpreter can benefit, how much the compiler can benefit, etc.
Based on this, I'll work on additional call protocol instructions which
could be analyzed and eliminated where they are unnecessary -- which
could benefit the interpreter as much as the compiler.

-------------------------------------------------
Ruby: o.foo(args) { ... }

CALL(M, o, ..) --> callsite.call(o, ..)
               --> InterpretedIRMethod.call(o, ..)
               --> Interpreter.INTERPRET_METHOD(o, ..)
               --> Interpreter.interpret(o, ..)
               --> ... interpret instrs of M ...
-------------------------------------------------
0. Interpreter encounters "CallInstr(receiver, meth-addr, args, closure)" [
o.call("foo", args, &blk) ]

The call-sequence starts with:
-> prepare-args
-> prepare-block
-> callsite.call(o, ...)
-------------------------------------------------
1. callsite.call

t  = o.metaclass.token (2 loads + thread-poll currently)
c  = cs.token (1 load)
beq c,t,L
cs = search(m)
verify/method-missing/update-cache
L: cs.method.call(args)  <--- InterpretedIRMethod/CompiledIRMethod
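
To make that cache check concrete, here is a minimal, self-contained Java
sketch of a monomorphic call-site cache keyed on a class token. The names
(MetaClass, CachingCallSite, DynamicMethod) are invented for illustration
and are not the actual JRuby classes; method_missing handling is elided.

import java.util.HashMap;
import java.util.Map;

class InlineCacheSketch {
    interface DynamicMethod {
        Object call(Object self, Object[] args);
    }

    // Stand-in for o.metaclass: a method table plus a generation token that
    // is bumped whenever a method is (re)defined.
    static class MetaClass {
        int token;
        final Map<String, DynamicMethod> methods = new HashMap<>();
    }

    static class CachingCallSite {
        final String methodName;
        int cachedToken = -1;            // cs.token
        DynamicMethod cachedMethod;      // cs.method

        CachingCallSite(String methodName) { this.methodName = methodName; }

        Object call(MetaClass metaClass, Object self, Object... args) {
            int t = metaClass.token;                 // the token loads
            if (t != cachedToken) {                  // "beq c,t,L" -- miss path
                DynamicMethod m = metaClass.methods.get(methodName);  // search(m)
                if (m == null) {
                    throw new UnsupportedOperationException(
                            "method_missing not modeled: " + methodName);
                }
                cachedMethod = m;                    // verify/update-cache
                cachedToken = t;
            }
            return cachedMethod.call(self, args);    // L: cs.method.call(args)
        }
    }
}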
-------------------------------------------------
2. InterpretedIRMethod.call(...)

--> has a try-finally wrapper (can be removed if the pushes/pops can be
removed)

push-scope      --> mem-alloc + several instrs
push-frame      --> mem-alloc + several instrs
push-impl-class --> mem-alloc + several instrs

...

pop
pop
pop
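
For reference, a rough, self-contained Java model of that wrapper shape.
The real InterpretedIRMethod and ThreadContext are much more involved;
this only shows the push / try / finally / pop pattern that we'd like to
elide (step 3 below has the same shape with backtrace/trace info).

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

class CallWrapperSketch {
    // Toy stand-in for the per-thread runtime state that gets pushed/popped.
    static class ThreadState {
        final Deque<Object> scopeStack = new ArrayDeque<>();
        final Deque<Object> frameStack = new ArrayDeque<>();
        final Deque<Object> implClassStack = new ArrayDeque<>();
    }

    static Object call(ThreadState state, Object scope, Object frame,
                       Object implClass, Supplier<Object> body) {
        state.scopeStack.push(scope);          // push-scope      (alloc + bookkeeping)
        state.frameStack.push(frame);          // push-frame
        state.implClassStack.push(implClass);  // push-impl-class
        try {
            return body.get();                 // ... interpret the method body ...
        } finally {                            // the try-finally only goes away
            state.implClassStack.pop();        // if the pushes/pops do
            state.frameStack.pop();
            state.scopeStack.pop();
        }
    }
}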
-------------------------------------------------
3. Interpreter.INTERPRET_METHOD(...)

--> has a try-finally wrapper (can be removed if the pushes/pops can be
removed)

push-backtrace_info if !synthetic
push-traceable_info if traceable

...

pop
pop
-------------------------------------------------
4. Interpreter.interpret(...)

- get_instrs
- alloc_tmp_vars
- other-init
- profile-if-necessary

... now start interpreting instrs ...

* receive_args
* receive_closure
* check_arity
* thread_poll

... now begin core of the called method ...
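
Roughly, and with invented names (this is a caricature, not the actual
Interpreter code), step 4 boils down to:

class InterpretSketch {
    // Invented instruction interface for illustration.
    interface Instr {
        Object run(Object self, Object[] args, Object[] temps);
    }

    static Object interpret(Instr[] instrs, int tempVarCount,
                            Object self, Object[] args) {
        Object[] temps = new Object[tempVarCount];   // alloc_tmp_vars
        Object result = null;
        // receive_args, receive_closure, check_arity and thread_poll are just
        // the first few entries of instrs; the core of the method follows them.
        for (Instr instr : instrs) {                 // ... interpret instrs ...
            result = instr.run(self, args, temps);
        }
        return result;
    }
}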

So, there is a lot of song and dance that goes on before we get to
executing the core of the called method.
----------------------------------------------------------------
* The compiler does a whole set of similar things as well.
* The question is how much of this work we can eliminate.

-> JVM: use indy to prepare the call site and link through to the target
   (sketched below).
   [[ Without indy, we'll have to manage call-site caches in IR.
      When there are multiple calls on the same receiver, there
      is benefit in doing it in IR because o.metaclass.token will
      be identical across call sites. ]]
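
As a sketch of the indy idea -- this is the generic java.lang.invoke
linking pattern, not JRuby's actual bootstrap code, and it assumes a call
descriptor of (Object receiver, Object[] args) -> Object with a plain
receiver-class guard standing in for the metaclass-token check:

import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class IndyLinkSketch {
    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();

    // Bootstrap: the site starts out unlinked; the first call hits slowPath.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String methodName,
                              MethodType type) throws Exception {
        MutableCallSite site = new MutableCallSite(type);
        MethodHandle slow = LOOKUP.findStatic(IndyLinkSketch.class, "slowPath",
                MethodType.methodType(Object.class, MutableCallSite.class,
                        String.class, Object.class, Object[].class));
        site.setTarget(MethodHandles.insertArguments(slow, 0, site, methodName));
        return site;
    }

    // Slow path: look the target up, install a guarded fast path, call it once.
    static Object slowPath(MutableCallSite site, String methodName,
                           Object receiver, Object[] args) throws Throwable {
        // Stand-in for the real lookup through the receiver's metaclass.
        MethodHandle target = LOOKUP.findVirtual(receiver.getClass(), methodName,
                MethodType.methodType(Object.class, Object[].class))
                .asType(site.type());

        // Stand-in guard: receiver class identity instead of a metaclass token.
        MethodHandle test = MethodHandles.insertArguments(
                LOOKUP.findStatic(IndyLinkSketch.class, "sameClass",
                        MethodType.methodType(boolean.class, Class.class, Object.class)),
                0, receiver.getClass());

        site.setTarget(MethodHandles.guardWithTest(test, target, site.getTarget()));
        return target.invoke(receiver, args);
    }

    static boolean sameClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }
}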

The rest of the work has to be done as part of AST inspection / IR
analysis (a toy sketch of such an analysis follows the list below).

-> AST/IR: eliminate scope-push
-> AST/IR: eliminate frame-push
-> AST/IR: eliminate impl-class-push
-> AST/IR: eliminate backtrace-push
-> AST/IR: eliminate traceable-push
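
Purely as an illustration of the kind of analysis involved -- the trigger
set below is invented, and the real analysis runs over JRuby's AST/IR,
not strings -- the decision boils down to computing which pieces of
runtime state a scope can actually observe, and pushing only those:

import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class PushEliminationSketch {
    enum Needs { SCOPE, FRAME, BACKTRACE }

    // Toy analysis: scan the scope's "instructions" (modeled as strings here)
    // and record which pushes the scope really requires.
    static Set<Needs> analyze(List<String> instrs) {
        Set<Needs> needs = EnumSet.noneOf(Needs.class);
        for (String instr : instrs) {
            switch (instr) {
                case "eval":
                case "binding":            // can capture caller-visible state
                    needs.add(Needs.SCOPE);
                    needs.add(Needs.FRAME);
                    break;
                case "block_given?":       // reads block info off the frame
                    needs.add(Needs.FRAME);
                    break;
                case "raise":              // wants backtrace info
                    needs.add(Needs.BACKTRACE);
                    break;
                default:
                    break;                 // plain instrs need nothing extra
            }
        }
        // Callers push only what is in this set; if it is empty, the whole
        // try-finally wrapper in steps 2/3 can be skipped as well.
        return needs;
    }
}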

For the compiler (a sketch of the resulting shape follows this list):
-> get_instrs is a NOP (because instrs = emitted bytecode)
-> alloc_tmp_vars is a NOP (because they are mapped to Java locals)
-> other-init might also go away if it is part of the emitted bytecode
-> profile-if-necessary will be a NOP if we are not profiling
-> most of receive_args can be NOPs if the args line up correctly in
   local vars
-> receive_closure will be a NOP for block-less calls
-> check_arity is not a NOP
-> thread_poll is not a NOP
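
To show where that ends up, here is a hypothetical shape for what the JIT
could emit for a trivial method like "def add(a, b); a + b; end" once the
handshake is gone. This is not actual JRuby-generated code; the point is
just that only check_arity and thread_poll survive.

class CompiledFormSketch {
    // Hypothetical JIT output for "def add(a, b); a + b; end".
    static Object add(Object self, Object[] args) {
        if (args.length != 2) {                    // check_arity survives
            throw new IllegalArgumentException(
                    "wrong number of arguments (" + args.length + " for 2)");
        }
        pollEvents();                              // thread_poll survives
        // receive_args: reduced to reading args into locals, no handshake left;
        // no push-scope / push-frame / push-backtrace, and no try-finally.
        Object a = args[0];
        Object b = args[1];
        return (Integer) a + (Integer) b;          // core of the method (toy arithmetic)
    }

    static void pollEvents() {
        // placeholder for the runtime's cross-thread event check
    }
}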

So, with indy, an explicit call protocol, and optimizations to get rid of
some of the call handshake, when the compiler JITs a method, a lot of
this intermediate work vanishes, and Ruby call overhead can approach that
of a Java call.
----------------------------------------------------------------
