I'm getting deeper into JRuby optimization lately, and starting to
ponder strategies for optimizing closures (or, in the case of Clojure
and Scala, optimizing pass-through methods that make megamorphic
callbacks to functions).
First, some explanation...

In Ruby, the "each" method on Array receives a block of code:

    [1,2,3].each {|i| do_something_with i}

"each" is implemented as a simple loop over Array elements, each
iteration calling Block.yield on a Block object passed in. Block
aggregates the code body above with a Binding object holding the
surrounding closed-over state.

    public IRubyObject eachCommon(ThreadContext context, Block block) {
        if (!block.isGiven()) {
            throw context.getRuntime().newLocalJumpErrorNoBlock();
        }
        for (int i = 0; i < realLength; i++) {
            // do not coarsen the "safe" catch, since it will
            // misinterpret AIOOBE from the yielded code.
            // See JRUBY-5434
            block.yield(context, safeArrayRef(values, begin + i));
        }
        return this;
    }

Similar idioms certainly exist in any other JVM language that allows
passing functions or closures around.
The problem here is that the "yield" call is always going to go
megamorphic very quickly. There will be dozens of places in a typical
Ruby (or Clojure, or Scala, or Groovy) application that use the same
list-iteration logic, and they all pass through the same generic code.
That means the best the JIT can do is inline the loop logic into
the caller; the closure body won't inline and will have to optimize
on its own.
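
To make the shared call site concrete, here is a minimal Java sketch
(not JRuby's actual code). SharedEach, sum, max, and countEven are
hypothetical names invented for illustration, and
java.util.function.IntConsumer stands in for Block. All three callers
funnel through the single block.accept call inside each, so that one
call site ends up seeing three different receiver classes.

```java
import java.util.function.IntConsumer;

// Hypothetical stand-in for Array#each: one shared loop, one shared call site.
final class SharedEach {
    static void each(int[] values, IntConsumer block) {
        for (int i = 0; i < values.length; i++) {
            // Every caller's closure is invoked from this same call site.
            block.accept(values[i]);
        }
    }

    // Caller 1: passes a summing closure.
    static int sum(int[] values) {
        int[] total = {0};
        each(values, v -> total[0] += v);
        return total[0];
    }

    // Caller 2: passes a max-tracking closure.
    static int max(int[] values) {
        int[] best = {Integer.MIN_VALUE};
        each(values, v -> best[0] = Math.max(best[0], v));
        return best[0];
    }

    // Caller 3: passes a counting closure.
    static int countEven(int[] values) {
        int[] count = {0};
        each(values, v -> { if (v % 2 == 0) count[0]++; });
        return count[0];
    }
}
```

Note that each closure here also captures a small heap-allocated
holder array, which is the kind of state that stays on the heap when
the closure body cannot be inlined back into its caller.
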
This also has implications for escape analysis. If there's any state
(including the closure object itself) involved in doing that
iteration, escape analysis can no longer eliminate it, since the
closure can't inline all the way back into the caller. Any heap-based
structures are now firmly on the heap, and add to our allocation
overhead.
Now, strategies...
In general, what's needed is a way to specialize "each" for many
different call sites and closures.
The easiest for us implementers would be if the JVMs simply started to
see through these calls. Closure/function-receiving code is going to
become more and more common, and indeed was already rather common for
event-handling systems and the like. Rémi and I talked with Fredrik
(JRockit) two JVMLSs ago about how JRockit might be able to optimize
these cases. Fredrik believed it would be possible, but that some sort
of marker was needed on the "each" method to show JRockit that it's
code that calls code. I forget who suggested it, but we decided an
easy marker would be to have the signature receive a MethodHandle or
subtype. JRockit (and perhaps other JVMs) could use that as an
indication that this method should be specialized to the caller and
the provided closure.
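
As a rough illustration of what such a signature could look like, here
is a sketch using java.lang.invoke. HandleEach and Accumulator are
hypothetical names, and nothing here claims that any JVM actually
treats the MethodHandle parameter as a specialization hint; it only
shows the shape of an "each" whose signature advertises that it calls
code it was handed.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class HandleEach {
    // Hypothetical closed-over state for the closure body.
    static final class Accumulator {
        int total;
        void add(int v) { total += v; }
    }

    // The MethodHandle parameter is the proposed "marker":
    // this method is code that calls code.
    static void each(int[] values, MethodHandle block) throws Throwable {
        for (int i = 0; i < values.length; i++) {
            // invokeExact requires the handle's type to be exactly (int)void.
            block.invokeExact(values[i]);
        }
    }

    static int sum(int[] values) {
        try {
            Accumulator acc = new Accumulator();
            // Look up Accumulator.add and bind the receiver, giving an (int)void handle.
            MethodHandle add = MethodHandles.lookup()
                .findVirtual(Accumulator.class, "add",
                             MethodType.methodType(void.class, int.class))
                .bindTo(acc);
            each(values, add);
            return acc.total;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```
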
Barring JVM help, we are likely to attempt to optimize this case in
JRuby directly. My strategy will be to see (via runtime profiling)
that particular closure-receiving methods are hot, and do the
specialization myself. This would boil down to having JRuby's JIT emit
both the caller's class body *and* a copy of the "each" body into the
compiled result. The caller, each, and the closure (also in the same
result) would lie along a monomorphic path, allowing all three to
inline together. This will be easy with Ruby code; if I see a
closure-passing call to another Ruby method, I emit a body for that
method too. For core JRuby methods, which are implemented in Java, it
will be trickier; I'll need to either have simple ways to duplicate
those methods in-place or I'll need to move them into Ruby.
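
Written out by hand in Java, the specialized result might look roughly
like the sketch below. This is an assumption about the shape of the
output, not what JRuby's JIT emits; SpecializedEach and all of its
method names are hypothetical. The generic path goes through the
shared each, while the specialized path carries its own copy of the
loop and the closure body, so every call along it has exactly one
possible target.

```java
import java.util.function.IntConsumer;

final class SpecializedEach {
    // Generic path: shared loop, call site shared by every caller.
    static void each(int[] values, IntConsumer block) {
        for (int i = 0; i < values.length; i++) {
            block.accept(values[i]);
        }
    }

    static int sumGeneric(int[] values) {
        int[] total = {0};
        each(values, v -> total[0] += v);
        return total[0];
    }

    // Specialized path: caller, a private copy of the loop, and the
    // closure body all live in the same compiled unit.
    static int sumSpecialized(int[] values) {
        int[] total = {0};
        eachForSum(values, total);
        return total[0];
    }

    // Copy of the "each" body, used only by sumSpecialized.
    private static void eachForSum(int[] values, int[] total) {
        for (int i = 0; i < values.length; i++) {
            blockForSum(values[i], total); // static call, single target
        }
    }

    // The closure body, with its captured state passed explicitly.
    private static void blockForSum(int v, int[] total) {
        total[0] += v;
    }
}
```

Both paths compute the same result; the difference is only that the
specialized one gives the JVM a monomorphic chain it can inline end to
end, at the cost of one loop copy per hot call site.
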
Ironically, we may be seeing the beginning of an age where it's faster
to implement JRuby core classes in Ruby.
Have any of you other implementers thought about this? What are you
considering as strategies for optimizing closure invocations?
- Charlie