Blogged at:
http://headius.blogspot.com/2007/07/understanding-jvm-jit-and-helping-it.html
Comments here or there are most welcome.
--
I finally understand what we should be going for in our compiled code,
and how we can really kick JRuby into the next level of performance.
The JVM, at least in HotSpot, gets a lot of its performance from its
ability to inline code at runtime, and ultimately compile a method plus
its inlined calls as a whole down to machine code. The benefit in doing
this is the ability to do compiler optimizations across a much larger
call path, essentially compiling all the logic for a method and its
calls (and possibly their calls, ad infinitum) into a single optimized
segment of machine code.
HotSpot is able to do this in two main ways:
1. If it's obvious there's only ever one implementation of a given
signature on a given type hierarchy
2. If it can determine at runtime that one (or a few) implementations
are the only ones ever being called
The first one allows code to be optimized fairly quickly, because
HotSpot can discover early on that there's only one implementation. In
general, if there's a single implementation of a given signature, it
will get inlined pretty quickly.
The second one is trickier. HotSpot records the actual receiver types
seen at each call site, and eventually can come up with a best
guess at the method or methods to inline. It can also include a slow
path for the rare future cases where the receiver does not match the
target types, and it can deoptimize later to back out optimizations
when situations change, such as when a new class is loaded into the system.
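As a rough mental model (not actual HotSpot output), a call site whose profile has only ever seen one receiver class gets compiled as if the source contained an explicit type guard. The common case runs the inlined body, and anything else takes a slow path (HotSpot may instead deoptimize). The `Shape`, `Square`, `Circle`, and `GuardedInlining` classes are hypothetical and exist only for illustration.

```java
// Hypothetical types used only to illustrate guarded inlining.
interface Shape {
    double area();
}

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class GuardedInlining {
    // What the source says: a plain interface call at one call site.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    // Roughly what a JIT can produce once profiling shows only Squares:
    // a cheap class check guarding the inlined body, plus a slow path.
    static double totalAreaGuarded(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Square.class) {
                Square sq = (Square) s;
                sum += sq.side * sq.side;   // Square.area() inlined
            } else {
                sum += s.area();            // slow path: ordinary virtual call
            }
        }
        return sum;
    }
}
```

When class hierarchy analysis shows that only one implementation exists (case 1 above), even the guard can be dropped. HotSpot then records a dependency so that loading a second implementation triggers deoptimization.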
So in the end, inlining is one of the most powerful optimizations.
Unfortunately in JRuby (and most other dynamic language implementations
on the JVM), we're making inlining difficult or impossible in the most
performance-sensitive areas. I believe this is a large part of our
performance woes.
Consider that all method calls against any object must pass through an
implementation of IRubyObject.callMethod. There aren't many
callMethod implementations, and in fact there's now only one
implementation of each specific signature. So callMethod gets inlined
pretty fast.
Consider also that almost all method calls within callMethod are to very
specific methods and will also be inlined quickly. So callMethod is
looking pretty good so far.
Now we look at the last step in callMethod...DynamicMethod.call.
DynamicMethod is the top-level type for all our method objects in the
system. The call method has numerous implementations, all of them
different. And no one implementation stands out as the most frequently
called. So we're already complicating matters for HotSpot, even though
we know (based on the incoming method name) exactly the piece of code we
*want* to call.
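A stripped-down model of this dispatch path may help. The classes below (`ToyMethod`, `ToyPlus`, `ToyLt`, `ToyRuntime`) are hypothetical simplifications, not JRuby's actual types or signatures. `callMethod` itself is small and easy to inline. The trouble is the single `method.call(...)` line inside it: every dynamic call in the program passes through that line, carrying whichever method object the lookup returned.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the DynamicMethod hierarchy.
abstract class ToyMethod {
    abstract Object call(Object self, Object[] args);
}

final class ToyPlus extends ToyMethod {
    Object call(Object self, Object[] args) { return (Long) self + (Long) args[0]; }
}

final class ToyLt extends ToyMethod {
    Object call(Object self, Object[] args) { return (Long) self < (Long) args[0]; }
}

class ToyRuntime {
    final Map<String, ToyMethod> methods = new HashMap<>();

    ToyRuntime() {
        methods.put("+", new ToyPlus());
        methods.put("<", new ToyLt());
    }

    // Every dynamic call funnels through here. The method.call(...) site
    // below sees every ToyMethod subclass, so its type profile is
    // polymorphic and HotSpot has no single target to inline.
    Object callMethod(Object self, String name, Object... args) {
        ToyMethod method = methods.get(name);
        return method.call(self, args);
    }
}
```

With only two method classes, as in this toy, HotSpot's bimorphic inlining could still cope. With the half-dozen or more real DynamicMethod.call implementations, the site goes megamorphic and falls back to a plain virtual call.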
Let's continue on, assuming HotSpot is smart enough to work around our
half-dozen or so DynamicMethod.call implementations.
DefaultMethod is the DynamicMethod implementation for interpreted Ruby
code, so it calls directly into the evaluator. So at that point,
DefaultMethod.call will inline the evaluator code and that looks pretty
good. But there's also the JIT located in DefaultMethod. It generates a
JVM bytecode version of the Ruby code and from then on DefaultMethod
calls that. Now that's certainly a good thing on one hand, since we've
eliminated the interpreter, but on the other hand we've essentially made
it impossible for HotSpot to inline that generated code. Why? Because we
generate a Java method for every JITable Ruby method. Hundreds, and
eventually thousands of possible implementations. Making a decision to
inline any of them into DefaultMethod.call is basically impossible.
We've broken the chain.
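Hypothetical classes can sketch the shape of the DefaultMethod problem. `JittedBody`, `MethodHolder`, and the threshold of 50 are illustrative assumptions, not JRuby's code. The holder interprets until a call count is reached, then delegates to generated code. Each JIT-compiled Ruby method would get its own class implementing `JittedBody`, so the one `compiled.call(...)` site inside the holder is shared by all of them.

```java
// Hypothetical interface that each JIT-generated class would implement.
interface JittedBody {
    Object call(Object self, Object[] args);
}

// Hypothetical stand-in for DefaultMethod: interpret until hot, then
// switch permanently to generated code.
final class MethodHolder {
    static final int JIT_THRESHOLD = 50;   // assumed value, for illustration

    private final JittedBody generatedCode; // stands in for freshly generated bytecode
    private JittedBody compiled;            // null until the method is "JITted"
    private int calls;
    int interpretedCalls;                   // makes the switch observable

    MethodHolder(JittedBody generatedCode) {
        this.generatedCode = generatedCode;
    }

    Object call(Object self, Object[] args) {
        if (compiled != null) {
            // One call site shared by every JITted Ruby method: each has its
            // own JittedBody class, so HotSpot sees hundreds or thousands of
            // receiver types here and cannot choose one to inline.
            return compiled.call(self, args);
        }
        if (++calls >= JIT_THRESHOLD) {
            compiled = generatedCode;
        }
        interpretedCalls++;
        return generatedCode.call(self, args); // stand-in for walking the AST
    }
}
```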
To make matters worse, we also have the Java-wrapping DynamicMethod
implementations: the *CallbackMethod classes (used for binding Java
code to Ruby method names) and CompiledMethod (used in AOT-compiled code).
The CallbackMethods all wrap another piece of generated code that
implements Callback and calls the Java method in question. So we
generate nice little wrappers for all the pre-existing methods we want
to call, but we also make it impossible for the *CallbackMethod.call
implementations to inline any of those calls. Broken chain again.
CompiledMethod is slightly better in this regard, since there's a new
CompiledMethod subclass for every AOT-compiled Ruby method, but we still
have a single implementation of DynamicMethod.call that all of those
subclasses share. Worse still, even if we had
separate DynamicMethod.call implementations, that may actually *hurt*
our ability to inline code way back in IRubyObject.callMethod, since
we've now added N possible DynamicMethod.call implementations to the
system. And the chain gets broken even earlier.
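A sketch with hypothetical names (`CompiledMethodSketch`, `CompiledSucc`, `CompiledNegate`) shows why per-method subclasses alone don't fix things. Every caller reaches one inherited `call`, and the `invoke(...)` site inside it dispatches to every subclass. The *CallbackMethod wrappers have the same shape, with a Callback object in place of the abstract method.

```java
// Hypothetical sketch: one subclass per compiled method, one shared call().
abstract class CompiledMethodSketch {
    // Shared by every subclass. HotSpot sees a single call() implementation,
    // but the invoke(...) site inside it dispatches to every subclass.
    final Object call(Object self, Object[] args) {
        // ...arity checks, frame setup, etc. would go here...
        return invoke(self, args);
    }

    protected abstract Object invoke(Object self, Object[] args);
}

// One hypothetical subclass per AOT-compiled Ruby method.
final class CompiledSucc extends CompiledMethodSketch {
    protected Object invoke(Object self, Object[] args) { return (Long) self + 1; }
}

final class CompiledNegate extends CompiledMethodSketch {
    protected Object invoke(Object self, Object[] args) { return -((Long) self); }
}
```

Overriding `call` in each subclass instead would only move the megamorphic site up into callMethod, as described above.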
So the bottom line here is that in order to continue improving
performance, we need to do everything possible to move the call site and
the call target closer together. There are a couple of standard ways to do it:
1. Hard-coded special-case code for specific situations, much like YARV
does for simple ops (+, -, <, >, etc.). In these cases, the compiler
would emit a check that the receiver is of an appropriate type before
making a direct call to the operation in question. In Fixnum's case, we'd
first confirm the receiver is a RubyFixnum, and then invoke e.g.
RubyFixnum.plus directly. That skips all the chain breakage, and allows
HotSpot to inline RubyFixnum.plus straight into the call site.
2. Dynamically generated method adapters that can be swapped out and that
learn from previous calls to make direct invocations earlier in the
chain. Basically, this would involve preparing call site caches that
point at call adapters. Initially, the call adapters would be of some
generic type that uses the slow path. But as more and more calls come
in, more and more call sites would be switched to specialized
adapters that invoke the appropriate target code directly,
giving HotSpot a direct line from call site to call target.
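Written as Java source rather than bytecode, option 1 looks roughly like the sketch below. `RObject`, `ToyFixnum`, and `FastPathSite` are hypothetical stand-ins for IRubyObject, RubyFixnum, and a compiled call site, not JRuby's real signatures. The `instanceof` guard lets the call site bind statically to `plus`, which HotSpot can inline, while every other receiver takes the generic route.

```java
// Hypothetical stand-in for IRubyObject: only the generic slow path.
interface RObject {
    RObject callMethod(String name, RObject arg);
}

// Hypothetical stand-in for RubyFixnum.
final class ToyFixnum implements RObject {
    final long value;
    ToyFixnum(long value) { this.value = value; }

    // Assumes a Fixnum argument; a real implementation also handles
    // Float, Bignum, overflow, and so on.
    ToyFixnum plus(RObject other) {
        return new ToyFixnum(value + ((ToyFixnum) other).value);
    }

    public RObject callMethod(String name, RObject arg) {
        // Imagine method-table lookup and DynamicMethod.call here.
        if (name.equals("+")) return plus(arg);
        throw new UnsupportedOperationException(name);
    }
}

class FastPathSite {
    static int slowPathCalls = 0;

    // What a compiled "a + b" call site amounts to with the special case.
    static RObject add(RObject receiver, RObject arg) {
        if (receiver instanceof ToyFixnum) {
            // Direct, statically bound call: an easy inlining target.
            return ((ToyFixnum) receiver).plus(arg);
        }
        slowPathCalls++;
        return receiver.callMethod("+", arg);
    }
}
```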
The second version is obviously the ultimate goal, and essentially would
mimic what the state-of-the-art JITs do (i.e. this is how HotSpot works
under the covers). The first version is easily testable with some simple
hackery.
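Option 2 can be sketched as a call site holding a swappable adapter. Everything here (`AdaptiveSite`, `CallAdapter`, `RClass`, `RObj`) is hypothetical and much simpler than generated bytecode would be. In particular, a real specialized adapter would be a generated class that calls its target directly, whereas this sketch caches a function object. The technique is a monomorphic inline cache. The first call does a full lookup and installs an adapter guarded on the receiver's class, and a receiver of another class falls back to the lookup and re-specializes the site. The sketch never invalidates a cached target when a method is redefined, which a real runtime would have to handle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical runtime class: just a method table.
final class RClass {
    final Map<String, BiFunction<Object, Object, Object>> methods = new HashMap<>();
}

// Hypothetical object: its class plus a payload.
final class RObj {
    final RClass rclass;
    final long value;
    RObj(RClass rclass, long value) { this.rclass = rclass; this.value = value; }
}

// The swappable piece installed at a call site.
interface CallAdapter {
    Object call(RObj receiver, Object arg);
}

final class AdaptiveSite {
    final String name;
    CallAdapter adapter = this::generic;  // every site starts on the slow path
    int lookups;                          // how often the slow path ran

    AdaptiveSite(String name) { this.name = name; }

    Object call(RObj receiver, Object arg) {
        return adapter.call(receiver, arg);
    }

    // Slow path: full lookup, then install an adapter specialized for the
    // receiver's class. A receiver of a different class comes back here.
    private Object generic(RObj receiver, Object arg) {
        lookups++;
        RClass cls = receiver.rclass;
        BiFunction<Object, Object, Object> target = cls.methods.get(name);
        if (target == null) {
            throw new IllegalStateException("undefined method " + name);
        }
        adapter = (r, a) -> r.rclass == cls ? target.apply(r, a) : generic(r, a);
        return target.apply(receiver, arg);
    }
}
```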
I created a small patch that includes a trivial, unsafe change to the
compiler to make Fixnum#+, Fixnum#-, and Fixnum#< direct calls when
possible. They're unsafe because they don't check to see if any of those
operations have been overridden...but of course you'd have to be a mad
fool to override them anyway.
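One way the patch's shortcut could be made safe is to fold a redefinition check into the type guard, so that redefining Fixnum#+ sends every site back through normal dispatch. The sketch uses a hypothetical `FixnumOps.plusRedefined` flag, which the method-definition code would be assumed to set. `Long` stands in for RubyFixnum and `Dispatcher` for the full callMethod path. Both are assumptions for illustration.

```java
// Hypothetical sketch: a redefinition flag guarding the direct-call shortcut.
final class FixnumOps {
    // Assumed to be set by the runtime whenever Ruby code defines Fixnum#+.
    // volatile so every thread sees the change promptly.
    static volatile boolean plusRedefined = false;

    static long builtinPlus(long a, long b) { return a + b; }
}

final class SafeAddSite {
    // Stand-in for the generic callMethod path.
    interface Dispatcher {
        Object dispatch(Object receiver, String name, Object arg);
    }

    static Object add(Object receiver, Object arg, Dispatcher slowPath) {
        if (!FixnumOps.plusRedefined && receiver instanceof Long && arg instanceof Long) {
            return FixnumOps.builtinPlus((Long) receiver, (Long) arg); // direct call
        }
        return slowPath.dispatch(receiver, "+", arg);
    }
}
```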
To demonstrate a bit of the potential performance gains, here are some
numbers for JRuby trunk and trunk + patch. Note that Fixnum#+, Fixnum#-,
and Fixnum#< are all already STI methods, which does a lot to speed up
their invocation (STI uses a table of switch values to bypass dynamic
method lookup). But this simple change of compiling direct calls
completely blows the STI performance out of the water, and that's
without similar direct calls to fib_ruby itself.
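The idea behind STI can be sketched as follows. Names and index values are hypothetical; the patch only shows that a `MethodIndex.getIndex(name)` lookup exists. Method names are mapped to small integers when the call site is compiled, and a built-in type dispatches its core methods with a `switch` instead of a method-table lookup.

```java
// Hypothetical index assignment, resolved once when a call site is compiled.
final class SwitchIndex {
    static final int NO_INDEX = 0, OP_PLUS = 1, OP_MINUS = 2, OP_LT = 3;

    static int indexOf(String name) {
        switch (name) {
            case "+": return OP_PLUS;
            case "-": return OP_MINUS;
            case "<": return OP_LT;
            default:  return NO_INDEX;
        }
    }
}

// Hypothetical built-in type that handles its core methods by switch.
final class SwitchFixnum {
    final long value;
    SwitchFixnum(long value) { this.value = value; }

    Object callMethod(int index, String name, SwitchFixnum arg) {
        switch (index) {
            case SwitchIndex.OP_PLUS:  return new SwitchFixnum(value + arg.value);
            case SwitchIndex.OP_MINUS: return new SwitchFixnum(value - arg.value);
            case SwitchIndex.OP_LT:    return value < arg.value;
            default: throw new UnsupportedOperationException("slow path for " + name);
        }
    }
}
```

The switch removes the lookup, but the call still travels through a generic dispatch method rather than binding directly to a single target. That is consistent with the direct-call patch outperforming it.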
without patch (each line: user, system, total, and (real) elapsed time, in seconds):
test/bench/bench_fib_recursive.rb
1.675000 0.000000 1.675000 ( 1.675000)
1.244000 0.000000 1.244000 ( 1.244000)
1.183000 0.000000 1.183000 ( 1.183000)
1.173000 0.000000 1.173000 ( 1.173000)
1.171000 0.000000 1.171000 ( 1.170000)
1.178000 0.000000 1.178000 ( 1.178000)
1.170000 0.000000 1.170000 ( 1.170000)
1.169000 0.000000 1.169000 ( 1.169000)
with patch:
test/bench/bench_fib_recursive.rb
1.133000 0.000000 1.133000 ( 1.133000)
0.922000 0.000000 0.922000 ( 0.922000)
0.865000 0.000000 0.865000 ( 0.865000)
0.862000 0.000000 0.862000 ( 0.863000)
0.859000 0.000000 0.859000 ( 0.859000)
0.859000 0.000000 0.859000 ( 0.859000)
0.864000 0.000000 0.864000 ( 0.863000)
0.859000 0.000000 0.859000 ( 0.860000)
That's an improvement of over 25%, with about 20 lines of code. It would
be even higher with a dynamic adapter for the fib_ruby call. And we can
take this further...modify our Java integration code to do direct calls
to Java types, modify compiled code to adapt to methods as they are
redefined or added to the system, and so on and so forth. There's a ton
of potential here.
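For reference, here is the arithmetic behind that figure, using the user-time column from the final iteration of each run:

```latex
\frac{1.169 - 0.859}{1.169} = \frac{0.310}{1.169} \approx 0.265,
\qquad
\frac{1.169}{0.859} \approx 1.36
```

That is roughly 26.5% less time per iteration, or about a 1.36x speedup.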
I will continue working along this path.
- Charlie
Index: src/org/jruby/compiler/impl/StandardASMCompiler.java
===================================================================
--- src/org/jruby/compiler/impl/StandardASMCompiler.java (revision 3964)
+++ src/org/jruby/compiler/impl/StandardASMCompiler.java (working copy)
@@ -405,8 +405,23 @@
int index = MethodIndex.getIndex(name);
+ Label notPrimitive = new Label();
+ Label primitive = new Label();
+
if (hasArgs) {
if (hasReceiver) {
+ if (name.equals("+") || name.equals("-") || name.equals("<")) {
+ // dup self and check if it's primitive
+ mv.dup2();
+ mv.pop();
+ mv.instance_of(cg.p(RubyFixnum.class));
+ mv.ifne(primitive);
+ }
+ }
+ }
+
+ if (hasArgs) {
+ if (hasReceiver) {
// Call with args
// receiver already present
} else {
@@ -493,6 +508,29 @@
mv.label(normalEnd);
}
+ mv.go_to(notPrimitive);
+
+ // determined to be an op against a primitive, do it directly
+ mv.label(primitive);
+ if (hasArgs) {
+ if (hasReceiver) {
+ mv.swap();
+ mv.checkcast(cg.p(RubyFixnum.class));
+ mv.swap();
+ mv.iconst_0();
+ mv.arrayload();
+
+ if (name.equals("+")) {
+ mv.invokevirtual(cg.p(RubyFixnum.class), "plus", cg.sig(IRubyObject.class, cg.params(IRubyObject.class)));
+ } else if (name.equals("<")) {
+ mv.invokevirtual(cg.p(RubyFixnum.class), "lt", cg.sig(IRubyObject.class, cg.params(IRubyObject.class)));
+ } else if (name.equals("-")) {
+ mv.invokevirtual(cg.p(RubyFixnum.class), "minus", cg.sig(IRubyObject.class, cg.params(IRubyObject.class)));
+ }
+ }
+ }
+
+ mv.label(notPrimitive);
}
public void yield(boolean hasArgs, boolean unwrap) {
Index: src/org/jruby/compiler/impl/SkinnyMethodAdapter.java
===================================================================
--- src/org/jruby/compiler/impl/SkinnyMethodAdapter.java (revision 3963)
+++ src/org/jruby/compiler/impl/SkinnyMethodAdapter.java (working copy)
@@ -59,6 +59,10 @@
mv.visitMethodInsn(INVOKEINTERFACE, arg1, arg2, arg3);
}
+ public void instance_of(String arg1) {
+ mv.visitTypeInsn(INSTANCEOF, arg1);
+ }
+
public void areturn() {
mv.visitInsn(ARETURN);
}