Re: Good news, bad news
This message was held because it's too big, so I dropboxed the amd64 dumps here: http://dl.dropbox.com/u/17017035/fib_ruby-asm-64bit.zip On Fri, May 27, 2011 at 9:05 AM, Charles Oliver Nutter head...@headius.com wrote: Hopefully these 64-bit dumps are better now that Tom managed to get a proper 64-bit hsdis built (hooray!). On Thu, May 26, 2011 at 6:36 PM, Charles Oliver Nutter head...@headius.com wrote: On Thu, May 26, 2011 at 6:21 PM, Charles Oliver Nutter head...@headius.com wrote: No worries...I'll repost 32-bit asm in a moment. Attached. This is just fib again. 64-bit is definitely faster than 32-bit for this benchmark, but the older macosx build is also still faster than the current mlvm: old macosx: 9227465 1.235000 0.00 1.235000 ( 1.154000) 9227465 1.012000 0.00 1.012000 ( 1.012000) 9227465 0.969000 0.00 0.969000 ( 0.969000) 9227465 0.962000 0.00 0.962000 ( 0.962000) 9227465 0.953000 0.00 0.953000 ( 0.953000) current mlvm: 9227465 1.505000 0.00 1.505000 ( 1.406000) 9227465 1.175000 0.00 1.175000 ( 1.174000) 9227465 1.77 0.00 1.77 ( 1.771000) 9227465 1.255000 0.00 1.255000 ( 1.255000) 9227465 1.14 0.00 1.14 ( 1.141000) - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 25, 2011, at 9:39 PM, Charles Oliver Nutter wrote: On Wed, May 25, 2011 at 5:12 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: I've got the bimorphic method handle inline working. I was doing something stupid at the beginning which is why it wasn't as easy as I expected to be and then I encountered a stupid API which wasted a little more time. I was able to leverage the existing invokedynamic call site cache code generation and just generate a different guard. Ignoring the flag soup below, the first is without invokedynamic, second is with and third is with bimorphic inline for selectAlternative. Here's a dumb question...couldn't the bimorphic logic have simply been generated the second way you indicated, rather than the (x ? a : b).whatever form? That pattern is something the java.lang.invoke machinery is generating as far I can tell. selectAlternative does something like: (test ? a : b).invokeExact() and I'm converting it to something like: c = (test ? a : b); if (c == a) a.invokeExact(); else b.invokeExact(); The optimizer turns this into: if (test) a.invokeExact(); else b.invokeExact(); but test is some ugly goo because of boxing. It's relatively easy to get the optimizer to fold away the boxing for boolean but sadly it doesn't help the performance at all. Additionally that ends up touching a fair amount of common code which makes it a little more risky for 7. One oddity I've noticed is that what we're ending up with lots of checkcasts to IRubyObject but the optimizer doesn't eliminate redundant ones, probably because it's a checkcast to an interface. Great to hear this is in! I'll look for an MLVM patch and until then see if I can figure out why the classic GWT logic isn't as fast as it was a couple weeks ago. IF I'm feeling saucy I may try to pull MLVM + bsd-port from two weeks ago (or find a macosx port build from that timeframe) to compare assembly output. Perhaps illustrative for us all! I think that would be informative. tom The changes are at http://javaweb.sfbay.sun.com/~never/webrev/bim. Maybe John can put together an mlvm patch for it. I'm not quite sure how to do so. There are still some minor improvements that can be made. The test part ends up looking like: if ((test ? Boolean.TRUE : Boolean.FALSE).value()) We need to treat more of those values as constants and split the loads up to fold it away. I'm keen to have a look at the assembly resulting from the bimorphic inlining and see if there's any way to improve it further. My test could potentially be improved too. I'm also pondering ways I can go mostly guard-free for certain core JRuby types, possibly reducing the guard to an exact type check. There's definitely runway both on your end and my end. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 26, 2011, at 12:15 AM, Charles Oliver Nutter wrote: It is definitely frustrating to see that perf has degraded so much the past couple weeks and still not be there with the reverted code. I second that. I did some more cherry-picking this evening and the mlvm patch queue is now closer to the reality of what's in jdk7 b144. I also pushed Tom's bimorphic MH inlining code, on an experimental basis. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 26, 2011, at 10:42 AM, Christian Thalinger wrote: On May 26, 2011, at 9:15 AM, Charles Oliver Nutter wrote: On Thu, May 26, 2011 at 1:58 AM, Tom Rodriguez tom.rodrig...@oracle.com wrote: but test is some ugly goo because of boxing. It's relatively easy to get the optimizer to fold away the boxing for boolean but sadly it doesn't help the performance at all. Additionally that ends up touching a fair amount of common code which makes it a little more risky for 7. It is definitely frustrating to see that perf has degraded so much the past couple weeks and still not be there with the reverted code. I'm hoping your fix will help bring the promised ricochet perf to JRuby, but so far we're a long way off from what perf looked like on earlier builds. There must be something more obvious than boolean box folding...perhaps obvious enough we're not seeing it yet. I also did some performance testing. I used a recent HS repository, applied Tom's patch, used a recent meth.jar provided by John and this is what I got: snip Here are the numbers from April 28 on the same machine: http://mail.openjdk.java.net/pipermail/mlvm-dev/2011-April/002873.html Although I can't remember if the invokedynamic numbers were with or without the guard-removal-hack I tried at that time (I think without, because with the hack the numbers should be very similar to dynopt, see: http://mail.openjdk.java.net/pipermail/mlvm-dev/2011-April/002869.html). Using Tom's new code doesn't make a difference since I think the meth.jar from John includes the reverted GWT code. I looked at the inlining tree of fib and everything looks good and is inlined. There is one invokeExact which is: @ 43 java.lang.invoke.MethodHandle::invokeExact (42 bytes) too big but bumping the MaxInlineSize just shows that it's: @ 43 java.lang.invoke.MethodHandle::invokeExact (42 bytes) call site not reached so it doesn't matter anyway. Is there some additional code that got added to the GWT logic or maybe some additional code on the JRuby side that would explain the slowdown? -- Christian ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On 2011-05-26 14.04, John Rose wrote: On May 26, 2011, at 12:15 AM, Charles Oliver Nutter wrote: It is definitely frustrating to see that perf has degraded so much the past couple weeks and still not be there with the reverted code. I second that. I did some more cherry-picking this evening and the mlvm patch queue is now closer to the reality of what's in jdk7 b144. I also pushed Tom's bimorphic MH inlining code, on an experimental basis. Hey John, Something in the patch queue fails to apply cleanly: + (cd sources/jdk; hg qpush -a) (working directory not at a head) applying meth-ing-7034977.patch applying meth-conv-6939861.patch applying meth-invoke-7032850.patch applying meth-exc-7044892.patch applying meth-review-7032323.patch applying meth-deprecate.patch patching file src/share/classes/java/lang/invoke/MethodHandle.java Hunk #1 FAILED at 505 1 out of 2 hunks FAILED -- saving rejects to file src/share/classes/java/lang/invoke/MethodHandle.java.rej patch failed, unable to continue (try -v) patch failed, rejects left in working dir errors during apply, please fix and refresh meth-deprecate.patch *** Exit status 2. + (cd sources/langtools; hg qpush -a) Cheers -- Ola Bini (http://olabini.com) Ioke - JRuby - ThoughtWorks Yields falsehood when quined yields falsehood when quined. ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 26, 2011, at 1:42 AM, Christian Thalinger wrote: Using Tom's new code doesn't make a difference since I think the meth.jar from John includes the reverted GWT code. I just added a hook to test whether the GWT change matters or not. Choose either flag setting: java -Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true java -Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false If false or missing, you get the old reverted logic. This is the refreshed JAR: http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/meth.jar The code is also in the mlvm patch repo. A quick check on bench_string_ops.rb shows that a couple of numbers are better but some are worse, especially string == comparison (compare with http://mail.openjdk.java.net/pipermail/mlvm-dev/2011-April/002887.html): This rings a bell: I recently had to fix the primitive unboxing logic to make this sort of thing work correctly (compatibly with reflection): MethodHandle iid = identity(int.class); Object o = (p ? (Byte)(byte)1 : (Integer)2); int x = (int) iid.asType(methodType(int.class, Object.class).invokeExact(o); Note that simply casting to Integer and unboxing won't work. So there's a tricky (and potentially slow) test to do in the adaptation of object arguments to primitive formal parameters, or from object return types to primitive return values. If this is a problem, your assembly code might have calls to something ugly called ValueConversions.primitiveConversion. Hopefully, it just has fast-path code which verifies that a reference is the expected box type, and unboxes. There is always a fast path which does this. But funny wrapper mixing (like the Byte above) will cause the slow path to kick in. -- John___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 26, 2011, at 2:12 AM, Ola Bini wrote: Hey John, Something in the patch queue fails to apply cleanly: That was quick! I noticed that when I spun the meth.jar. Fixed! -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
Am 26.05.2011 11:03, schrieb Christian Thalinger: [...] I looked at the inlining tree of fib and everything looks good and is inlined. There is one invokeExact which is: @ 43 java.lang.invoke.MethodHandle::invokeExact (42 bytes) too big but bumping the MaxInlineSize just shows that it's: @ 43 java.lang.invoke.MethodHandle::invokeExact (42 bytes) call site not reached so it doesn't matter anyway. sorry for gutting in here. Is that looking at the inlining tree something normal mortals can also do? I see there some useful information, even without invokedynamic. So is this some debugging code not found in the normal JVM? bye Jochen ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 26, 2011, at 11:24 AM, Jochen Theodorou wrote: Am 26.05.2011 11:03, schrieb Christian Thalinger: [...] I looked at the inlining tree of fib and everything looks good and is inlined. There is one invokeExact which is: @ 43 java.lang.invoke.MethodHandle::invokeExact (42 bytes) too big but bumping the MaxInlineSize just shows that it's: @ 43 java.lang.invoke.MethodHandle::invokeExact (42 bytes) call site not reached so it doesn't matter anyway. sorry for gutting in here. Is that looking at the inlining tree something normal mortals can also do? I see there some useful information, even without invokedynamic. So is this some debugging code not found in the normal JVM? Since a couple of HotSpot builds these switches are also available in product builds: PrintCompilation is a product switch, PrintInlining is a diagnostic switch. Do it like: $ java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -- Christian ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On Thu, May 26, 2011 at 3:34 AM, John Rose john.r.r...@oracle.com wrote: I did some more cherry-picking this evening and the mlvm patch queue is now closer to the reality of what's in jdk7 b144. I also pushed Tom's bimorphic MH inlining code, on an experimental basis. Ok, I'll rebuild and use current stuff for exploration. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On Thu, May 26, 2011 at 4:03 AM, Christian Thalinger christian.thalin...@oracle.com wrote: Is there some additional code that got added to the GWT logic or maybe some additional code on the JRuby side that would explain the slowdown? The current logic is 2x faster on the 5/13 build (b141ish maybe?) of macosx port than on MLVM as of Tuesday. I'll update to latest MLVM and see what the situation is. I also can get asm dumps from that 5/13 build, which I'll post (alongside current asm dumps) in the next 24hrs or so. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On Thu, May 26, 2011 at 4:47 AM, Charles Oliver Nutter head...@headius.com wrote: The current logic is 2x faster on the 5/13 build (b141ish maybe?) of By current logic I mean the logic in JRuby master right now. Same JRuby logic, compared between an earlier macosx build and current mlvm build, and the macosx build is always faster. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On Thu, May 26, 2011 at 5:25 AM, Rémi Forax fo...@univ-mlv.fr wrote: Charles, why do you use IRubyObject exactly ? why not using Object instead ? I use Object in PHP.reboot and see no problem. Legacy, mostly... JRuby still supports Java 5 and 6, which limits how specialized we can make call paths. Requiring that everything be IRubyObject means we always have only a single invokeinterface to retrieve metaclass, access instance variables, and so on. JRuby designed from scratch today would certainly use Object. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On 05/26/2011 12:33 PM, Charles Oliver Nutter wrote: On Thu, May 26, 2011 at 5:25 AM, Rémi Foraxfo...@univ-mlv.fr wrote: Charles, why do you use IRubyObject exactly ? why not using Object instead ? I use Object in PHP.reboot and see no problem. Legacy, mostly... JRuby still supports Java 5 and 6, which limits how specialized we can make call paths. Requiring that everything be IRubyObject means we always have only a single invokeinterface to retrieve metaclass, access instance variables, and so on. JRuby designed from scratch today would certainly use Object. For the metaclass, the idea is to not retrieve the metaclass in the fast path, if you have one Java class for one ruby class and a 1 to 1 relation between a class and a metaclass, checking the class and the SwitchPoint of the metaclass is sufficient (because you can bind the switch point). - Charlie Rémi ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On Thu, May 26, 2011 at 6:18 AM, Rémi Forax fo...@univ-mlv.fr wrote: For the metaclass, the idea is to not retrieve the metaclass in the fast path, if you have one Java class for one ruby class and a 1 to 1 relation between a class and a metaclass, checking the class and the SwitchPoint of the metaclass is sufficient (because you can bind the switch point). Right, this would work if all classes defined in Ruby were 1:1 with a class in Java. This is not the case in JRuby; user-defined classes are rarely backed by a new JVM class, and usually just use RubyObject as their JVM representation. I will likely use this strategy for core JRuby classes, however, by making sure that RubyObject, RubyString, etc are not used as the backing store for user-defined classes, and instead having RubyInheritedObject, RubyInheritedString (or something) for that purpose. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
There's something deeply bizarro about that assembly. It looks like x64 assembly decoded by the 32 assembler. I think your hsdis wasn't built right. tom On May 26, 2011, at 2:13 PM, Charles Oliver Nutter wrote: On Thu, May 26, 2011 at 1:58 AM, Tom Rodriguez tom.rodrig...@oracle.com wrote: Great to hear this is in! I'll look for an MLVM patch and until then see if I can figure out why the classic GWT logic isn't as fast as it was a couple weeks ago. IF I'm feeling saucy I may try to pull MLVM + bsd-port from two weeks ago (or find a macosx port build from that timeframe) to compare assembly output. Perhaps illustrative for us all! I think that would be informative. Ok, I've attached two assembly dumps, one from mostly-current MLVM (build early this morning) and one from the macosx port from 5/13. This is just fib_ruby. I haven't started to explore the differences yet. - Charlie asm_logs.zip___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 26, 2011, at 2:43 PM, Charles Oliver Nutter wrote: Hmm, I'll look into it. I used the amd64 build from the hsdis project I posted the bsd/amd64 libdis when I was desperate for something that would at least link and partially decode instructions. But that hsdis binary doesn't really know about x64 disassembly. (The base project doesn't support x64. The gnu binutils disassembler is needed.) The download should have a warning label; sorry. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
I see someone tagged that build as broken now on Kenai. Can someone with the appropriate skills do a non-broken build? I've never build the hsdis plugin (but I'm attempting to now. - Charlie On Thu, May 26, 2011 at 4:43 PM, Charles Oliver Nutter head...@headius.com wrote: Hmm, I'll look into it. I used the amd64 build from the hsdis project on kenai... On Thu, May 26, 2011 at 4:27 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: There's something deeply bizarro about that assembly. It looks like x64 assembly decoded by the 32 assembler. I think your hsdis wasn't built right. tom On May 26, 2011, at 2:13 PM, Charles Oliver Nutter wrote: On Thu, May 26, 2011 at 1:58 AM, Tom Rodriguez tom.rodrig...@oracle.com wrote: Great to hear this is in! I'll look for an MLVM patch and until then see if I can figure out why the classic GWT logic isn't as fast as it was a couple weeks ago. IF I'm feeling saucy I may try to pull MLVM + bsd-port from two weeks ago (or find a macosx port build from that timeframe) to compare assembly output. Perhaps illustrative for us all! I think that would be informative. Ok, I've attached two assembly dumps, one from mostly-current MLVM (build early this morning) and one from the macosx port from 5/13. This is just fib_ruby. I haven't started to explore the differences yet. - Charlie asm_logs.zip___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 26, 2011, at 3:45 PM, Charles Oliver Nutter wrote: Ahh...well is there another binary you can point me at? Otherwise, I'll just do i386 asm for now. I tagged it. It's broken in the sense that it's a brutal 64-bit build of code designed only for 32-bit disassembly. I suggest doing at least some of these investigations with 32-bit builds. (But, we do need a binutils-based hsdis!) -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
I'll see what I can do on my mac. tom On May 26, 2011, at 3:45 PM, Charles Oliver Nutter wrote: On Thu, May 26, 2011 at 5:16 PM, John Rose john.r.r...@oracle.com wrote: I posted the bsd/amd64 libdis when I was desperate for something that would at least link and partially decode instructions. But that hsdis binary doesn't really know about x64 disassembly. (The base project doesn't support x64. The gnu binutils disassembler is needed.) The download should have a warning label; sorry. Ahh...well is there another binary you can point me at? Otherwise, I'll just do i386 asm for now. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
Oh, forgot to include dynopt numbers under that older macosx build: ~/projects/jruby ➔ jruby -Xcompile.dynopt=true -Xinvokedynamic.invocation=false --server -J-d32 bench/bench_fib_recursive.rb 5 35 9227465 1.366000 0.00 1.366000 ( 1.293000) 9227465 0.949000 0.00 0.949000 ( 0.948000) 9227465 0.965000 0.00 0.965000 ( 0.965000) 9227465 0.932000 0.00 0.932000 ( 0.932000) 9227465 0.924000 0.00 0.924000 ( 0.924000) As mentioned before, the old build was just a hair slower than dynopt. - Charlie On Thu, May 26, 2011 at 6:36 PM, Charles Oliver Nutter head...@headius.com wrote: On Thu, May 26, 2011 at 6:21 PM, Charles Oliver Nutter head...@headius.com wrote: No worries...I'll repost 32-bit asm in a moment. Attached. This is just fib again. 64-bit is definitely faster than 32-bit for this benchmark, but the older macosx build is also still faster than the current mlvm: old macosx: 9227465 1.235000 0.00 1.235000 ( 1.154000) 9227465 1.012000 0.00 1.012000 ( 1.012000) 9227465 0.969000 0.00 0.969000 ( 0.969000) 9227465 0.962000 0.00 0.962000 ( 0.962000) 9227465 0.953000 0.00 0.953000 ( 0.953000) current mlvm: 9227465 1.505000 0.00 1.505000 ( 1.406000) 9227465 1.175000 0.00 1.175000 ( 1.174000) 9227465 1.77 0.00 1.77 ( 1.771000) 9227465 1.255000 0.00 1.255000 ( 1.255000) 9227465 1.14 0.00 1.14 ( 1.141000) - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 25, 2011, at 5:58 AM, Ola Bini wrote: Hi, There are at least three problems that are still there. They might be connected, or not. (I will tell you how to reproduce these at the end) I just built a new JVM: openjdk version 1.7.0-internal OpenJDK Runtime Environment (build 1.7.0-internal-olabini_2011_05_25_08_06-b00) OpenJDK Server VM (build 21.0-b09, mixed mode) I get that weird missing class error when running my test suite: java.lang.NoClassDefFoundError: seph/lang/SephObject java.lang.RuntimeException: java.lang.NoClassDefFoundError: seph/lang/SephObject at seph.lang.Runtime.evaluateStream(Runtime.java:132) at seph.lang.Runtime.evaluateString(Runtime.java:146) at seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:214) Caused by: java.lang.NoClassDefFoundError: seph/lang/SephObject at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at seph$gen$abstraction$41$even?.argument_0_0(eval:7) at seph.lang.compiler.Bootstrap.intrinsic_if(Bootstrap.java:654) at seph$gen$abstraction$41$even?.even?(eval:7) at seph$gen$abstraction$39$toplevel.toplevel(eval:25) at seph.lang.Runtime.evaluateStream(Runtime.java:125) This is obviously a complete blocker. Turning off the ricochet frames hits an NYI error when running the test suite. I'm still seeing a crash in ricochet frames: # # A fatal error has been detected by the Java Runtime Environment: # # SIGBUS (0xa) at pc=0x0104884f, pid=85043, tid=2953850880 # # JRE version: 7.0 # Java VM: OpenJDK Client VM (21.0-b09 mixed mode bsd-x86 ) # Problematic frame: # V [libjvm.dylib+0x4884f] MethodHandles::ricochet_frame_oops_do(frame const, OopClosure*, RegisterMap const*)+0x12f # # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try ulimit -c unlimited before starting Java again # # An error report file with more information is saved as: # /Users/olabini/workspace/seph/hs_err_pid85043.log # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # I've attached the error file. The third error (in frame.cpp) only happens on master of seph, but there are some other problems with master that means you get lots of weird output. In order to reproduce problem 1 and 2: git clone git://github.com/seph-lang/seph.git git checkout 9075c0f4ffe1adac0657057aee2193f16ad12a43 (build.xml: remove lines with jvmarg value=-Xint/) problem 1: ant This should show you one entry of the missing class problem I tried this on solaris-i586 and it works: BUILD SUCCESSFUL Total time: 29 seconds problem 2: ant jar-notest bin/seph bench/bench_arithmetic.sp This should show you problem 2. If you for reference want to see the benchmark run correctly, do bin/seph -J-Xint bench/bench_arithmetic.sp Works too (with server compiler on b143). problem 3: git checkout master git checkout 99c8f2609d468835390e39b68c73f21cc78e5ab5 ant clean jar-notest bin/seph bench/bench_arithmetic.sp This should show you problem 3. You will also see loads of other exceptions, since that point generates slightly bad bytecode. That shouldn't make the JVM crash though, I assume - and I've seen the frame.cpp should not reach here without seeing those exceptions. I think I see the exception you are talking about but there is so much output that I'm not sure. I curious about where the problem lies, either it's a HotSpot fix that's not in the mlvm repository yet or it's a JDK bug in one of the mlvm patches which hasn't been pushed yet into the main repository. At least that's my best guess. May I suggest that you guys also try with an official build (like JDK 7 b143) from here (assuming everyone has some kind of access to a Linux box too): http://jdk7.java.net/download.html -- Christian All of these require that JAVA_HOME points to the Java 7 you want to use Cheers tom On May 23, 2011, at 7:33 PM, Ola Bini wrote: Hi, I'm happy to see that the performance degradation is getting addressed. I would like to point out that there is still a serious crash in the machinery too... Have you seen any reason why that happens? Cheers On 2011-05-24 06.11, John Rose wrote: On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote: I'd *love* for intermediate static Java snippits like this to inline straight through, even if invokeExact would be megamorphic under normal circumstances...but I don't think that's the case right now, right? I haven't been following 292 that closely but it used to be a piece of code that did precisely that. +private Object invoke_L0() throws Throwable { + if ((boolean) test.invokeExact()) +return target.invokeExact(); +return fallback.invokeExact(); + } It
Re: Good news, bad news
Hi, Just to clarify, my builds are against the current patchset in the MLVM repository, so that might explain why you're not seeing these problems. Cheers On 2011-05-25 14.29, Christian Thalinger wrote: On May 25, 2011, at 5:58 AM, Ola Bini wrote: Hi, There are at least three problems that are still there. They might be connected, or not. (I will tell you how to reproduce these at the end) I just built a new JVM: openjdk version 1.7.0-internal OpenJDK Runtime Environment (build 1.7.0-internal-olabini_2011_05_25_08_06-b00) OpenJDK Server VM (build 21.0-b09, mixed mode) I get that weird missing class error when running my test suite: java.lang.NoClassDefFoundError: seph/lang/SephObject java.lang.RuntimeException: java.lang.NoClassDefFoundError: seph/lang/SephObject at seph.lang.Runtime.evaluateStream(Runtime.java:132) at seph.lang.Runtime.evaluateString(Runtime.java:146) at seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:214) Caused by: java.lang.NoClassDefFoundError: seph/lang/SephObject at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at seph$gen$abstraction$41$even?.argument_0_0(eval:7) at seph.lang.compiler.Bootstrap.intrinsic_if(Bootstrap.java:654) at seph$gen$abstraction$41$even?.even?(eval:7) at seph$gen$abstraction$39$toplevel.toplevel(eval:25) at seph.lang.Runtime.evaluateStream(Runtime.java:125) This is obviously a complete blocker. Turning off the ricochet frames hits an NYI error when running the test suite. I'm still seeing a crash in ricochet frames: # # A fatal error has been detected by the Java Runtime Environment: # # SIGBUS (0xa) at pc=0x0104884f, pid=85043, tid=2953850880 # # JRE version: 7.0 # Java VM: OpenJDK Client VM (21.0-b09 mixed mode bsd-x86 ) # Problematic frame: # V [libjvm.dylib+0x4884f] MethodHandles::ricochet_frame_oops_do(frame const, OopClosure*, RegisterMap const*)+0x12f # # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try ulimit -c unlimited before starting Java again # # An error report file with more information is saved as: # /Users/olabini/workspace/seph/hs_err_pid85043.log # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # I've attached the error file. The third error (in frame.cpp) only happens on master of seph, but there are some other problems with master that means you get lots of weird output. In order to reproduce problem 1 and 2: git clone git://github.com/seph-lang/seph.git git checkout 9075c0f4ffe1adac0657057aee2193f16ad12a43 (build.xml: remove lines with jvmarg value=-Xint/) problem 1: ant This should show you one entry of the missing class problem I tried this on solaris-i586 and it works: BUILD SUCCESSFUL Total time: 29 seconds problem 2: ant jar-notest bin/seph bench/bench_arithmetic.sp This should show you problem 2. If you for reference want to see the benchmark run correctly, do bin/seph -J-Xint bench/bench_arithmetic.sp Works too (with server compiler on b143). problem 3: git checkout master git checkout 99c8f2609d468835390e39b68c73f21cc78e5ab5 ant clean jar-notest bin/seph bench/bench_arithmetic.sp This should show you problem 3. You will also see loads of other exceptions, since that point generates slightly bad bytecode. That shouldn't make the JVM crash though, I assume - and I've seen the frame.cpp should not reach here without seeing those exceptions. I think I see the exception you are talking about but there is so much output that I'm not sure. I curious about where the problem lies, either it's a HotSpot fix that's not in the mlvm repository yet or it's a JDK bug in one of the mlvm patches which hasn't been pushed yet into the main repository. At least that's my best guess. May I suggest that you guys also try with an official build (like JDK 7 b143) from here (assuming everyone has some kind of access to a Linux box too): http://jdk7.java.net/download.html -- Christian All of these require that JAVA_HOME points to the Java 7 you want to use Cheers tom On May 23, 2011, at 7:33 PM, Ola Bini wrote: Hi, I'm happy to see that the performance degradation is getting addressed. I would like to point out that there is still a serious crash in the machinery too... Have you seen any reason why that happens? Cheers On 2011-05-24 06.11, John Rose wrote: On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote: I'd *love* for intermediate static Java snippits like this to inline straight through, even if invokeExact would be megamorphic under normal circumstances...but I don't think that's the case right now, right? I haven't been following 292 that closely but it used to be a piece of code that did precisely that.
Re: Good news, bad news
On May 25, 2011, at 12:20 PM, Ola Bini wrote: Hi, Just to clarify, my builds are against the current patchset in the MLVM repository, so that might explain why you're not seeing these problems. I know that. That's why we need to find out where the problem is (some hints below). Can someone provide a jar file of the current MLVM JSR 292 classes (like John's meth.jar)? -- Christian Cheers On 2011-05-25 14.29, Christian Thalinger wrote: On May 25, 2011, at 5:58 AM, Ola Bini wrote: Hi, There are at least three problems that are still there. They might be connected, or not. (I will tell you how to reproduce these at the end) I just built a new JVM: openjdk version 1.7.0-internal OpenJDK Runtime Environment (build 1.7.0-internal-olabini_2011_05_25_08_06-b00) OpenJDK Server VM (build 21.0-b09, mixed mode) I get that weird missing class error when running my test suite: java.lang.NoClassDefFoundError: seph/lang/SephObject java.lang.RuntimeException: java.lang.NoClassDefFoundError: seph/lang/SephObject at seph.lang.Runtime.evaluateStream(Runtime.java:132) at seph.lang.Runtime.evaluateString(Runtime.java:146) at seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:214) Caused by: java.lang.NoClassDefFoundError: seph/lang/SephObject at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at seph$gen$abstraction$41$even?.argument_0_0(eval:7) at seph.lang.compiler.Bootstrap.intrinsic_if(Bootstrap.java:654) at seph$gen$abstraction$41$even?.even?(eval:7) at seph$gen$abstraction$39$toplevel.toplevel(eval:25) at seph.lang.Runtime.evaluateStream(Runtime.java:125) This is obviously a complete blocker. Turning off the ricochet frames hits an NYI error when running the test suite. I'm still seeing a crash in ricochet frames: # # A fatal error has been detected by the Java Runtime Environment: # # SIGBUS (0xa) at pc=0x0104884f, pid=85043, tid=2953850880 # # JRE version: 7.0 # Java VM: OpenJDK Client VM (21.0-b09 mixed mode bsd-x86 ) # Problematic frame: # V [libjvm.dylib+0x4884f] MethodHandles::ricochet_frame_oops_do(frame const, OopClosure*, RegisterMap const*)+0x12f # # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try ulimit -c unlimited before starting Java again # # An error report file with more information is saved as: # /Users/olabini/workspace/seph/hs_err_pid85043.log # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # I've attached the error file. The third error (in frame.cpp) only happens on master of seph, but there are some other problems with master that means you get lots of weird output. In order to reproduce problem 1 and 2: git clone git://github.com/seph-lang/seph.git git checkout 9075c0f4ffe1adac0657057aee2193f16ad12a43 (build.xml: remove lines with jvmarg value=-Xint/) problem 1: ant This should show you one entry of the missing class problem I tried this on solaris-i586 and it works: BUILD SUCCESSFUL Total time: 29 seconds problem 2: ant jar-notest bin/seph bench/bench_arithmetic.sp This should show you problem 2. If you for reference want to see the benchmark run correctly, do bin/seph -J-Xint bench/bench_arithmetic.sp Works too (with server compiler on b143). problem 3: git checkout master git checkout 99c8f2609d468835390e39b68c73f21cc78e5ab5 ant clean jar-notest bin/seph bench/bench_arithmetic.sp This should show you problem 3. You will also see loads of other exceptions, since that point generates slightly bad bytecode. That shouldn't make the JVM crash though, I assume - and I've seen the frame.cpp should not reach here without seeing those exceptions. I think I see the exception you are talking about but there is so much output that I'm not sure. I curious about where the problem lies, either it's a HotSpot fix that's not in the mlvm repository yet or it's a JDK bug in one of the mlvm patches which hasn't been pushed yet into the main repository. At least that's my best guess. May I suggest that you guys also try with an official build (like JDK 7 b143) from here (assuming everyone has some kind of access to a Linux box too): http://jdk7.java.net/download.html -- Christian All of these require that JAVA_HOME points to the Java 7 you want to use Cheers tom On May 23, 2011, at 7:33 PM, Ola Bini wrote: Hi, I'm happy to see that the performance degradation is getting addressed. I would like to point out that there is still a serious crash in the machinery too... Have you seen any reason why that happens? Cheers On 2011-05-24 06.11, John Rose wrote: On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote: I'd *love* for intermediate static Java snippits like
Re: Good news, bad news
Well, on my linux box with a JDK built this morning I don't see any of these problems actually. (However, there are things on my master that only works with the mlvm patches. Type conversions specifically.) Problem 3 also seems to not be there - you would notice if it were since it's also a total JVM crash. I do not see it on Linux with this version: openjdk version 1.7.0-internal OpenJDK Runtime Environment (build 1.7.0-internal-obini2_2011_05_25_03_33-b00) OpenJDK 64-Bit Server VM (build 21.0-b13, mixed mode) I built it from http://hg.openjdk.java.net/jdk7/jdk7 so it's not an official drop, but it's not bsdport with mlvm either. Cheers On 2011-05-25 14.29, Christian Thalinger wrote: On May 25, 2011, at 5:58 AM, Ola Bini wrote: Hi, There are at least three problems that are still there. They might be connected, or not. (I will tell you how to reproduce these at the end) I just built a new JVM: openjdk version 1.7.0-internal OpenJDK Runtime Environment (build 1.7.0-internal-olabini_2011_05_25_08_06-b00) OpenJDK Server VM (build 21.0-b09, mixed mode) I get that weird missing class error when running my test suite: java.lang.NoClassDefFoundError: seph/lang/SephObject java.lang.RuntimeException: java.lang.NoClassDefFoundError: seph/lang/SephObject at seph.lang.Runtime.evaluateStream(Runtime.java:132) at seph.lang.Runtime.evaluateString(Runtime.java:146) at seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:214) Caused by: java.lang.NoClassDefFoundError: seph/lang/SephObject at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at seph$gen$abstraction$41$even?.argument_0_0(eval:7) at seph.lang.compiler.Bootstrap.intrinsic_if(Bootstrap.java:654) at seph$gen$abstraction$41$even?.even?(eval:7) at seph$gen$abstraction$39$toplevel.toplevel(eval:25) at seph.lang.Runtime.evaluateStream(Runtime.java:125) This is obviously a complete blocker. Turning off the ricochet frames hits an NYI error when running the test suite. I'm still seeing a crash in ricochet frames: # # A fatal error has been detected by the Java Runtime Environment: # # SIGBUS (0xa) at pc=0x0104884f, pid=85043, tid=2953850880 # # JRE version: 7.0 # Java VM: OpenJDK Client VM (21.0-b09 mixed mode bsd-x86 ) # Problematic frame: # V [libjvm.dylib+0x4884f] MethodHandles::ricochet_frame_oops_do(frame const, OopClosure*, RegisterMap const*)+0x12f # # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try ulimit -c unlimited before starting Java again # # An error report file with more information is saved as: # /Users/olabini/workspace/seph/hs_err_pid85043.log # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # I've attached the error file. The third error (in frame.cpp) only happens on master of seph, but there are some other problems with master that means you get lots of weird output. In order to reproduce problem 1 and 2: git clone git://github.com/seph-lang/seph.git git checkout 9075c0f4ffe1adac0657057aee2193f16ad12a43 (build.xml: remove lines with jvmarg value=-Xint/) problem 1: ant This should show you one entry of the missing class problem I tried this on solaris-i586 and it works: BUILD SUCCESSFUL Total time: 29 seconds problem 2: ant jar-notest bin/seph bench/bench_arithmetic.sp This should show you problem 2. If you for reference want to see the benchmark run correctly, do bin/seph -J-Xint bench/bench_arithmetic.sp Works too (with server compiler on b143). problem 3: git checkout master git checkout 99c8f2609d468835390e39b68c73f21cc78e5ab5 ant clean jar-notest bin/seph bench/bench_arithmetic.sp This should show you problem 3. You will also see loads of other exceptions, since that point generates slightly bad bytecode. That shouldn't make the JVM crash though, I assume - and I've seen the frame.cpp should not reach here without seeing those exceptions. I think I see the exception you are talking about but there is so much output that I'm not sure. I curious about where the problem lies, either it's a HotSpot fix that's not in the mlvm repository yet or it's a JDK bug in one of the mlvm patches which hasn't been pushed yet into the main repository. At least that's my best guess. May I suggest that you guys also try with an official build (like JDK 7 b143) from here (assuming everyone has some kind of access to a Linux box too): http://jdk7.java.net/download.html -- Christian All of these require that JAVA_HOME points to the Java 7 you want to use Cheers tom On May 23, 2011, at 7:33 PM, Ola Bini wrote: Hi, I'm happy to see that the performance degradation is getting addressed. I would like to point out that there is still a serious
Re: Good news, bad news
On May 25, 2011, at 3:23 AM, Christian Thalinger wrote: I know that. That's why we need to find out where the problem is (some hints below). Can someone provide a jar file of the current MLVM JSR 292 classes (like John's meth.jar)? I just posted it with the javadoc: http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/ http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/meth.jar http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/test-classes.jar Also, here's my setup for running the test classes: $ hg clone http://hg.openjdk.java.net/mlvm/mlvm /tmp/mlvm $ hg clone ssh://hg.kenai.com/ninja~indify-repo /tmp/indify $ ant test -Dproject.indify=/tmp/indify -Dplatforms.JDK_1.7.home=$JAVA7X_BUILD -Dsrc.special-config-test-src.dir='' (It's a twisty path. Your mileage will vary.) -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
Actually now that I look at it, this isn't all that solid performance. It's just *slightly* faster than using JRuby's inline cache, which almost never inlines... Inline cache only, via CachingCallSite. ~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false --server bench/bench_fib_recursive.rb 5 359227465 1.23 0.00 1.23 ( 1.114000) 9227465 1.024000 0.00 1.024000 ( 1.024000) 9227465 1.016000 0.00 1.016000 ( 1.016000) 9227465 0.993000 0.00 0.993000 ( 0.993000) 9227465 1.003000 0.00 1.003000 ( 1.003000) So invokedynamic dispatch is better, but only by a tiny margin. :( I'll start browsing logs and asm dumps later today to see if I can figure out why it's not as fast as before. - Charlie On Wed, May 25, 2011 at 12:00 PM, Charles Oliver Nutter head...@headius.com wrote: On Wed, May 25, 2011 at 11:31 AM, Charles Oliver Nutter head...@headius.com wrote: Build in progress! I'll let you know how it goes. How low is low in low arity? Ok, we're definitely back to having solid performance, but it's not as good as it was before. I'm not sure if it's related to the inlining thresholds, since bumping them up the same way we used to doesn't get all the perf back. Here's the numbers I have. It's now around 3x faster than it was a few days ago, but not as fast as JRuby's dynopt mode (which inserts a guarded invokevirtual next to a normal IC hit). The invokedynamic logic was nearly as fast as dynopt before, which was breathtaking to behold. Indy, indy with thresholds bumped, dynopt: ~/projects/jruby ➔ jruby --server bench/bench_fib_recursive.rb 5 359227465 1.215000 0.00 1.215000 ( 1.155000) 9227465 0.985000 0.00 0.985000 ( 0.985000) 9227465 0.977000 0.00 0.977000 ( 0.977000) 9227465 0.981000 0.00 0.981000 ( 0.981000) 9227465 0.976000 0.00 0.976000 ( 0.976000) ~/projects/jruby ➔ jruby -J-XX:MaxInlineSize=150 -J-XX:InlineSmallCode=3000 --server bench/bench_fib_recursive.rb 5 35 9227465 1.209000 0.00 1.209000 ( 1.132000) 9227465 0.981000 0.00 0.981000 ( 0.981000) 9227465 0.974000 0.00 0.974000 ( 0.974000) 9227465 0.971000 0.00 0.971000 ( 0.971000) 9227465 0.969000 0.00 0.969000 ( 0.969000) ~/projects/jruby ➔ jruby -Xcompile.dynopt=true -Xcompile.invokedynamic=false --server bench/bench_fib_recursive.rb 5 35 9227465 1.078000 0.00 1.078000 ( 1.013000) 9227465 0.706000 0.00 0.706000 ( 0.706000) 9227465 0.697000 0.00 0.697000 ( 0.697000) 9227465 0.699000 0.00 0.699000 ( 0.699000) 9227465 0.695000 0.00 0.695000 ( 0.695000) ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 25, 2011, at 9:31 AM, Charles Oliver Nutter wrote: Build in progress! I'll let you know how it goes. How low is low in low arity? About 0-9. Let's find the next bottleneck! -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On Wed, May 25, 2011 at 2:14 PM, John Rose john.r.r...@oracle.com wrote: On May 25, 2011, at 9:31 AM, Charles Oliver Nutter wrote: Build in progress! I'll let you know how it goes. How low is low in low arity? About 0-9. Let's find the next bottleneck! That's the spirit! We can do it! - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
I've got the bimorphic method handle inline working. I was doing something stupid at the beginning which is why it wasn't as easy as I expected to be and then I encountered a stupid API which wasted a little more time. I was able to leverage the existing invokedynamic call site cache code generation and just generate a different guard. Ignoring the flag soup below, the first is without invokedynamic, second is with and third is with bimorphic inline for selectAlternative. ( setenv _JAVA_OPTIONS -XX:-PrintCompilation -Xbootclasspath/p:/export/ws/twisti/meth.jar -XX:+UseNewCode -Xbatch -XX:-UseNewCode2 -XX:-LogCompilation -XX:-PrintInlining ; setenv JAVA_HOME /export/ws/indy/i586/jdk1.7.0 ; ./bin/jruby --server -Xcompile.invokedynamic=false bench/bench_fib_recursive.rb 5 35 ) 2.006000 0.00 2.006000 ( 1.856000) 1.391000 0.00 1.391000 ( 1.391000) 1.386000 0.00 1.386000 ( 1.386000) 1.386000 0.00 1.386000 ( 1.386000) 1.385000 0.00 1.385000 ( 1.385000) ( setenv _JAVA_OPTIONS -XX:-PrintCompilation -Xbootclasspath/p:/export/ws/twisti/meth.jar -XX:+UseNewCode -Xbatch -XX:-UseNewCode2 -XX:-LogCompilation -XX:-PrintInlining ; setenv JAVA_HOME /export/ws/indy/i586/jdk1.7.0 ; ./bin/jruby --server -Xcompile.invokedynamic=true bench/bench_fib_recursive.rb 5 35 ) 4.235000 0.00 4.235000 ( 4.086000) 3.689000 0.00 3.689000 ( 3.689000) 3.687000 0.00 3.687000 ( 3.688000) 3.68 0.00 3.68 ( 3.68) 3.68 0.00 3.68 ( 3.68) ( setenv _JAVA_OPTIONS -XX:-PrintCompilation -Xbootclasspath/p:/export/ws/twisti/meth.jar -XX:+UseNewCode -Xbatch -XX:+UseNewCode2 -XX:-LogCompilation -XX:-PrintInlining ; setenv JAVA_HOME /export/ws/indy/i586/jdk1.7.0 ; ./bin/jruby --server -Xcompile.invokedynamic=true bench/bench_fib_recursive.rb 5 35 )Picked up 2.103000 0.00 2.103000 ( 1.95) 1.281000 0.00 1.281000 ( 1.281000) 1.275000 0.00 1.275000 ( 1.274000) 1.279000 0.00 1.279000 ( 1.279000) 1.288000 0.00 1.288000 ( 1.288000) The changes are at http://javaweb.sfbay.sun.com/~never/webrev/bim. Maybe John can put together an mlvm patch for it. I'm not quite sure how to do so. There are still some minor improvements that can be made. The test part ends up looking like: if ((test ? Boolean.TRUE : Boolean.FALSE).value()) We need to treat more of those values as constants and split the loads up to fold it away. tom On May 25, 2011, at 12:14 PM, John Rose wrote: On May 25, 2011, at 9:31 AM, Charles Oliver Nutter wrote: Build in progress! I'll let you know how it goes. How low is low in low arity? About 0-9. Let's find the next bottleneck! -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On Wed, May 25, 2011 at 5:12 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: I've got the bimorphic method handle inline working. I was doing something stupid at the beginning which is why it wasn't as easy as I expected to be and then I encountered a stupid API which wasted a little more time. I was able to leverage the existing invokedynamic call site cache code generation and just generate a different guard. Ignoring the flag soup below, the first is without invokedynamic, second is with and third is with bimorphic inline for selectAlternative. Here's a dumb question...couldn't the bimorphic logic have simply been generated the second way you indicated, rather than the (x ? a : b).whatever form? Great to hear this is in! I'll look for an MLVM patch and until then see if I can figure out why the classic GWT logic isn't as fast as it was a couple weeks ago. IF I'm feeling saucy I may try to pull MLVM + bsd-port from two weeks ago (or find a macosx port build from that timeframe) to compare assembly output. Perhaps illustrative for us all! The changes are at http://javaweb.sfbay.sun.com/~never/webrev/bim. Maybe John can put together an mlvm patch for it. I'm not quite sure how to do so. There are still some minor improvements that can be made. The test part ends up looking like: if ((test ? Boolean.TRUE : Boolean.FALSE).value()) We need to treat more of those values as constants and split the loads up to fold it away. I'm keen to have a look at the assembly resulting from the bimorphic inlining and see if there's any way to improve it further. My test could potentially be improved too. I'm also pondering ways I can go mostly guard-free for certain core JRuby types, possibly reducing the guard to an exact type check. There's definitely runway both on your end and my end. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
I haven't been able to reproduce a crash that looks like either of the issues you reported, the ShouldNotReachHere in frame.cpp and the crash in FastScanClosure. Do you have a test that can reproduce it and are you running against the latest version of the code? tom On May 23, 2011, at 7:33 PM, Ola Bini wrote: Hi, I'm happy to see that the performance degradation is getting addressed. I would like to point out that there is still a serious crash in the machinery too... Have you seen any reason why that happens? Cheers On 2011-05-24 06.11, John Rose wrote: On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote: I'd *love* for intermediate static Java snippits like this to inline straight through, even if invokeExact would be megamorphic under normal circumstances...but I don't think that's the case right now, right? I haven't been following 292 that closely but it used to be a piece of code that did precisely that. +private Object invoke_L0() throws Throwable { + if ((boolean) test.invokeExact()) +return target.invokeExact(); +return fallback.invokeExact(); + } It looks like it was reworked at some point to use selectAlternative but the optimizer was never updated to deal with it properly. It's not particularly hard, we just need to generate code like we would for a bimorphic call site. The current code expects to either get a constant or something else and doesn't inline if it's something else. In this case we have a Phi of two constants which we just need to split. Yes, it would be a quasi-bimorphic call site keyed from a phi of two constants, instead of a profile of two receiver classes. I'm still unclear why you couldn't write your own variant of guardWithTest and have it work but my knowledge of what's really allowed is somewhat limited. Those snippets will inline (I think), but at some point Charlie will want to fetch a bit of constant stuff out of an instance variable. That will look non-constant to the optimizer, even if the instance variable is final and the enclosing object is a constant reference. We made sure this happens for java.lang.invoke classes, but we haven't extended it yet to user code, in part because it would have its own bug tail to work through. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -- Ola Bini (http://olabini.com) Ioke - JRuby - ThoughtWorks Yields falsehood when quined yields falsehood when quined. ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 23, 2011, at 5:43 PM, Charles Oliver Nutter wrote: Well I have a prototype GWT in place, but I'm having trouble getting invokeExact to work. As far as I can tell, incoming arguments should match the handle type just fine, but I'm getting this error: Hmm... I don't see the bug in your diff. Looks like someone is trying to invoke a MH in varargs style (Object...). The error says InvokeDynamicSupport.java:934:in `gwt': but I don't know where this might be. Anyway, I put GWT back the way it was, for (a) low arity and (b) method types without primitives. (For high arity and/or types with primitives, it still uses ricochet frames, for which Tom found an optimization opportunity.) See if it makes things better for you... -- John___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On 2011-05-25 10.50, John Rose wrote: On May 23, 2011, at 7:33 PM, Ola Bini wrote: Have you seen any reason why that happens? There are some known problems (crashers even) with method handle compilation. The permuteArguments transform has a bug when you mix long/doubles and other argument types. I have a crasher in one of my dispatch pattern examples that is similar to some of the code you have been talking about. Yes, there is at least one gremlin left in there. Thanks for posting the detailed instructions for reproducing your errors! Thanks! I don't use permuteArguments anywhere in my code, actually, but maybe one of the other combinators is defined in terms of it? Cheers -- Ola Bini (http://olabini.com) Ioke - JRuby - ThoughtWorks Yields falsehood when quined yields falsehood when quined. ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 24, 2011, at 10:25 PM, Ola Bini wrote: I don't use permuteArguments anywhere in my code, actually, but maybe one of the other combinators is defined in terms of it? No, it's stand-alone. Factoid: All the other combinators are order-stable with respect to arguments. I toyed with using it to normalize complex calling sequences (prim/ref mixes) but never got anywhere. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 21, 2011, at 1:31 AM, Charles Oliver Nutter wrote: I am still using the specialized math callsites because they inline well and an Indy-based approach was still quite a bit slower last time I tried. I am back to being worried about the inlining thresholds. With no MHs and runtime profiled direct calls, I was able to get this all to inline. So basically the fact that I am using Indy and MHs now penalizes me over not just profiled direct calls, but over a totally generic inline cache that usually doesn't inline at all. That's pretty bad :) I was on vacation for a couple of days but now I'm back. I looked at the inlining tree for fib_recursive and I can't see the fib_ruby method getting compiled. But instead I see something I've never seen before: 2949 105 org.jruby.ast.CallOneArgNode::interpret (34 bytes) or 2997 108 ! org.jruby.evaluator.ASTInterpreter::INTERPRET_METHOD (126 bytes) Does that mean JRuby doesn't compile these methods but interpret them? -- Christian I can try fiddling with the thresholds later, but do we agree that an invokedynamic bound to a GWT plus some intermediate adapter handles ending in a direct handle should inline as well as e.g. an invokevirtual? - Charlie (mobile) On May 20, 2011, at 17:35, Rémi Forax fo...@univ-mlv.fr wrote: On 05/20/2011 11:08 PM, Tom Rodriguez wrote: Is it the inlining policy? I played a little with this and the current default policy is still not as good as the big tweaks to MaxInlineSize and InlineSmallCode. tom I tend to agree, things like isFixnumReopened and getCacheToken seem to be not inlined. Charles, do you have tested if the inline flags are tweaked ? Also I do not understand why you have JRuby's callsites like MinusCallSite in the stacktrace. You should not use it if invokedynamic is enabled. Rémi On May 20, 2011, at 1:21 PM, Charles Oliver Nutter wrote: Meh, false alarm. This is just the default empty Object[] assigned to all objects. I still can't explain the perf degradation. Here's the full assembly for fib_ruby: https://gist.github.com/983721 It seems like stuff is inlining, so I have no idea why performance is so terrible. - Charlie On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter head...@headius.com wrote: Another update...I managed to turn off the ricochet frames and have it still work (???). The with-ricochet mode is faster, but still 2-3x slower than no indy at all (using JRuby's IC), and 4-5x slower than invokedynamic was recently... HOWEVER...an assembly dump I just did contains some very odd code that might indicate a problem in my code. Hold the presses on digging into it on your end until I can explain this. (this = apparently standing up an instance variable table for Fixnum objects...*definitely* should not be happening). 0x02868f2a: mov [eax+0x1C], ebx ;*invokespecialinit ; - org.jruby.RubyBasicObject::init@1 (line 218) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f2d: mov ebx, [esp+0x14] 0x02868f31: mov ecx, [ebx+0xBC] 0x02868f37: mov edx, eax 0x02868f39: shr edx, 9 0x02868f3c: mov [edx+0x394000], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f43: mov edx, a 'java/lang/Class' = 'org/jruby/RubyBasicObject' ; {oop(a 'java/lang/Class' = 'org/jruby/RubyBasicObject')} 0x02868f48: mov ebx, [edx+0x68] ;*getstatic NULL_OBJECT_ARRAY ; - org.jruby.RubyBasicObject::init@5 (line 84) ; - org.jruby.RubyObject::init@2 (line 116)
Re: Good news, bad news
On Mon, May 23, 2011 at 11:05 AM, Christian Thalinger christian.thalin...@oracle.com wrote: I was on vacation for a couple of days but now I'm back. I looked at the inlining tree for fib_recursive and I can't see the fib_ruby method getting compiled. But instead I see something I've never seen before: 2949 105 org.jruby.ast.CallOneArgNode::interpret (34 bytes) or 2997 108 ! org.jruby.evaluator.ASTInterpreter::INTERPRET_METHOD (126 bytes) Does that mean JRuby doesn't compile these methods but interpret them? Hmm, I wonder if there's a problem somewhere preventing JIT now. Run JRuby with -d -X+C to debug logging and force compilation. I will look into it today myself. I did see fib_ruby compile and was able to get assembly dumps out, but it performed terribly. I'll try to make extra sure everything's jitting. Ahh the joys of a mixed-mode runtime, eh? - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 23, 2011, at 11:45 AM, Charles Oliver Nutter wrote: I did see fib_ruby compile and was able to get assembly dumps out, but it performed terribly. I'll try to make extra sure everything's jitting. Ahh the joys of a mixed-mode runtime, eh? Indeed. Anyway, thanks for pushing on this. Sometimes you can get your dumps faster by using -XX:-DisplayVMOutput along with -XX:+PrintAssembly. The output goes to hotspot.log anyway. Also, -XX:CompileCommand=print,foo::bar will focus assembly printing on the method(s) you care about. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 23, 2011, at 8:45 PM, Charles Oliver Nutter wrote: On Mon, May 23, 2011 at 11:05 AM, Christian Thalinger christian.thalin...@oracle.com wrote: I was on vacation for a couple of days but now I'm back. I looked at the inlining tree for fib_recursive and I can't see the fib_ruby method getting compiled. But instead I see something I've never seen before: 2949 105 org.jruby.ast.CallOneArgNode::interpret (34 bytes) or 2997 108 ! org.jruby.evaluator.ASTInterpreter::INTERPRET_METHOD (126 bytes) Does that mean JRuby doesn't compile these methods but interpret them? Hmm, I wonder if there's a problem somewhere preventing JIT now. Run JRuby with -d -X+C to debug logging and force compilation. I will look into it today myself. I didn't use -X+C but it should compile without it too, right? And, btw, I used JDK 7 b143. I try again tomorrow. -- Christian I did see fib_ruby compile and was able to get assembly dumps out, but it performed terribly. I'll try to make extra sure everything's jitting. Ahh the joys of a mixed-mode runtime, eh? - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 23, 2011, at 2:19 PM, Charles Oliver Nutter wrote: I'm working up a set of files that show JRuby compilation output, but I noticed a couple things that might be interesting right now. That looks promising. It seems something got bumped in our inlining logic. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
fib_ruby LogCompilation inlining graph, showing that fib_ruby is not inlined: https://gist.github.com/f2b665ad3c97ba622ebf Can anyone suggest other flags I can try to adjust to get things to inline better? FWIW, the handle chain in question that's not inlining is pretty simple: * DMH pointing back at fib_ruby * permute args * GWT - Charlie On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter head...@headius.com wrote: I'm working up a set of files that show JRuby compilation output, but I noticed a couple things that might be interesting right now. First off, fairly early in the assembly output for fib, I see this: 0x02876d1f: call 0x0282d0e0 ; OopMap{[96]=Oop [100]=Oop [28]=Oop [40]=Oop [48]=Oop off=644} ;*invokespecial invokeExact ; - java.lang.invoke.MethodHandle::invokeExact@63 ; - java.lang.invoke.MethodHandle::invokeExact@23 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7) ; {optimized virtual_call} For fib, the only invokedynamic is the recursive call to fib, so that would indicate that fib_ruby is not inlining into itself at all here. And I can't see it inlining into itself anywhere in the assembly output. Later in the same output: 0x0287703f: call 0x0282dba0 ; OopMap{ebp=Oop off=1444} ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} 0x02877044: call 0x0105a9d0 ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} These appear repeatedly near the invokedynamic invocation above. If I'm reading this right, neither the recursive call nor logic involved in that particular handle is inlining. Am I right? Here's the complete assembly dump (i386) for the fib_ruby method: https://gist.github.com/987640 In other news, MaxInlineSize=150 with InlineSmallCode=3000 does not appear to improve performance. I also tried bumping up MaxRecursiveInlineLevel and MaxInlineLevel with no effect. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
If there were to be a recursive inline in there, where would it occur? I can't tell from the names where in that inline tree where the recursive call occurs. tom On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote: fib_ruby LogCompilation inlining graph, showing that fib_ruby is not inlined: https://gist.github.com/f2b665ad3c97ba622ebf Can anyone suggest other flags I can try to adjust to get things to inline better? FWIW, the handle chain in question that's not inlining is pretty simple: * DMH pointing back at fib_ruby * permute args * GWT - Charlie On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter head...@headius.com wrote: I'm working up a set of files that show JRuby compilation output, but I noticed a couple things that might be interesting right now. First off, fairly early in the assembly output for fib, I see this: 0x02876d1f: call 0x0282d0e0 ; OopMap{[96]=Oop [100]=Oop [28]=Oop [40]=Oop [48]=Oop off=644} ;*invokespecial invokeExact ; - java.lang.invoke.MethodHandle::invokeExact@63 ; - java.lang.invoke.MethodHandle::invokeExact@23 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7) ; {optimized virtual_call} For fib, the only invokedynamic is the recursive call to fib, so that would indicate that fib_ruby is not inlining into itself at all here. And I can't see it inlining into itself anywhere in the assembly output. Later in the same output: 0x0287703f: call 0x0282dba0 ; OopMap{ebp=Oop off=1444} ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} 0x02877044: call 0x0105a9d0 ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} These appear repeatedly near the invokedynamic invocation above. If I'm reading this right, neither the recursive call nor logic involved in that particular handle is inlining. Am I right? Here's the complete assembly dump (i386) for the fib_ruby method: https://gist.github.com/987640 In other news, MaxInlineSize=150 with InlineSmallCode=3000 does not appear to improve performance. I also tried bumping up MaxRecursiveInlineLevel and MaxInlineLevel with no effect. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote: * GWT GWT changed, but in a way that should make it easier (not harder) to inline. That could be the problem. The new format (using ricochet frames) is: GWT(p,t,f)(*a) := ( selectAlternative(p(*a), t, f) ).invokeExact(*a) where selectAlternative(z,a,b) := z?a:b Or: GWT(p,t,f) := foldArguments(tailcall, folder) where tailcall := exactInvoker(t.type()) chooser := selectAlternative.insertArguments(1,t,f) folder := filterReturnValue(p, chooser) The new MHW code should allow the foldArguments node to be inlined. But that may be the weakest link. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
The following chunk should be the invokedynamic call to fib, via a GWT, an arg permuter, and perhaps one convert: @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes) @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test (20 bytes) @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) @ 8 org.jruby.RubyModule::getCacheToken (5 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes) @ 1 java.lang.Boolean::valueOf (14 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes) @ 11 java.lang.Boolean::booleanValue (5 bytes) @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 37 sun.invoke.util.ValueConversions::identity (2 bytes) This seems to only be the test logic; the actual fib invocation doesn't appear to show up in the inlining graph at all. Am I right? I see two of these in the LogCompilation output and nothing else around them. I'd expect to see them do the invocation of fib_ruby somewhere in there. It's like the success branch of GWT is not even being considered for inlining. - Charlie On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: If there were to be a recursive inline in there, where would it occur? I can't tell from the names where in that inline tree where the recursive call occurs. tom On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote: fib_ruby LogCompilation inlining graph, showing that fib_ruby is not inlined: https://gist.github.com/f2b665ad3c97ba622ebf Can anyone suggest other flags I can try to adjust to get things to inline better? FWIW, the handle chain in question that's not inlining is pretty simple: * DMH pointing back at fib_ruby * permute args * GWT - Charlie On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter head...@headius.com wrote: I'm working up a set of files that show JRuby compilation output, but I noticed a couple things that might be interesting right now. First off, fairly early in the assembly output for fib, I see this: 0x02876d1f: call 0x0282d0e0 ; OopMap{[96]=Oop [100]=Oop [28]=Oop [40]=Oop [48]=Oop off=644} ;*invokespecial invokeExact ; - java.lang.invoke.MethodHandle::invokeExact@63 ; - java.lang.invoke.MethodHandle::invokeExact@23 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7) ; {optimized virtual_call} For fib, the only invokedynamic is the recursive call to fib, so that would indicate that fib_ruby is not inlining into itself at all here. And I can't see it inlining into itself anywhere in the assembly output. Later in the same output: 0x0287703f: call 0x0282dba0 ; OopMap{ebp=Oop off=1444} ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} 0x02877044: call 0x0105a9d0 ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} These appear repeatedly near the invokedynamic invocation above. If I'm reading this right, neither the recursive call nor logic involved in that particular handle is inlining. Am I right? Here's the complete assembly dump (i386) for the fib_ruby method: https://gist.github.com/987640 In other news, MaxInlineSize=150 with InlineSmallCode=3000 does not appear to improve performance. I also tried bumping up MaxRecursiveInlineLevel and MaxInlineLevel with no effect. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
Also, fwiw...after these two chunks in LogCompilation output, I see nothing else inlined into fib_ruby, including a monomorphic call path through PlusCallSite ending at RubyFixnum#op_plus (the integer + operation). That would also affect performance. I also do not see any indication *why* nothing inlines past this point. Usually it would say too big or something. I do see MinusCallSite inline earlier. - Charlie On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter head...@headius.com wrote: The following chunk should be the invokedynamic call to fib, via a GWT, an arg permuter, and perhaps one convert: @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes) @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test (20 bytes) @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) @ 8 org.jruby.RubyModule::getCacheToken (5 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes) @ 1 java.lang.Boolean::valueOf (14 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes) @ 11 java.lang.Boolean::booleanValue (5 bytes) @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 37 sun.invoke.util.ValueConversions::identity (2 bytes) This seems to only be the test logic; the actual fib invocation doesn't appear to show up in the inlining graph at all. Am I right? I see two of these in the LogCompilation output and nothing else around them. I'd expect to see them do the invocation of fib_ruby somewhere in there. It's like the success branch of GWT is not even being considered for inlining. - Charlie On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: If there were to be a recursive inline in there, where would it occur? I can't tell from the names where in that inline tree where the recursive call occurs. tom On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote: fib_ruby LogCompilation inlining graph, showing that fib_ruby is not inlined: https://gist.github.com/f2b665ad3c97ba622ebf Can anyone suggest other flags I can try to adjust to get things to inline better? FWIW, the handle chain in question that's not inlining is pretty simple: * DMH pointing back at fib_ruby * permute args * GWT - Charlie On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter head...@headius.com wrote: I'm working up a set of files that show JRuby compilation output, but I noticed a couple things that might be interesting right now. First off, fairly early in the assembly output for fib, I see this: 0x02876d1f: call 0x0282d0e0 ; OopMap{[96]=Oop [100]=Oop [28]=Oop [40]=Oop [48]=Oop off=644} ;*invokespecial invokeExact ; - java.lang.invoke.MethodHandle::invokeExact@63 ; - java.lang.invoke.MethodHandle::invokeExact@23 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7) ; {optimized virtual_call} For fib, the only invokedynamic is the recursive call to fib, so that would indicate that fib_ruby is not inlining into itself at all here. And I can't see it inlining into itself anywhere in the assembly output. Later in the same output: 0x0287703f: call 0x0282dba0 ; OopMap{ebp=Oop off=1444} ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} 0x02877044: call 0x0105a9d0 ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} These appear repeatedly near the invokedynamic invocation above. If I'm reading this right, neither the recursive call nor logic involved in that particular handle is inlining. Am I right? Here's the complete assembly dump (i386) for the fib_ruby method: https://gist.github.com/987640 In other news, MaxInlineSize=150 with InlineSmallCode=3000 does not appear to improve performance. I also tried bumping up MaxRecursiveInlineLevel and MaxInlineLevel with no effect. - Charlie
Re: Good news, bad news
Another example, running bench/language/bench_method_dispatch_only, which runs a 1m iteration loop that invokes an empty foo method five times: https://gist.github.com/9008f94fc677f3fe98e7 Note again that it seems like only the test logic and maybe some of the logic wrapping the foo call inline...the foo calls themselves do not appear in logc inlining graph at all. - Charlie On Mon, May 23, 2011 at 4:50 PM, Charles Oliver Nutter head...@headius.com wrote: Also, fwiw...after these two chunks in LogCompilation output, I see nothing else inlined into fib_ruby, including a monomorphic call path through PlusCallSite ending at RubyFixnum#op_plus (the integer + operation). That would also affect performance. I also do not see any indication *why* nothing inlines past this point. Usually it would say too big or something. I do see MinusCallSite inline earlier. - Charlie On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter head...@headius.com wrote: The following chunk should be the invokedynamic call to fib, via a GWT, an arg permuter, and perhaps one convert: @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes) @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test (20 bytes) @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) @ 8 org.jruby.RubyModule::getCacheToken (5 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes) @ 1 java.lang.Boolean::valueOf (14 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes) @ 11 java.lang.Boolean::booleanValue (5 bytes) @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 37 sun.invoke.util.ValueConversions::identity (2 bytes) This seems to only be the test logic; the actual fib invocation doesn't appear to show up in the inlining graph at all. Am I right? I see two of these in the LogCompilation output and nothing else around them. I'd expect to see them do the invocation of fib_ruby somewhere in there. It's like the success branch of GWT is not even being considered for inlining. - Charlie On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: If there were to be a recursive inline in there, where would it occur? I can't tell from the names where in that inline tree where the recursive call occurs. tom On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote: fib_ruby LogCompilation inlining graph, showing that fib_ruby is not inlined: https://gist.github.com/f2b665ad3c97ba622ebf Can anyone suggest other flags I can try to adjust to get things to inline better? FWIW, the handle chain in question that's not inlining is pretty simple: * DMH pointing back at fib_ruby * permute args * GWT - Charlie On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter head...@headius.com wrote: I'm working up a set of files that show JRuby compilation output, but I noticed a couple things that might be interesting right now. First off, fairly early in the assembly output for fib, I see this: 0x02876d1f: call 0x0282d0e0 ; OopMap{[96]=Oop [100]=Oop [28]=Oop [40]=Oop [48]=Oop off=644} ;*invokespecial invokeExact ; - java.lang.invoke.MethodHandle::invokeExact@63 ; - java.lang.invoke.MethodHandle::invokeExact@23 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7) ; {optimized virtual_call} For fib, the only invokedynamic is the recursive call to fib, so that would indicate that fib_ruby is not inlining into itself at all here. And I can't see it inlining into itself anywhere in the assembly output. Later in the same output: 0x0287703f: call 0x0282dba0 ; OopMap{ebp=Oop off=1444} ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} 0x02877044: call 0x0105a9d0 ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ; {runtime_call} These appear repeatedly near the invokedynamic invocation
Re: Good news, bad news
FWIW, perf with indy versus monomorphic inline caching on that bench_method_dispatch_only benchmark: ~/projects/jruby ➔ jruby --server -X+C bench/language/bench_method_dispatch_only.rbTest ruby method: 1000k loops calling self's foo 10 times 1.129000 0.00 1.129000 ( 0.662000) 0.409000 0.00 0.409000 ( 0.409000) 0.455000 0.00 0.455000 ( 0.455000) 0.428000 0.00 0.428000 ( 0.428000) 0.474000 0.00 0.474000 ( 0.474000) 0.47 0.00 0.47 ( 0.47) 0.458000 0.00 0.458000 ( 0.458000) 0.495000 0.00 0.495000 ( 0.495000) 0.46 0.00 0.46 ( 0.46) 0.508000 0.00 0.508000 ( 0.508000) ~/projects/jruby ➔ jruby --server -Xcompile.invokedynamic=false -X+C bench/language/bench_method_dispatch_only.rb Test ruby method: 1000k loops calling self's foo 10 times 0.377000 0.00 0.377000 ( 0.315000) 0.211000 0.00 0.211000 ( 0.207000) 0.132000 0.00 0.132000 ( 0.132000) 0.128000 0.00 0.128000 ( 0.128000) 0.135000 0.00 0.135000 ( 0.135000) 0.14 0.00 0.14 ( 0.14) 0.122000 0.00 0.122000 ( 0.122000) 0.122000 0.00 0.122000 ( 0.122000) 0.122000 0.00 0.122000 ( 0.122000) 0.122000 0.00 0.122000 ( 0.122000) Previously, invokedynamic version clocked in *much* faster than the MIC version...like an order of magnitude faster. - Charlie On Mon, May 23, 2011 at 4:56 PM, Charles Oliver Nutter head...@headius.com wrote: Another example, running bench/language/bench_method_dispatch_only, which runs a 1m iteration loop that invokes an empty foo method five times: https://gist.github.com/9008f94fc677f3fe98e7 Note again that it seems like only the test logic and maybe some of the logic wrapping the foo call inline...the foo calls themselves do not appear in logc inlining graph at all. - Charlie On Mon, May 23, 2011 at 4:50 PM, Charles Oliver Nutter head...@headius.com wrote: Also, fwiw...after these two chunks in LogCompilation output, I see nothing else inlined into fib_ruby, including a monomorphic call path through PlusCallSite ending at RubyFixnum#op_plus (the integer + operation). That would also affect performance. I also do not see any indication *why* nothing inlines past this point. Usually it would say too big or something. I do see MinusCallSite inline earlier. - Charlie On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter head...@headius.com wrote: The following chunk should be the invokedynamic call to fib, via a GWT, an arg permuter, and perhaps one convert: @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes) @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test (20 bytes) @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) @ 8 org.jruby.RubyModule::getCacheToken (5 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes) @ 1 java.lang.Boolean::valueOf (14 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes) @ 11 java.lang.Boolean::booleanValue (5 bytes) @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 37 sun.invoke.util.ValueConversions::identity (2 bytes) This seems to only be the test logic; the actual fib invocation doesn't appear to show up in the inlining graph at all. Am I right? I see two of these in the LogCompilation output and nothing else around them. I'd expect to see them do the invocation of fib_ruby somewhere in there. It's like the success branch of GWT is not even being considered for inlining. - Charlie On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: If there were to be a recursive inline in there, where would it occur? I can't tell from the names where in that inline tree where the recursive call occurs. tom On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote: fib_ruby LogCompilation inlining graph, showing that fib_ruby is not inlined: https://gist.github.com/f2b665ad3c97ba622ebf Can anyone suggest other flags I can try to adjust to get things to inline better? FWIW, the handle chain in question that's not inlining is pretty simple: * DMH pointing back at fib_ruby * permute args * GWT - Charlie On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter head...@headius.com wrote: I'm working up a set of files that show JRuby compilation output, but I noticed a couple things that might be interesting right now. First off, fairly early in the
Re: Good news, bad news
On May 23, 2011, at 2:50 PM, Charles Oliver Nutter wrote: Also, fwiw...after these two chunks in LogCompilation output, I see nothing else inlined into fib_ruby, including a monomorphic call path through PlusCallSite ending at RubyFixnum#op_plus (the integer + operation). That would also affect performance. I also do not see any indication *why* nothing inlines past this point. Usually it would say too big or something. If there's no message then it wasn't even a candidate for inlining, either because the path was trimmed away by the optimizer or there wasn't enough information to bind the callsite to a particular method. It's probably the latter in this case. Actually my guess is that the selectAlternative machinery is what's screwing it up. That code gets emitted as: (test ? a : b).invokeExact(); instead of if (test) a.invokeExact(); else b.invokeExact(); Since the first one only uses a single call site we can't statically inline even if a and b are constants. The optimizer could do that optimization but it currently doesn't. Can you build the guard differently? We'll need to build a new call site optimization to deal with the selectAlternative logic. tom I do see MinusCallSite inline earlier. - Charlie On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter head...@headius.com wrote: The following chunk should be the invokedynamic call to fib, via a GWT, an arg permuter, and perhaps one convert: @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes) @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test (20 bytes) @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) @ 8 org.jruby.RubyModule::getCacheToken (5 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes) @ 1 java.lang.Boolean::valueOf (14 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes) @ 11 java.lang.Boolean::booleanValue (5 bytes) @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 37 sun.invoke.util.ValueConversions::identity (2 bytes) This seems to only be the test logic; the actual fib invocation doesn't appear to show up in the inlining graph at all. Am I right? I see two of these in the LogCompilation output and nothing else around them. I'd expect to see them do the invocation of fib_ruby somewhere in there. It's like the success branch of GWT is not even being considered for inlining. - Charlie On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: If there were to be a recursive inline in there, where would it occur? I can't tell from the names where in that inline tree where the recursive call occurs. tom On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote: fib_ruby LogCompilation inlining graph, showing that fib_ruby is not inlined: https://gist.github.com/f2b665ad3c97ba622ebf Can anyone suggest other flags I can try to adjust to get things to inline better? FWIW, the handle chain in question that's not inlining is pretty simple: * DMH pointing back at fib_ruby * permute args * GWT - Charlie On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter head...@headius.com wrote: I'm working up a set of files that show JRuby compilation output, but I noticed a couple things that might be interesting right now. First off, fairly early in the assembly output for fib, I see this: 0x02876d1f: call 0x0282d0e0 ; OopMap{[96]=Oop [100]=Oop [28]=Oop [40]=Oop [48]=Oop off=644} ;*invokespecial invokeExact ; - java.lang.invoke.MethodHandle::invokeExact@63 ; - java.lang.invoke.MethodHandle::invokeExact@23 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7) ; {optimized virtual_call} For fib, the only invokedynamic is the recursive call to fib, so that would indicate that fib_ruby is not inlining into itself at all here. And I can't see it inlining into itself anywhere in the assembly output. Later in the same output: 0x0287703f: call 0x0282dba0 ; OopMap{ebp=Oop off=1444} ;*checkcast ; - java.lang.invoke.MethodHandle::invokeExact@40 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7) ;
Re: Good news, bad news
On Mon, May 23, 2011 at 5:20 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: On May 23, 2011, at 2:50 PM, Charles Oliver Nutter wrote: I also do not see any indication *why* nothing inlines past this point. Usually it would say too big or something. If there's no message then it wasn't even a candidate for inlining, either because the path was trimmed away by the optimizer or there wasn't enough information to bind the callsite to a particular method. It's probably the latter in this case. Good to know I'm reading this output right :) Is there another logging flag that will log these situations too? I'd like to have that not just for invokedynamic opto. Actually my guess is that the selectAlternative machinery is what's screwing it up. That code gets emitted as: (test ? a : b).invokeExact(); instead of if (test) a.invokeExact(); else b.invokeExact(); Since the first one only uses a single call site we can't statically inline even if a and b are constants. The optimizer could do that optimization but it currently doesn't. Can you build the guard differently? We'll need to build a new call site optimization to deal with the selectAlternative logic. Is this a question for me? If so... Build the guard differently as in don't use MethodHandles.guardWithTest? I probably could, but it would require inserting normal Java code into the call path, which should in theory be megamorphic (since all invokedynamic everywhere will go through that same piece of code). Something like: IRubyObject myGuardWithTest(MethodHandle dmh, args...) { if (test(args) { return dmh.invokeExact(args); } else { inline cache } } I'd *love* for intermediate static Java snippits like this to inline straight through, even if invokeExact would be megamorphic under normal circumstances...but I don't think that's the case right now, right? - Charlie tom I do see MinusCallSite inline earlier. - Charlie On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter head...@headius.com wrote: The following chunk should be the invokedynamic call to fib, via a GWT, an arg permuter, and perhaps one convert: @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes) @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test (20 bytes) @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) @ 8 org.jruby.RubyModule::getCacheToken (5 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes) @ 1 java.lang.Boolean::valueOf (14 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes) @ 11 java.lang.Boolean::booleanValue (5 bytes) @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 37 sun.invoke.util.ValueConversions::identity (2 bytes) This seems to only be the test logic; the actual fib invocation doesn't appear to show up in the inlining graph at all. Am I right? I see two of these in the LogCompilation output and nothing else around them. I'd expect to see them do the invocation of fib_ruby somewhere in there. It's like the success branch of GWT is not even being considered for inlining. - Charlie On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: If there were to be a recursive inline in there, where would it occur? I can't tell from the names where in that inline tree where the recursive call occurs. tom On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote: fib_ruby LogCompilation inlining graph, showing that fib_ruby is not inlined: https://gist.github.com/f2b665ad3c97ba622ebf Can anyone suggest other flags I can try to adjust to get things to inline better? FWIW, the handle chain in question that's not inlining is pretty simple: * DMH pointing back at fib_ruby * permute args * GWT - Charlie On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter head...@headius.com wrote: I'm working up a set of files that show JRuby compilation output, but I noticed a couple things that might be interesting right now. First off, fairly early in the assembly output for fib, I see this: 0x02876d1f: call 0x0282d0e0 ; OopMap{[96]=Oop [100]=Oop [28]=Oop [40]=Oop [48]=Oop off=644} ;*invokespecial invokeExact ; - java.lang.invoke.MethodHandle::invokeExact@63 ; - java.lang.invoke.MethodHandle::invokeExact@23 ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51
Re: Good news, bad news
On Mon, May 23, 2011 at 5:20 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: Actually my guess is that the selectAlternative machinery is what's screwing it up. That code gets emitted as: Same benchmark but instead of binding a direct handle via a GWT, I force it to fail over to an inline cache immediately (still behind invokedynamic): ~/projects/jruby ➔ jruby --server -X+C -Xinvokedynamic.maxfail=0 bench/language/bench_method_dispatch_only.rb Test ruby method: 1000k loops calling self's foo 10 times 0.515000 0.00 0.515000 ( 0.403000) 0.174000 0.00 0.174000 ( 0.174000) 0.136000 0.00 0.136000 ( 0.136000) 0.138000 0.00 0.138000 ( 0.138000) 0.164000 0.00 0.164000 ( 0.164000) 0.156000 0.00 0.156000 ( 0.157000) 0.15 0.00 0.15 ( 0.15) 0.145000 0.00 0.145000 ( 0.145000) 0.13 0.00 0.13 ( 0.13) 0.124000 0.00 0.124000 ( 0.124000) maxfail here indicates how many GWT it will attempt to chain for a polymorphic call site before bailing out to a simple inline caching target. That inline caching target is bound *directly* into the invokedynamic call site, with no GWT in front of it. And as expected, performance is basically the same as JRuby with inline caching only. So it seems GWT may be the culprit. Inlining graph for the above run, showing that the inline cache logic (fail methods in InvokeDynamicSupport in JRuby) all inline into the invokedynamic call site: 1% bench.language.bench_method_dispatch_only::method__1$RUBY$invoking @ 18 (177 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (13 bytes) @ 6 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::fail (97 bytes) @ 2 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::pollAndGetClass (13 bytes) @ 1 org.jruby.runtime.ThreadContext::callThreadPoll (23 bytes) @ 19 org.jruby.runtime.ThreadContext::pollThreadEvents executed MinInliningThreshold times @ 5 org.jruby.runtime.builtin.IRubyObject::getMetaClass (0 bytes) type profile org/jruby/runtime/builtin/IRubyObject - org/jruby/RubyObject (100%) @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) @ 17 org.jruby.runtime.callsite.CacheEntry::typeOk (6 bytes) @ 2 org.jruby.runtime.callsite.CacheEntry::typeOk (17 bytes) @ 5 org.jruby.RubyModule::getCacheToken (5 bytes) @ 34 org.jruby.internal.runtime.methods.DynamicMethod::call (13 bytes) type profile org/jruby/internal/runtime/methods/DynamicMethod - bench$language$bench_method_dispatch_only$method__0$RUBY$foo (100%) @ 34 bench$language$bench_method_dispatch_only$method__0$RUBY$foo::call (13 bytes) @ 9 bench$language$bench_method_dispatch_only$method__0$RUBY$foo::call (21 bytes) @ 17 bench.language.bench_method_dispatch_only::method__0$RUBY$foo (2 bytes) @ 42 org.jruby.RubyModule::searchWithCache call site not reached @ 50 org.jruby.runtime.invokedynamic.InvokeDynamicSupport$JRubyCallSite::callType (5 bytes) @ 56 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::methodMissing executed MinInliningThreshold times @ 65 org.jruby.runtime.invokedynamic.InvokeDynamicSupport$JRubyCallSite::callType (5 bytes) @ 72 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::callMethodMissing never executed @ 93 org.jruby.internal.runtime.methods.DynamicMethod::call (13 bytes) Note that foo is shown inlining here as well, but I think that's only because this particular inline-caching call path has only seen one target at this point. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 23, 2011, at 3:27 PM, Charles Oliver Nutter wrote: On Mon, May 23, 2011 at 5:20 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: On May 23, 2011, at 2:50 PM, Charles Oliver Nutter wrote: I also do not see any indication *why* nothing inlines past this point. Usually it would say too big or something. If there's no message then it wasn't even a candidate for inlining, either because the path was trimmed away by the optimizer or there wasn't enough information to bind the callsite to a particular method. It's probably the latter in this case. Good to know I'm reading this output right :) Is there another logging flag that will log these situations too? I'd like to have that not just for invokedynamic opto. Actually my guess is that the selectAlternative machinery is what's screwing it up. That code gets emitted as: (test ? a : b).invokeExact(); instead of if (test) a.invokeExact(); else b.invokeExact(); Since the first one only uses a single call site we can't statically inline even if a and b are constants. The optimizer could do that optimization but it currently doesn't. Can you build the guard differently? We'll need to build a new call site optimization to deal with the selectAlternative logic. Is this a question for me? If so... Build the guard differently as in don't use MethodHandles.guardWithTest? I probably could, but it would require inserting normal Java code into the call path, which should in theory be megamorphic (since all invokedynamic everywhere will go through that same piece of code). Something like: IRubyObject myGuardWithTest(MethodHandle dmh, args...) { if (test(args) { return dmh.invokeExact(args); } else { inline cache } } I'd *love* for intermediate static Java snippits like this to inline straight through, even if invokeExact would be megamorphic under normal circumstances...but I don't think that's the case right now, right? I haven't been following 292 that closely but it used to be a piece of code that did precisely that. +private Object invoke_L0() throws Throwable { +if ((boolean) test.invokeExact()) +return target.invokeExact(); +return fallback.invokeExact(); +} It looks like it was reworked at some point to use selectAlternative but the optimizer was never updated to deal with it properly. It's not particularly hard, we just need to generate code like we would for a bimorphic call site. The current code expects to either get a constant or something else and doesn't inline if it's something else. In this case we have a Phi of two constants which we just need to split. I'm still unclear why you couldn't write your own variant of guardWithTest and have it work but my knowledge of what's really allowed is somewhat limited. tom - Charlie tom I do see MinusCallSite inline earlier. - Charlie On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter head...@headius.com wrote: The following chunk should be the invokedynamic call to fib, via a GWT, an arg permuter, and perhaps one convert: @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes) @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test (20 bytes) @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) @ 8 org.jruby.RubyModule::getCacheToken (5 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes) @ 1 java.lang.Boolean::valueOf (14 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes) @ 11 java.lang.Boolean::booleanValue (5 bytes) @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 37 sun.invoke.util.ValueConversions::identity (2 bytes) This seems to only be the test logic; the actual fib invocation doesn't appear to show up in the inlining graph at all. Am I right? I see two of these in the LogCompilation output and nothing else around them. I'd expect to see them do the invocation of fib_ruby somewhere in there. It's like the success branch of GWT is not even being considered for inlining. - Charlie On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: If there were to be a recursive inline in there, where would it occur? I can't tell from the names where in that inline tree where the recursive call occurs. tom On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote: fib_ruby LogCompilation inlining graph, showing that fib_ruby is not inlined:
Re: Good news, bad news
On 05/24/2011 12:44 AM, Tom Rodriguez wrote: On May 23, 2011, at 3:27 PM, Charles Oliver Nutter wrote: On Mon, May 23, 2011 at 5:20 PM, Tom Rodrigueztom.rodrig...@oracle.com wrote: On May 23, 2011, at 2:50 PM, Charles Oliver Nutter wrote: I also do not see any indication *why* nothing inlines past this point. Usually it would say too big or something. If there's no message then it wasn't even a candidate for inlining, either because the path was trimmed away by the optimizer or there wasn't enough information to bind the callsite to a particular method. It's probably the latter in this case. Good to know I'm reading this output right :) Is there another logging flag that will log these situations too? I'd like to have that not just for invokedynamic opto. Actually my guess is that the selectAlternative machinery is what's screwing it up. That code gets emitted as: (test ? a : b).invokeExact(); instead of if (test) a.invokeExact(); else b.invokeExact(); Since the first one only uses a single call site we can't statically inline even if a and b are constants. The optimizer could do that optimization but it currently doesn't. Can you build the guard differently? We'll need to build a new call site optimization to deal with the selectAlternative logic. Is this a question for me? If so... Build the guard differently as in don't use MethodHandles.guardWithTest? I probably could, but it would require inserting normal Java code into the call path, which should in theory be megamorphic (since all invokedynamic everywhere will go through that same piece of code). Something like: IRubyObject myGuardWithTest(MethodHandle dmh, args...) { if (test(args) { return dmh.invokeExact(args); } else { inline cache } } I'd *love* for intermediate static Java snippits like this to inline straight through, even if invokeExact would be megamorphic under normal circumstances...but I don't think that's the case right now, right? I haven't been following 292 that closely but it used to be a piece of code that did precisely that. +private Object invoke_L0() throws Throwable { +if ((boolean) test.invokeExact()) +return target.invokeExact(); +return fallback.invokeExact(); +} It looks like it was reworked at some point to use selectAlternative but the optimizer was never updated to deal with it properly. It's not particularly hard, we just need to generate code like we would for a bimorphic call site. The current code expects to either get a constant or something else and doesn't inline if it's something else. In this case we have a Phi of two constants which we just need to split. I'm still unclear why you couldn't write your own variant of guardWithTest and have it work but my knowledge of what's really allowed is somewhat limited. The main problem of writing your own variant is that you will have to write 256 * (number_of_primitive_type + 1) variants. The main idea of GWT (and all other method handle combiners) is that they provide a way to get such polymorphic variant without coding all combinations. The benefit is also that these combiners are known by the VM that can provide dedicated path that are more efficient, by ex, the arguments of GWT is pushed only once and used by the test *and* the target (or fallback) or better guarantee about how trees of combiners are inlined. tom Rémi ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 23, 2011, at 5:13 PM, Charles Oliver Nutter wrote: On Mon, May 23, 2011 at 6:17 PM, Rémi Forax fo...@univ-mlv.fr wrote: On 05/24/2011 12:44 AM, Tom Rodriguez wrote: I haven't been following 292 that closely but it used to be a piece of code that did precisely that. +private Object invoke_L0() throws Throwable { +if ((boolean) test.invokeExact()) +return target.invokeExact(); +return fallback.invokeExact(); +} It looks like it was reworked at some point to use selectAlternative but the optimizer was never updated to deal with it properly. It's not particularly hard, we just need to generate code like we would for a bimorphic call site. The current code expects to either get a constant or something else and doesn't inline if it's something else. In this case we have a Phi of two constants which we just need to split. I'll give it a shot. I'll put together a benchmark that produces effectively monomorphic calls at the invokedynamic call site, but which uses an intermediate hand-written piece of Java code (that ends up polymorphic) in place of GWT. Obviously GWT should not fail this badly in final release, but if this works it could mean many, many other patterns will too. And it will definitively answer the question about whether such a case inlines in current logic. You can also just wait for a fix. I've started writing the logic for it and just need to duplicate the guard. We should be able to have this fixed for JDK7. tom The main problem of writing your own variant is that you will have to write 256 * (number_of_primitive_type + 1) variants. The main idea of GWT (and all other method handle combiners) is that they provide a way to get such polymorphic variant without coding all combinations. The benefit is also that these combiners are known by the VM that can provide dedicated path that are more efficient, by ex, the arguments of GWT is pushed only once and used by the test *and* the target (or fallback) or better guarantee about how trees of combiners are inlined. I am not currently specializing any call paths for primitives, but that will come soon. For experiment's sake, I only need to provide one such GWT replacement to test fib (no block passed, single IRubyObject argument) and one more for the dispatch bench (no block, no args). We'll see how it looks. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote: I'd *love* for intermediate static Java snippits like this to inline straight through, even if invokeExact would be megamorphic under normal circumstances...but I don't think that's the case right now, right? I haven't been following 292 that closely but it used to be a piece of code that did precisely that. +private Object invoke_L0() throws Throwable { +if ((boolean) test.invokeExact()) +return target.invokeExact(); +return fallback.invokeExact(); +} It looks like it was reworked at some point to use selectAlternative but the optimizer was never updated to deal with it properly. It's not particularly hard, we just need to generate code like we would for a bimorphic call site. The current code expects to either get a constant or something else and doesn't inline if it's something else. In this case we have a Phi of two constants which we just need to split. Yes, it would be a quasi-bimorphic call site keyed from a phi of two constants, instead of a profile of two receiver classes. I'm still unclear why you couldn't write your own variant of guardWithTest and have it work but my knowledge of what's really allowed is somewhat limited. Those snippets will inline (I think), but at some point Charlie will want to fetch a bit of constant stuff out of an instance variable. That will look non-constant to the optimizer, even if the instance variable is final and the enclosing object is a constant reference. We made sure this happens for java.lang.invoke classes, but we haven't extended it yet to user code, in part because it would have its own bug tail to work through. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
Well I have a prototype GWT in place, but I'm having trouble getting invokeExact to work. As far as I can tell, incoming arguments should match the handle type just fine, but I'm getting this error: InvokeDynamicSupport.java:934:in `gwt': java.lang.invoke.WrongMethodTypeException: (Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Ljava/lang/String;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject; cannot be called with a different arity as ([Ljava/lang/Object;)Ljava/lang/Object; The diff as it stands right now is here: https://gist.github.com/987939 - Charlie On Mon, May 23, 2011 at 7:23 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: On May 23, 2011, at 5:13 PM, Charles Oliver Nutter wrote: On Mon, May 23, 2011 at 6:17 PM, Rémi Forax fo...@univ-mlv.fr wrote: On 05/24/2011 12:44 AM, Tom Rodriguez wrote: I haven't been following 292 that closely but it used to be a piece of code that did precisely that. + private Object invoke_L0() throws Throwable { + if ((boolean) test.invokeExact()) + return target.invokeExact(); + return fallback.invokeExact(); + } It looks like it was reworked at some point to use selectAlternative but the optimizer was never updated to deal with it properly. It's not particularly hard, we just need to generate code like we would for a bimorphic call site. The current code expects to either get a constant or something else and doesn't inline if it's something else. In this case we have a Phi of two constants which we just need to split. I'll give it a shot. I'll put together a benchmark that produces effectively monomorphic calls at the invokedynamic call site, but which uses an intermediate hand-written piece of Java code (that ends up polymorphic) in place of GWT. Obviously GWT should not fail this badly in final release, but if this works it could mean many, many other patterns will too. And it will definitively answer the question about whether such a case inlines in current logic. You can also just wait for a fix. I've started writing the logic for it and just need to duplicate the guard. We should be able to have this fixed for JDK7. tom The main problem of writing your own variant is that you will have to write 256 * (number_of_primitive_type + 1) variants. The main idea of GWT (and all other method handle combiners) is that they provide a way to get such polymorphic variant without coding all combinations. The benefit is also that these combiners are known by the VM that can provide dedicated path that are more efficient, by ex, the arguments of GWT is pushed only once and used by the test *and* the target (or fallback) or better guarantee about how trees of combiners are inlined. I am not currently specializing any call paths for primitives, but that will come soon. For experiment's sake, I only need to provide one such GWT replacement to test fib (no block passed, single IRubyObject argument) and one more for the dispatch bench (no block, no args). We'll see how it looks. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
Hi, I'm happy to see that the performance degradation is getting addressed. I would like to point out that there is still a serious crash in the machinery too... Have you seen any reason why that happens? Cheers On 2011-05-24 06.11, John Rose wrote: On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote: I'd *love* for intermediate static Java snippits like this to inline straight through, even if invokeExact would be megamorphic under normal circumstances...but I don't think that's the case right now, right? I haven't been following 292 that closely but it used to be a piece of code that did precisely that. +private Object invoke_L0() throws Throwable { + if ((boolean) test.invokeExact()) +return target.invokeExact(); +return fallback.invokeExact(); + } It looks like it was reworked at some point to use selectAlternative but the optimizer was never updated to deal with it properly. It's not particularly hard, we just need to generate code like we would for a bimorphic call site. The current code expects to either get a constant or something else and doesn't inline if it's something else. In this case we have a Phi of two constants which we just need to split. Yes, it would be a quasi-bimorphic call site keyed from a phi of two constants, instead of a profile of two receiver classes. I'm still unclear why you couldn't write your own variant of guardWithTest and have it work but my knowledge of what's really allowed is somewhat limited. Those snippets will inline (I think), but at some point Charlie will want to fetch a bit of constant stuff out of an instance variable. That will look non-constant to the optimizer, even if the instance variable is final and the enclosing object is a constant reference. We made sure this happens for java.lang.invoke classes, but we haven't extended it yet to user code, in part because it would have its own bug tail to work through. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -- Ola Bini (http://olabini.com) Ioke - JRuby - ThoughtWorks Yields falsehood when quined yields falsehood when quined. ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On 2011-05-20 11.21, John Rose wrote: On May 19, 2011, at 10:06 PM, Charles Oliver Nutter wrote: I've confimed that the i386 build performs just as poorly. Have not investigate at inlining or assembly level yet...and it's late, so I'm going to get some sleep. It's likely that method handle inlining is failing. Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing. If you can find a microbenchmark which shows differential slowdown, we can debug it. When trying the above all my tests fail with this: java.lang.RuntimeException: java.lang.InternalError: NYI at seph.lang.interpreter.MessageInterpreter.evaluateFully(MessageInterpreter.java:107) at seph.lang.Runtime.evaluateStream(Runtime.java:115) at seph.lang.Runtime.evaluateString(Runtime.java:127) at seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:215) Caused by: java.lang.InternalError: NYI at java.lang.invoke.AdapterMethodHandle.makeReturnConversion(AdapterMethodHandle.java:250) at java.lang.invoke.AdapterMethodHandle.makePairwiseConvert(AdapterMethodHandle.java:206) at java.lang.invoke.MethodHandleImpl.convertArguments(MethodHandleImpl.java:749) at java.lang.invoke.MethodHandleImpl$GuardWithTest.make(MethodHandleImpl.java:926) at java.lang.invoke.MethodHandleImpl.makeGuardWithTest(MethodHandleImpl.java:1066) at java.lang.invoke.MethodHandles.guardWithTest(MethodHandles.java:2072) at java.lang.invoke.SwitchPoint.guardWithTest(SwitchPoint.java:145) at seph.lang.compiler.Bootstrap.initialSetup_intrinsic_if(Bootstrap.java:1298) at seph$gen$abstraction$11.activateWith(Unknown Source) at seph$gen$abstraction$11.activateWith(Unknown Source) at seph.lang.ast.NamedMessage.go(NamedMessage.java:162) at seph.lang.ast.NamedMessage.sendTo(NamedMessage.java:143) at seph.lang.interpreter.MessageInterpreter.evaluateFully(MessageInterpreter.java:92) =( Cheers -- Ola Bini (http://olabini.com) Ioke - JRuby - ThoughtWorks Yields falsehood when quined yields falsehood when quined. ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
On Fri, May 20, 2011 at 1:07 AM, Ola Bini ola.b...@gmail.com wrote: Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing. If you can find a microbenchmark which shows differential slowdown, we can debug it. When trying the above all my tests fail with this: java.lang.RuntimeException: java.lang.InternalError: NYI Same here. Reproducing it is pretty easy... * Clone JRuby: git clone https://github.com/jruby/jruby.git * Build JRuby: cd jruby ; ant * Run benchmarks: bin/jruby bench/bench_fib_recursive.rb 10 35 There are many benchmarks under the bench dir, almost all of which show this degradation. You can disable invokedynamic use in JRuby by passing -Xcompile.invokedynamic=false, as seen here: ~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false bench/bench_fib_recursive.rb 5 35 WARNING: Both jruby-complete.jar and jruby.jar are present in the 'lib' directory. Will use jruby.jar 9227465 1.698000 0.00 1.698000 ( 1.645000) 9227465 1.352000 0.00 1.352000 ( 1.352000) 9227465 1.364000 0.00 1.364000 ( 1.364000) 9227465 1.344000 0.00 1.344000 ( 1.344000) 9227465 1.336000 0.00 1.336000 ( 1.336000) ~/projects/jruby ➔ jruby -Xcompile.invokedynamic=true bench/bench_fib_recursive.rb 5 35 WARNING: Both jruby-complete.jar and jruby.jar are present in the 'lib' directory. Will use jruby.jar 9227465 3.182000 0.00 3.182000 ( 3.12) 9227465 3.019000 0.00 3.019000 ( 3.019000) 9227465 3.195000 0.00 3.195000 ( 3.195000) 9227465 3.163000 0.00 3.163000 ( 3.163000) 9227465 3.656000 0.00 3.656000 ( 3.656000) I'm sure something's just broken...this benchmark was easily almost 2x faster than the non-invokedynamic version just a few days ago, and it's now 2-3x slower than non-indy logic. I'm going to be online all day, standing by to help in any way I can. I did do a quick LogCompilation last night and the main thing I noticed was that while it seemed like handles are inlining, the eventual DMH does not inline. That would explain some of the perf degradation, I think...but hopefully not all of it. Whatever you need me to run or whatever information...let me know. ALSO...I got the same error as Ola when running JRuby with a standalone jar file: ~/projects/jruby ➔ java -jar lib/jruby-complete.jar bench/bench_fib_recursive.rb MethodHandle.java:-1:in `invokeExact': java.lang.NoClassDefFoundError: org/jruby/runtime/ThreadContext from MethodHandle.java:-1:in `invokeExact' from bench/bench_fib_recursive.rb:7:in `method__0$RUBY$fib_ruby' from bench/bench_fib_recursive.rb:7:in `method__0$RUBY$fib_ruby' from bench/bench_fib_recursive.rb:7:in `method__0$RUBY$fib_ruby' from bench/bench_fib_recursive.rb:7:in `method__0$RUBY$fib_ruby' As in Ola's case, ThreadContext is a core JRuby class. I do *not* get this error when JRuby is loaded from the bootstrap classloader, so I imagine there's some mix-up with generated bytecode and the classloaders that that bytecode gets loaded into. - Charlie ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: Good news, bad news
Another update...I managed to turn off the ricochet frames and have it still work (???). The with-ricochet mode is faster, but still 2-3x slower than no indy at all (using JRuby's IC), and 4-5x slower than invokedynamic was recently... HOWEVER...an assembly dump I just did contains some very odd code that might indicate a problem in my code. Hold the presses on digging into it on your end until I can explain this. (this = apparently standing up an instance variable table for Fixnum objects...*definitely* should not be happening). 0x02868f2a: mov [eax+0x1C], ebx ;*invokespecial init ; - org.jruby.RubyBasicObject::init@1 (line 218) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f2d: mov ebx, [esp+0x14] 0x02868f31: mov ecx, [ebx+0xBC] 0x02868f37: mov edx, eax 0x02868f39: shr edx, 9 0x02868f3c: mov [edx+0x394000], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f43: mov edx, a 'java/lang/Class' = 'org/jruby/RubyBasicObject' ; {oop(a 'java/lang/Class' = 'org/jruby/RubyBasicObject')} 0x02868f48: mov ebx, [edx+0x68] ;*getstatic NULL_OBJECT_ARRAY ; - org.jruby.RubyBasicObject::init@5 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f4b: mov [eax+0x10], ebx 0x02868f4e: lock add [esp], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) On Fri, May 20, 2011 at 8:10 AM, Charles Oliver Nutter head...@headius.com wrote: On Fri, May 20, 2011 at 1:07 AM, Ola Bini ola.b...@gmail.com wrote: Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing. If you can find a microbenchmark which shows differential slowdown, we can debug it. When trying the above all my tests fail with this: java.lang.RuntimeException: java.lang.InternalError: NYI Same here. Reproducing it is pretty easy... * Clone JRuby: git clone https://github.com/jruby/jruby.git * Build JRuby: cd jruby ; ant * Run benchmarks: bin/jruby bench/bench_fib_recursive.rb 10 35 There are many benchmarks under the bench dir, almost all of which show this degradation. You can disable invokedynamic use in JRuby by passing -Xcompile.invokedynamic=false, as seen here: ~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false bench/bench_fib_recursive.rb 5 35 WARNING: Both jruby-complete.jar and jruby.jar are present in the 'lib' directory. Will use
Re: Good news, bad news
Meh, false alarm. This is just the default empty Object[] assigned to all objects. I still can't explain the perf degradation. Here's the full assembly for fib_ruby: https://gist.github.com/983721 It seems like stuff is inlining, so I have no idea why performance is so terrible. - Charlie On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter head...@headius.com wrote: Another update...I managed to turn off the ricochet frames and have it still work (???). The with-ricochet mode is faster, but still 2-3x slower than no indy at all (using JRuby's IC), and 4-5x slower than invokedynamic was recently... HOWEVER...an assembly dump I just did contains some very odd code that might indicate a problem in my code. Hold the presses on digging into it on your end until I can explain this. (this = apparently standing up an instance variable table for Fixnum objects...*definitely* should not be happening). 0x02868f2a: mov [eax+0x1C], ebx ;*invokespecial init ; - org.jruby.RubyBasicObject::init@1 (line 218) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f2d: mov ebx, [esp+0x14] 0x02868f31: mov ecx, [ebx+0xBC] 0x02868f37: mov edx, eax 0x02868f39: shr edx, 9 0x02868f3c: mov [edx+0x394000], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f43: mov edx, a 'java/lang/Class' = 'org/jruby/RubyBasicObject' ; {oop(a 'java/lang/Class' = 'org/jruby/RubyBasicObject')} 0x02868f48: mov ebx, [edx+0x68] ;*getstatic NULL_OBJECT_ARRAY ; - org.jruby.RubyBasicObject::init@5 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f4b: mov [eax+0x10], ebx 0x02868f4e: lock add [esp], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) On Fri, May 20, 2011 at 8:10 AM, Charles Oliver Nutter head...@headius.com wrote: On Fri, May 20, 2011 at 1:07 AM, Ola Bini ola.b...@gmail.com wrote: Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing. If you can find a microbenchmark which shows differential slowdown, we can debug it. When trying the above all my tests fail with this: java.lang.RuntimeException: java.lang.InternalError: NYI Same here. Reproducing it is pretty easy... * Clone JRuby: git clone https://github.com/jruby/jruby.git * Build JRuby: cd jruby ; ant * Run benchmarks:
Re: Good news, bad news
Is it the inlining policy? I played a little with this and the current default policy is still not as good as the big tweaks to MaxInlineSize and InlineSmallCode. tom On May 20, 2011, at 1:21 PM, Charles Oliver Nutter wrote: Meh, false alarm. This is just the default empty Object[] assigned to all objects. I still can't explain the perf degradation. Here's the full assembly for fib_ruby: https://gist.github.com/983721 It seems like stuff is inlining, so I have no idea why performance is so terrible. - Charlie On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter head...@headius.com wrote: Another update...I managed to turn off the ricochet frames and have it still work (???). The with-ricochet mode is faster, but still 2-3x slower than no indy at all (using JRuby's IC), and 4-5x slower than invokedynamic was recently... HOWEVER...an assembly dump I just did contains some very odd code that might indicate a problem in my code. Hold the presses on digging into it on your end until I can explain this. (this = apparently standing up an instance variable table for Fixnum objects...*definitely* should not be happening). 0x02868f2a: mov [eax+0x1C], ebx ;*invokespecial init ; - org.jruby.RubyBasicObject::init@1 (line 218) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f2d: mov ebx, [esp+0x14] 0x02868f31: mov ecx, [ebx+0xBC] 0x02868f37: mov edx, eax 0x02868f39: shr edx, 9 0x02868f3c: mov [edx+0x394000], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f43: mov edx, a 'java/lang/Class' = 'org/jruby/RubyBasicObject' ; {oop(a 'java/lang/Class' = 'org/jruby/RubyBasicObject')} 0x02868f48: mov ebx, [edx+0x68] ;*getstatic NULL_OBJECT_ARRAY ; - org.jruby.RubyBasicObject::init@5 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f4b: mov [eax+0x10], ebx 0x02868f4e: lock add [esp], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) On Fri, May 20, 2011 at 8:10 AM, Charles Oliver Nutter head...@headius.com wrote: On Fri, May 20, 2011 at 1:07 AM, Ola Bini ola.b...@gmail.com wrote: Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing. If you can find a microbenchmark which shows differential slowdown, we can debug it. When trying the above all my
Re: Good news, bad news
On 05/20/2011 11:08 PM, Tom Rodriguez wrote: Is it the inlining policy? I played a little with this and the current default policy is still not as good as the big tweaks to MaxInlineSize and InlineSmallCode. tom I tend to agree, things like isFixnumReopened and getCacheToken seem to be not inlined. Charles, do you have tested if the inline flags are tweaked ? Also I do not understand why you have JRuby's callsites like MinusCallSite in the stacktrace. You should not use it if invokedynamic is enabled. Rémi On May 20, 2011, at 1:21 PM, Charles Oliver Nutter wrote: Meh, false alarm. This is just the default empty Object[] assigned to all objects. I still can't explain the perf degradation. Here's the full assembly for fib_ruby: https://gist.github.com/983721 It seems like stuff is inlining, so I have no idea why performance is so terrible. - Charlie On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter head...@headius.com wrote: Another update...I managed to turn off the ricochet frames and have it still work (???). The with-ricochet mode is faster, but still 2-3x slower than no indy at all (using JRuby's IC), and 4-5x slower than invokedynamic was recently... HOWEVER...an assembly dump I just did contains some very odd code that might indicate a problem in my code. Hold the presses on digging into it on your end until I can explain this. (this = apparently standing up an instance variable table for Fixnum objects...*definitely* should not be happening). 0x02868f2a: mov [eax+0x1C], ebx ;*invokespecialinit ; - org.jruby.RubyBasicObject::init@1 (line 218) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f2d: mov ebx, [esp+0x14] 0x02868f31: mov ecx, [ebx+0xBC] 0x02868f37: mov edx, eax 0x02868f39: shr edx, 9 0x02868f3c: mov [edx+0x394000], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f43: mov edx, a 'java/lang/Class' = 'org/jruby/RubyBasicObject' ; {oop(a 'java/lang/Class' = 'org/jruby/RubyBasicObject')} 0x02868f48: mov ebx, [edx+0x68] ;*getstatic NULL_OBJECT_ARRAY ; - org.jruby.RubyBasicObject::init@5 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f4b: mov [eax+0x10], ebx 0x02868f4e: lock add [esp], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; -
Re: Good news, bad news
I am still using the specialized math callsites because they inline well and an Indy-based approach was still quite a bit slower last time I tried. I am back to being worried about the inlining thresholds. With no MHs and runtime profiled direct calls, I was able to get this all to inline. So basically the fact that I am using Indy and MHs now penalizes me over not just profiled direct calls, but over a totally generic inline cache that usually doesn't inline at all. That's pretty bad :) I can try fiddling with the thresholds later, but do we agree that an invokedynamic bound to a GWT plus some intermediate adapter handles ending in a direct handle should inline as well as e.g. an invokevirtual? - Charlie (mobile) On May 20, 2011, at 17:35, Rémi Forax fo...@univ-mlv.fr wrote: On 05/20/2011 11:08 PM, Tom Rodriguez wrote: Is it the inlining policy? I played a little with this and the current default policy is still not as good as the big tweaks to MaxInlineSize and InlineSmallCode. tom I tend to agree, things like isFixnumReopened and getCacheToken seem to be not inlined. Charles, do you have tested if the inline flags are tweaked ? Also I do not understand why you have JRuby's callsites like MinusCallSite in the stacktrace. You should not use it if invokedynamic is enabled. Rémi On May 20, 2011, at 1:21 PM, Charles Oliver Nutter wrote: Meh, false alarm. This is just the default empty Object[] assigned to all objects. I still can't explain the perf degradation. Here's the full assembly for fib_ruby: https://gist.github.com/983721 It seems like stuff is inlining, so I have no idea why performance is so terrible. - Charlie On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter head...@headius.com wrote: Another update...I managed to turn off the ricochet frames and have it still work (???). The with-ricochet mode is faster, but still 2-3x slower than no indy at all (using JRuby's IC), and 4-5x slower than invokedynamic was recently... HOWEVER...an assembly dump I just did contains some very odd code that might indicate a problem in my code. Hold the presses on digging into it on your end until I can explain this. (this = apparently standing up an instance variable table for Fixnum objects...*definitely* should not be happening). 0x02868f2a: mov [eax+0x1C], ebx ;*invokespecialinit ; - org.jruby.RubyBasicObject::init@1 (line 218) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f2d: mov ebx, [esp+0x14] 0x02868f31: mov ecx, [ebx+0xBC] 0x02868f37: mov edx, eax 0x02868f39: shr edx, 9 0x02868f3c: mov [edx+0x394000], 0x00 ;*putfield varTable ; - org.jruby.RubyBasicObject::init@8 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16) 0x02868f43: mov edx, a 'java/lang/Class' = 'org/jruby/RubyBasicObject' ; {oop(a 'java/lang/Class' = 'org/jruby/RubyBasicObject')} 0x02868f48: mov ebx, [edx+0x68] ;*getstatic NULL_OBJECT_ARRAY ; - org.jruby.RubyBasicObject::init@5 (line 84) ; - org.jruby.RubyObject::init@2 (line 116) ; - org.jruby.RubyNumeric::init@2 (line 114) ; - org.jruby.RubyInteger::init@2 (line 95) ; - org.jruby.RubyFixnum::init@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) ; - org.jruby.RubyFixnum::op_minus@34 (line 395) ; - org.jruby.runtime.callsite.MinusCallSite::call@24
Re: Good news, bad news
On May 19, 2011, at 10:06 PM, Charles Oliver Nutter wrote: I've confimed that the i386 build performs just as poorly. Have not investigate at inlining or assembly level yet...and it's late, so I'm going to get some sleep. It's likely that method handle inlining is failing. Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing. If you can find a microbenchmark which shows differential slowdown, we can debug it. -- John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev