Re: More performance explorations

2011-06-14 Thread John Rose
On Jun 13, 2011, at 8:37 AM, Ola Bini wrote: On 2011-06-13 10.14, Ola Bini wrote: I did a rebuild of MLVM+bsdport today, and I now see the missing class crash again on my Mac too (to clarify, this was gone from my Mac for a while, but came back. This means it's consistent with my JDK7 build

Re: More performance explorations

2011-06-14 Thread Ola Bini
Hi, Your analysis of the behavior is correct. However, what makes me suspect this is 292's fault is partly the -Xint part, partly the fact that this didn't use to blow up - and I haven't changed the compilation strategy for that specific test for quite some time. It also seems weird that it works

Re: More performance explorations

2011-06-14 Thread Ola Bini
Hi, In the latest push to the github repository I've added the smallest compilation I can get to exhibit the problem - it contains only one invokedynamic ( it compiles a method that does 1 + 1 ). You can run it specifically (and only) using: $JAVA_HOME/bin/java -Xbatch -XX:+PrintCompilation -cp

Re: More performance explorations

2011-06-13 Thread John Rose
On Jun 11, 2011, at 1:31 PM, Ola Bini wrote: Asking here too: is that still happening? -- Christian I haven't seen any commits that are likely to fix it, but I will rebuild and take a look today. The commits for 7047697 and 7052202 both fix crashes. -- John

Re: More performance explorations

2011-06-13 Thread Ola Bini
On 2011-06-13 03.44, John Rose wrote: On Jun 11, 2011, at 1:31 PM, Ola Bini wrote: Asking here too: is that still happening? -- Christian I haven't seen any commits that are likely to fix it, but I will rebuild and take a look today. The commits for 7047697 and 7052202 both fix

Re: More performance explorations

2011-06-13 Thread Ola Bini
On 2011-06-13 10.14, Ola Bini wrote: On 2011-06-13 03.44, John Rose wrote: On Jun 11, 2011, at 1:31 PM, Ola Bini wrote: Asking here too: is that still happening? -- Christian I haven't seen any commits that are likely to fix it, but I will rebuild and take a look today. The commits for

Re: More performance explorations

2011-06-10 Thread Christian Thalinger
On Jun 4, 2011, at 5:19 AM, Ola Bini wrote: On 2011-06-04 01.47, John Rose wrote: On Jun 3, 2011, at 7:07 AM, Ola Bini wrote: Is there anything I can do to help out with finding this problem? I can't reproduce the VM crash yet. Like Christian, I got through problems 1 and 2, on bsd (both

Re: More performance explorations

2011-06-10 Thread Christian Thalinger
On Jun 5, 2011, at 7:23 AM, Charles Oliver Nutter wrote: OH, and FWIW, here's the LogCompilation -i output roughly around where I'd expect to see op_plus and op_lt inlining: @ 27 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 27 java.lang.invoke.MethodHandle::invokeExact (17

Re: More performance explorations

2011-06-10 Thread Christian Thalinger
On Jun 5, 2011, at 7:23 AM, Charles Oliver Nutter wrote: OH, and FWIW, here's the LogCompilation -i output roughly around where I'd expect to see op_plus and op_lt inlining: @ 27 java.lang.invoke.MethodHandle::invokeExact (0 bytes) @ 27 java.lang.invoke.MethodHandle::invokeExact (17

Re: More performance explorations

2011-06-10 Thread Charles Oliver Nutter
On Fri, Jun 10, 2011 at 5:32 PM, Christian Thalinger christian.thalin...@oracle.com wrote: It inlines fine with the latest HotSpot and JDK7 b145.  I think we're good :-) Sorry, the logic for this never actually landed on JRuby master, and I lost the commit somehow...so the inlining you're seeing

Re: More performance explorations

2011-06-10 Thread Christian Thalinger
On Jun 10, 2011, at 3:37 PM, Charles Oliver Nutter wrote: On Fri, Jun 10, 2011 at 5:32 PM, Christian Thalinger christian.thalin...@oracle.com wrote: It inlines fine with the latest HotSpot and JDK7 b145. I think we're good :-) Sorry, the logic for this never actually landed on JRuby

Re: More performance explorations

2011-06-05 Thread Charles Oliver Nutter
On Fri, Jun 3, 2011 at 1:01 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: On Jun 3, 2011, at 12:12 AM, Charles Oliver Nutter wrote: I did make another small discovery: the + calls never inline completely. They eventually are a virtual invocation of RubyFixnum.op_plus, and in both old and

Re: More performance explorations

2011-06-05 Thread Charles Oliver Nutter
On Sat, Jun 4, 2011 at 2:05 AM, John Rose john.r.r...@oracle.com wrote: One big answer is that pre-RF code was building such things routinely, in order to normalize signatures down to a few equivalence classes (arity only).  But post-RF code doesn't need to do that.  I found a few places in

Re: More performance explorations

2011-06-05 Thread Rémi Forax
On 06/05/2011 10:41 AM, Charles Oliver Nutter wrote: On Fri, Jun 3, 2011 at 1:01 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote: On Jun 3, 2011, at 12:12 AM, Charles Oliver Nutter wrote: I did make another small discovery: the + calls never inline completely. They eventually are a virtual

Re: More performance explorations

2011-06-05 Thread Charles Oliver Nutter
On Sun, Jun 5, 2011 at 6:06 AM, Rémi Forax fo...@univ-mlv.fr wrote: 7 already performs well. But yes, compared to a top notch JVM based dynamic language runtime, 7 has some performance glitches. But please don't forget that implementing a runtime with invokedynamic is way easier than to

Re: More performance explorations

2011-06-05 Thread Charles Oliver Nutter
On Sun, Jun 5, 2011 at 6:18 AM, Charles Oliver Nutter head...@headius.com wrote: That said...I have not recently re-attempted installing invokedynamic-based primitive call paths. I'll give it another shot this week and see where we stand. Testing a simple loop ought to show quickly the

Re: More performance explorations

2011-06-04 Thread John Rose
On Jun 3, 2011, at 7:07 AM, Ola Bini wrote: Is there anything I can do to help out with finding this problem? I can't reproduce the VM crash yet. Like Christian, I got through problems 1 and 2, on bsd (both 32-bit and 64-bit). Problem 3 generates huge output, so it's hard to see what's going

Re: More performance explorations

2011-06-04 Thread John Rose
On Jun 3, 2011, at 11:47 PM, John Rose wrote: On Jun 3, 2011, at 7:07 AM, Ola Bini wrote: Is there anything I can do to help out with finding this problem? I can't reproduce the VM crash yet. Like Christian, I got through problems 1 and 2, on bsd (both 32-bit and 64-bit). Problem 3

Re: More performance explorations

2011-06-04 Thread John Rose
On Jun 3, 2011, at 4:15 PM, Tom Rodriguez wrote: On Jun 2, 2011, at 7:37 PM, John Rose wrote: Thanks; I'll look at your dump later tonight. If the problem is friction from interface casts, we can probably remove them. It's hard to figure out how they are getting in, though. It happens

Re: More performance explorations

2011-06-04 Thread Ola Bini
On 2011-06-04 01.47, John Rose wrote: On Jun 3, 2011, at 7:07 AM, Ola Bini wrote: Is there anything I can do to help out with finding this problem? I can't reproduce the VM crash yet. Like Christian, I got through problems 1 and 2, on bsd (both 32-bit and 64-bit). Problem 3 generates

Re: More performance explorations

2011-06-03 Thread Charles Oliver Nutter
On Thu, Jun 2, 2011 at 9:37 PM, John Rose john.r.r...@oracle.com wrote: Thanks; I'll look at your dump later tonight. If the problem is friction from interface casts, we can probably remove them.  It's hard to figure out how they are getting in, though.  It happens when IRubyObject

Re: More performance explorations

2011-06-03 Thread Ola Bini
On 2011-06-02 12.53, Ola Bini wrote: On 2011-06-02 11.59, John Rose wrote: I was hoping your crash would go away with today's patches. I'll look into it. I did a build after today's patches. Still same problem: java.lang.NoClassDefFoundError: seph/lang/SephObject [junit]

Re: More performance explorations

2011-06-03 Thread Rémi Forax
On 06/03/2011 08:01 PM, Tom Rodriguez wrote: On Jun 3, 2011, at 12:12 AM, Charles Oliver Nutter wrote: I did make another small discovery: the + calls never inline completely. They eventually are a virtual invocation of RubyFixnum.op_plus, and in both old and new builds they're calls in the

Re: More performance explorations

2011-06-03 Thread John Rose
On Jun 3, 2011, at 11:58 AM, Rémi Forax wrote: This means that you need one MDO by method handle (which do a call) by callsite associated to an invokedynamic. As far as I know, hotspot doesn't do that. You are right. There are paths like that which escape call-site profiling. There's no

Re: More performance explorations

2011-06-03 Thread Rémi Forax
On 06/03/2011 09:08 PM, John Rose wrote: On Jun 3, 2011, at 11:58 AM, Rémi Forax wrote: This means that you need one MDO by method handle (which do a call) by callsite associated to an invokedynamic. As far as I know, hotspot doesn't do that. You are right. There are paths like that which

Re: More performance explorations

2011-06-03 Thread Tom Rodriguez
On Jun 2, 2011, at 7:37 PM, John Rose wrote: Thanks; I'll look at your dump later tonight. If the problem is friction from interface casts, we can probably remove them. It's hard to figure out how they are getting in, though. It happens when IRubyObject interconverts with Object. So I

Re: More performance explorations

2011-06-02 Thread Charles Oliver Nutter
I tentatively admit guilt. I was still running off the 5/26 build, which still had convertArguments and probably didn't have all the recent optimizations for ricochet. I'm doing an updated build now and will report back shortly. On Thu, Jun 2, 2011 at 10:13 AM, Charles Oliver Nutter

Re: More performance explorations

2011-06-02 Thread Charles Oliver Nutter
Ok, the ricochet flag performs in the expected range...but still slower than the old logic, and also slower than the 5/13 macosx build. ~/projects/jruby ➔ jruby -X+C -J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false --server bench/bench_fib_recursive.rb 5 35 9227465 1.354000 0.00

Re: More performance explorations

2011-06-02 Thread John Rose
On Jun 2, 2011, at 9:01 AM, Charles Oliver Nutter wrote: Ok, the ricochet flag performs in the expected range...but still slower than the old logic, and also slower than the 5/13 macosx build. You mean java -Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true? If so, let's look at the code

Re: More performance explorations

2011-06-02 Thread John Rose
Optimizations can go into any release. -- John (on my iPhone) On Jun 2, 2011, at 9:43 AM, Thomas E Enebo tom.en...@gmail.com wrote: Out of curiosity...How tough is it to get new optimizations into u{1-n} releases?

Re: More performance explorations

2011-06-02 Thread John Rose
Thanks; I'll look at your dump later tonight. If the problem is friction from interface casts, we can probably remove them. It's hard to figure out how they are getting in, though. It happens when IRubyObject interconverts with Object. Are you doing it, or is it coming from inside the

Re: More performance explorations

2011-06-01 Thread Tom Rodriguez
I'm seeing slow perf as well but I think it might be a jruby/292 mismatch problem: Exception a 'java/lang/NoSuchMethodError': java.lang.invoke.MethodHandles.convertArguments(Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle; (0xefdd4b18 ) thrown

Re: More performance explorations

2011-06-01 Thread John Rose
On Jun 1, 2011, at 1:37 PM, Tom Rodriguez wrote: I'm seeing slow perf as well but I think it might be a jruby/292 mismatch problem: Yes; the 292 expert group yanked convertArgs since it was a near-synonym of asType. (The delta is factored into the new method asFixedArity.) The mlvm patch
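
For readers following the API change described in this message, the following is a minimal sketch of how asType and asFixedArity behave in the released java.lang.invoke API. The class AsTypeSketch and the methods add and join are hypothetical names made up for illustration; none of this is taken from JRuby or from the mlvm patches.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

public class AsTypeSketch {
    // Hypothetical targets, for illustration only.
    static int add(int a, int b) { return a + b; }
    static String join(Object... parts) { return Arrays.toString(parts); }

    // Adapt (int,int)int to (Object,Object)Object with asType, then call it.
    public static Object addGeneric(Object a, Object b) {
        try {
            MethodHandle add = MethodHandles.lookup().findStatic(
                    AsTypeSketch.class, "add",
                    MethodType.methodType(int.class, int.class, int.class));
            MethodHandle generic = add.asType(MethodType.genericMethodType(2));
            return (Object) generic.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // A handle looked up on a varargs method is a varargs collector.
    public static MethodHandle joinHandle() {
        try {
            return MethodHandles.lookup().findStatic(
                    AsTypeSketch.class, "join",
                    MethodType.methodType(String.class, Object[].class));
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(addGeneric(1, 1));

        MethodHandle varargs = joinHandle();
        // asFixedArity drops the collecting behaviour; the handle then
        // expects the Object[] itself as its single argument.
        MethodHandle fixed = varargs.asFixedArity();
        System.out.println(varargs.isVarargsCollector());
        System.out.println(fixed.isVarargsCollector());
        System.out.println((String) varargs.invoke("a", "b"));
        System.out.println((String) fixed.invoke(new Object[] {"a", "b"}));
    }
}
```

Code written against the removed convertArguments would, on this reading, move to asType, with asFixedArity applied first wherever the varargs collecting step is not wanted.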

Re: More performance explorations

2011-06-01 Thread John Rose
On Jun 1, 2011, at 6:11 PM, John Rose wrote: I'll spin a meth.jar that includes (a) recent fixes and (b) deprecated methods. (Actually, two meth.jar's.) http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/meth.jar

Re: More performance explorations

2011-05-31 Thread Charles Oliver Nutter
If you mean java.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true, then I have more bad news...it's even slower :( Still using 5/26 build. I will try to update later... ~/projects/jruby ➔ jruby -J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false --server bench/bench_fib_recursive.rb 5 35 9227465

Re: More performance explorations

2011-05-31 Thread John Rose
On May 31, 2011, at 9:36 AM, Charles Oliver Nutter wrote: If you mean java.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true, then I have more bad news...it's even slower :( Is that with meth-bim.patch applied? That is Tom's first cut at an optimization for RFs. -- John

Re: More performance explorations

2011-05-28 Thread Charles Oliver Nutter
On Fri, May 27, 2011 at 7:46 PM, John Rose john.r.r...@oracle.com wrote: The cure is simple:  Get rid of the interface casts implied by your uses of asType.  To do this, use MHs.explicitCastArguments, which does the more performance-friendly no-op retyping of interfaces.  If you need to cast
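
A minimal sketch of the two adaptation styles John contrasts here, assuming a hypothetical interface Named that stands in for something like IRubyObject. The class CastSketch and its methods are invented for illustration and say nothing about how JRuby actually builds its handles.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class CastSketch {
    // Hypothetical interface, standing in for a runtime's object interface.
    interface Named { String name(); }

    static String describe(Named n) { return "name=" + n.name(); }

    // The erased call-site type: (Object)String.
    static final MethodType GENERIC =
            MethodType.methodType(String.class, Object.class);

    static MethodHandle target() {
        try {
            return MethodHandles.lookup().findStatic(
                    CastSketch.class, "describe",
                    MethodType.methodType(String.class, Named.class));
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // asType inserts a checked cast from Object to the interface type.
    public static String callViaAsType(Object o) {
        try {
            MethodHandle h = target().asType(GENERIC);
            return (String) h.invokeExact(o);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // explicitCastArguments retypes the argument; per its Javadoc, a
    // reference headed for an interface-typed parameter is passed
    // without a cast.
    public static String callViaExplicitCast(Object o) {
        try {
            MethodHandle h =
                    MethodHandles.explicitCastArguments(target(), GENERIC);
            return (String) h.invokeExact(o);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // A sample receiver implementing the hypothetical interface.
    public static Object sample() {
        return new Named() {
            public String name() { return "fixnum"; }
        };
    }

    public static void main(String[] args) {
        Object n = sample();
        System.out.println(callViaAsType(n));
        System.out.println(callViaExplicitCast(n));
    }
}
```

Both handles return the same result for a valid receiver; the difference under discussion in this thread is only the cost of the interface cast that asType implies.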

Re: More performance explorations

2011-05-28 Thread Charles Oliver Nutter
On Sat, May 28, 2011 at 6:49 PM, Charles Oliver Nutter head...@headius.com wrote: I'm double-checking everything now and will get back to you, but I'm not sure how the asType logic could explain the extra stuff... There's also the + call being dispatched through invokedynamic, and it

Re: More performance explorations

2011-05-27 Thread Charles Oliver Nutter
Two threads going now, so I'm going to move exploration and discussion to this one. So, I'm reading through MLVM versus 5/13 macosx build amd64 ASM output... First difference, the operator logic: On macosx build: 4619030327: jl 4619034359 ;*ifge

Re: More performance explorations

2011-05-27 Thread John Rose
On May 27, 2011, at 8:04 AM, Charles Oliver Nutter wrote: There's substantially more code here. I don't see any jumps that would short-circuit this logic. Am I reading right? Now the good news is that after the test is completed I don't see much additional logic on MLVM before it's back

Re: More performance explorations

2011-05-26 Thread Charles Oliver Nutter
Ok, here we go with the macosx build from 5/13. Performance is *substantially* better. First tak: user system total real 1.401000 0.00 1.401000 ( 0.821000) 0.552000 0.00 0.552000 ( 0.552000) 0.561000 0.00 0.561000 ( 0.561000) 0.552000

Re: More performance explorations

2011-05-26 Thread Charles Oliver Nutter
Now for something completely different: SwitchPoint-based constant lookup in JRuby. It's certainly possible I'm doing something wrong here, but using a SwitchPoint for constant invalidation in JRuby (rather than pinging a global serial number) is significantly slower. Using SwitchPoint:
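
As a rough illustration of the pattern being measured here, the sketch below guards a cached constant with a SwitchPoint and falls back to a slow lookup once the SwitchPoint is invalidated. The class SwitchPointSketch, the field currentValue, and the method slowLookup are hypothetical stand-ins for a runtime's constant table; this is not JRuby's implementation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class SwitchPointSketch {
    // Hypothetical stand-in for a runtime's constant table.
    public static volatile Object currentValue = "v1";

    static Object slowLookup() { return currentValue; }

    private final SwitchPoint switchPoint = new SwitchPoint();
    private final MethodHandle guarded;

    public SwitchPointSketch() {
        MethodHandle h;
        try {
            MethodHandle fallback = MethodHandles.lookup().findStatic(
                    SwitchPointSketch.class, "slowLookup",
                    MethodType.methodType(Object.class));
            // Cache the value seen at link time as a constant handle.
            MethodHandle cached =
                    MethodHandles.constant(Object.class, slowLookup());
            // Use the cached constant until the SwitchPoint is invalidated.
            h = switchPoint.guardWithTest(cached, fallback);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
        guarded = h;
    }

    public Object get() {
        try {
            return (Object) guarded.invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Called when the constant is redefined.
    public void invalidate() {
        SwitchPoint.invalidateAll(new SwitchPoint[] { switchPoint });
    }

    public static void main(String[] args) {
        SwitchPointSketch site = new SwitchPointSketch();
        System.out.println(site.get());   // cached value
        currentValue = "v2";
        System.out.println(site.get());   // still the cached value
        site.invalidate();
        System.out.println(site.get());   // now takes the fallback
    }
}
```

A real call site would normally relink with a fresh SwitchPoint after invalidation rather than stay on the fallback; that step is left out to keep the sketch short.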

Re: More performance explorations

2011-05-26 Thread Charles Oliver Nutter
On Thu, May 26, 2011 at 1:34 AM, Charles Oliver Nutter head...@headius.com wrote: Ok, here we go with the macosx build from 5/13. Performance is *substantially* better. It's worth mentioning that the 5/13 build results are only slightly slower than our ideal, JRuby's dynopt mode. That made me

Re: More performance explorations

2011-05-26 Thread Rémi Forax
On 05/27/2011 01:53 AM, Charles Oliver Nutter wrote: On Thu, May 26, 2011 at 5:20 AM, Rémi Forax fo...@univ-mlv.fr wrote: As far as I know there is no specific optimization of SwitchPoint, i.e. there is still a volatile read in the middle of the pattern. If that's true I'm not sure how this is

Re: More performance explorations

2011-05-26 Thread John Rose
On May 26, 2011, at 4:53 PM, Charles Oliver Nutter wrote: On Thu, May 26, 2011 at 5:20 AM, Rémi Forax fo...@univ-mlv.fr wrote: As far as I know there is no specific optimization of SwitchPoint, i.e. there is still a volatile read in the middle of the pattern. If that's true I'm not sure how

Re: More performance explorations

2011-05-26 Thread John Rose
What Remi said. -- John On May 26, 2011, at 5:09 PM, Rémi Forax wrote: the major point of having the SwitchPoint API is, like most of the API of java.lang.invoke, that these API are/will be recognized and optimized by the VM.

Re: More performance explorations

2011-05-26 Thread John Rose
On May 26, 2011, at 5:18 PM, Charles Oliver Nutter wrote: Some combination of these flags will be enabled by default as they get optimized in Hotspot. That should allow having things on as they're ready to be on by default. That's the way we do it in the JVM also, on works-in-progress which

More performance explorations

2011-05-25 Thread Charles Oliver Nutter
Ok, onward with perf exploration, folks! I'm running with mostly-current MLVM, with John's temporary reversion of GWT to the older non-ricochet logic. As reported before, fib has improved with the reversion, but it's only marginally faster than JRuby's inline caching logic and easily 30-40%
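
For context on the GWT (guardWithTest) logic mentioned in this message, here is a minimal sketch of a monomorphic inline cache built with MethodHandles.guardWithTest. The class InlineCacheSketch and the methods isExactly, fastPath, and slowPath are hypothetical; the sketch shows the general shape of such a guard and is not JRuby's call-site code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class InlineCacheSketch {
    // Guard: does the receiver have exactly the cached class?
    static boolean isExactly(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    // Hypothetical fast and slow paths, for illustration only.
    static Object fastPath(Object receiver) { return "fast:" + receiver; }
    static Object slowPath(Object receiver) { return "slow:" + receiver; }

    // Build a (Object)Object handle that takes the fast path when the
    // receiver's class matches cachedClass, and the slow path otherwise.
    public static MethodHandle buildSite(Class<?> cachedClass) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType unary =
                    MethodType.methodType(Object.class, Object.class);
            MethodHandle test = lookup.findStatic(
                    InlineCacheSketch.class, "isExactly",
                    MethodType.methodType(boolean.class,
                            Class.class, Object.class))
                    .bindTo(cachedClass);
            MethodHandle fast = lookup.findStatic(
                    InlineCacheSketch.class, "fastPath", unary);
            MethodHandle slow = lookup.findStatic(
                    InlineCacheSketch.class, "slowPath", unary);
            return MethodHandles.guardWithTest(test, fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static Object dispatch(MethodHandle site, Object receiver) {
        try {
            return (Object) site.invokeExact(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle site = buildSite(Integer.class);
        System.out.println(dispatch(site, 42));
        System.out.println(dispatch(site, "hi"));
    }
}
```

In a real runtime the slow path would typically look up the method and relink the call site; here it only returns a marker string so the two branches are easy to tell apart.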