Re: Good news, bad news

2011-05-27 Thread Charles Oliver Nutter
This message was held because it's too big, so I dropboxed the amd64 dumps here:

http://dl.dropbox.com/u/17017035/fib_ruby-asm-64bit.zip

On Fri, May 27, 2011 at 9:05 AM, Charles Oliver Nutter
head...@headius.com wrote:
 Hopefully these 64-bit dumps are better now that Tom managed to get a
 proper 64-bit hsdis built (hooray!).

 On Thu, May 26, 2011 at 6:36 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 On Thu, May 26, 2011 at 6:21 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 No worries...I'll repost 32-bit asm in a moment.

 Attached. This is just fib again. 64-bit is definitely faster than
 32-bit for this benchmark, but the older macosx build is also still
 faster than the current mlvm:

 old macosx:

 9227465
  1.235000   0.00   1.235000 (  1.154000)
 9227465
  1.012000   0.00   1.012000 (  1.012000)
 9227465
  0.969000   0.00   0.969000 (  0.969000)
 9227465
  0.962000   0.00   0.962000 (  0.962000)
 9227465
  0.953000   0.00   0.953000 (  0.953000)

 current mlvm:

 9227465
  1.505000   0.00   1.505000 (  1.406000)
 9227465
  1.175000   0.00   1.175000 (  1.174000)
 9227465
  1.77   0.00   1.77 (  1.771000)
 9227465
  1.255000   0.00   1.255000 (  1.255000)
 9227465
  1.14   0.00   1.14 (  1.141000)

 - Charlie


___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Tom Rodriguez

On May 25, 2011, at 9:39 PM, Charles Oliver Nutter wrote:

 On Wed, May 25, 2011 at 5:12 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 I've got the bimorphic method handle inline working.  I was doing something 
 stupid at the beginning which is why it wasn't as easy as I expected to be 
 and then I encountered a stupid API which wasted a little more time.  I was 
 able to leverage the existing  invokedynamic call site cache code generation 
 and just generate a different guard.  Ignoring the flag soup below, the 
 first is without invokedynamic, second is with and third is with bimorphic 
 inline for selectAlternative.
 
 Here's a dumb question...couldn't the bimorphic logic have simply been
 generated the second way you indicated, rather than the (x ? a :
 b).whatever form?

That pattern is something the java.lang.invoke machinery is generating as far I 
can tell.  selectAlternative does something like:

(test ? a : b).invokeExact()

and I'm converting it to something like:

c = (test ? a : b);
if (c == a)
   a.invokeExact();
else
   b.invokeExact();

The optimizer turns this into:

if (test)
   a.invokeExact();
else
   b.invokeExact();

but test is some ugly goo because of boxing.  It's relatively easy to get the 
optimizer to fold away the boxing for boolean but sadly it doesn't help the 
performance at all.  Additionally that ends up touching a fair amount of common 
code which makes it a little more risky for 7.

One oddity I've noticed is that what we're ending up with lots of checkcasts to 
IRubyObject but the optimizer doesn't eliminate redundant ones, probably 
because it's a checkcast to an interface.

 
 Great to hear this is in! I'll look for an MLVM patch and until then
 see if I can figure out why the classic GWT logic isn't as fast as
 it was a couple weeks ago. IF I'm feeling saucy I may try to pull MLVM
 + bsd-port from two weeks ago (or find a macosx port build from that
 timeframe) to compare assembly output. Perhaps illustrative for us
 all!

I think that would be informative.

tom

 
 The changes are at http://javaweb.sfbay.sun.com/~never/webrev/bim.  Maybe 
 John can put together an mlvm patch for it.  I'm not quite sure how to do so.
 
 There are still some minor improvements that can be made.  The test part 
 ends up looking like:
 
  if ((test ? Boolean.TRUE : Boolean.FALSE).value())
 
 We need to treat more of those values as constants and split the loads up to 
 fold it away.
 
 I'm keen to have a look at the assembly resulting from the bimorphic
 inlining and see if there's any way to improve it further. My test
 could potentially be improved too.
 
 I'm also pondering ways I can go mostly guard-free for certain core
 JRuby types, possibly reducing the guard to an exact type check.
 There's definitely runway both on your end and my end.
 
 - Charlie
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread John Rose
On May 26, 2011, at 12:15 AM, Charles Oliver Nutter wrote:

 It is definitely frustrating to see that perf has degraded so much the
 past couple weeks and still not be there with the reverted code.

I second that.

I did some more cherry-picking this evening and the mlvm patch queue is now 
closer to the reality of what's in jdk7 b144.

I also pushed Tom's bimorphic MH inlining code, on an experimental basis.

-- John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Christian Thalinger
On May 26, 2011, at 10:42 AM, Christian Thalinger wrote:
 On May 26, 2011, at 9:15 AM, Charles Oliver Nutter wrote:
 On Thu, May 26, 2011 at 1:58 AM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 but test is some ugly goo because of boxing.  It's relatively easy to get 
 the optimizer to fold away the boxing for boolean but sadly it doesn't help 
 the performance at all.  Additionally that ends up touching a fair amount 
 of common code which makes it a little more risky for 7.
 
 It is definitely frustrating to see that perf has degraded so much the
 past couple weeks and still not be there with the reverted code. I'm
 hoping your fix will help bring the promised ricochet perf to JRuby,
 but so far we're a long way off from what perf looked like on earlier
 builds. There must be something more obvious than boolean box
 folding...perhaps obvious enough we're not seeing it yet.
 
 I also did some performance testing.  I used a recent HS repository, applied 
 Tom's patch, used a recent meth.jar provided by John and this is what I got:

snip

 Here are the numbers from April 28 on the same machine:
 
 http://mail.openjdk.java.net/pipermail/mlvm-dev/2011-April/002873.html
 
 Although I can't remember if the invokedynamic numbers were with or without 
 the guard-removal-hack I tried at that time (I think without, because with 
 the hack the numbers should be very similar to dynopt, see: 
 http://mail.openjdk.java.net/pipermail/mlvm-dev/2011-April/002869.html).
 
 Using Tom's new code doesn't make a difference since I think the meth.jar 
 from John includes the reverted GWT code.

I looked at the inlining tree of fib and everything looks good and is inlined.  
There is one invokeExact which is:

  @ 43   java.lang.invoke.MethodHandle::invokeExact (42 bytes)   too big

but bumping the MaxInlineSize just shows that it's:

  @ 43   java.lang.invoke.MethodHandle::invokeExact (42 bytes)   call site not 
reached

so it doesn't matter anyway.

Is there some additional code that got added to the GWT logic or maybe some 
additional code on the JRuby side that would explain the slowdown?

-- Christian
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Ola Bini
On 2011-05-26 14.04, John Rose wrote:
 On May 26, 2011, at 12:15 AM, Charles Oliver Nutter wrote:
 
 It is definitely frustrating to see that perf has degraded so much the
 past couple weeks and still not be there with the reverted code.
 
 I second that.
 
 I did some more cherry-picking this evening and the mlvm patch queue is now 
 closer to the reality of what's in jdk7 b144.
 
 I also pushed Tom's bimorphic MH inlining code, on an experimental basis.
 

Hey John,

Something in the patch queue fails to apply cleanly:

+ (cd sources/jdk; hg qpush -a)
(working directory not at a head)
applying meth-ing-7034977.patch
applying meth-conv-6939861.patch
applying meth-invoke-7032850.patch
applying meth-exc-7044892.patch
applying meth-review-7032323.patch
applying meth-deprecate.patch
patching file src/share/classes/java/lang/invoke/MethodHandle.java
Hunk #1 FAILED at 505
1 out of 2 hunks FAILED -- saving rejects to file
src/share/classes/java/lang/invoke/MethodHandle.java.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh meth-deprecate.patch
*** Exit status 2.
+ (cd sources/langtools; hg qpush -a)



Cheers
-- 
 Ola Bini (http://olabini.com)
  Ioke - JRuby - ThoughtWorks

 Yields falsehood when quined yields falsehood when quined.

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread John Rose
On May 26, 2011, at 1:42 AM, Christian Thalinger wrote:

 Using Tom's new code doesn't make a difference since I think the meth.jar 
 from John includes the reverted GWT code.

I just added a hook to test whether the GWT change matters or not.  Choose 
either flag setting:
  java -Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true
  java -Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false

If false or missing, you get the old reverted logic.

This is the refreshed JAR:
  http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/meth.jar

The code is also in the mlvm patch repo.

 A quick check on bench_string_ops.rb shows that a couple of numbers are 
 better but some are worse, especially string == comparison (compare with 
 http://mail.openjdk.java.net/pipermail/mlvm-dev/2011-April/002887.html):

This rings a bell:  I recently had to fix the primitive unboxing logic to make 
this sort of thing work correctly (compatibly with reflection):

  MethodHandle iid = identity(int.class);
  Object o = (p ? (Byte)(byte)1 : (Integer)2);
  int x = (int) iid.asType(methodType(int.class, Object.class).invokeExact(o);

Note that simply casting to Integer and unboxing won't work.  So there's a 
tricky (and potentially slow) test to do in the adaptation of object arguments 
to primitive formal parameters, or from object return types to primitive return 
values.

If this is a problem, your assembly code might have calls to something ugly 
called ValueConversions.primitiveConversion.  Hopefully, it just has fast-path 
code which verifies that a reference is the expected box type, and unboxes.  
There is always a fast path which does this.  But funny wrapper mixing (like 
the Byte above) will cause the slow path to kick in.

-- John___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread John Rose
On May 26, 2011, at 2:12 AM, Ola Bini wrote:

 Hey John,
 
 Something in the patch queue fails to apply cleanly:

That was quick!  I noticed that when I spun the meth.jar. Fixed!  -- John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Jochen Theodorou
Am 26.05.2011 11:03, schrieb Christian Thalinger:
[...]
 I looked at the inlining tree of fib and everything looks good and is
 inlined.  There is one invokeExact which is:

 @ 43   java.lang.invoke.MethodHandle::invokeExact (42 bytes)   too
 big

 but bumping the MaxInlineSize just shows that it's:

 @ 43   java.lang.invoke.MethodHandle::invokeExact (42 bytes)   call
 site not reached

 so it doesn't matter anyway.

sorry for gutting in here. Is that looking at the inlining tree 
something normal mortals can also do? I see there some useful 
information, even without invokedynamic. So is this some debugging code 
not found in the normal JVM?

bye Jochen
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Christian Thalinger
On May 26, 2011, at 11:24 AM, Jochen Theodorou wrote:
 Am 26.05.2011 11:03, schrieb Christian Thalinger:
 [...]
 I looked at the inlining tree of fib and everything looks good and is
 inlined.  There is one invokeExact which is:
 
 @ 43   java.lang.invoke.MethodHandle::invokeExact (42 bytes)   too
 big
 
 but bumping the MaxInlineSize just shows that it's:
 
 @ 43   java.lang.invoke.MethodHandle::invokeExact (42 bytes)   call
 site not reached
 
 so it doesn't matter anyway.
 
 sorry for gutting in here. Is that looking at the inlining tree 
 something normal mortals can also do? I see there some useful 
 information, even without invokedynamic. So is this some debugging code 
 not found in the normal JVM?

Since a couple of HotSpot builds these switches are also available in product 
builds:  PrintCompilation is a product switch, PrintInlining is a diagnostic 
switch.

Do it like:

$ java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining

-- Christian
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Charles Oliver Nutter
On Thu, May 26, 2011 at 3:34 AM, John Rose john.r.r...@oracle.com wrote:
 I did some more cherry-picking this evening and the mlvm patch queue is now 
 closer to the reality of what's in jdk7 b144.

 I also pushed Tom's bimorphic MH inlining code, on an experimental basis.

Ok, I'll rebuild and use current stuff for exploration.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Charles Oliver Nutter
On Thu, May 26, 2011 at 4:03 AM, Christian Thalinger
christian.thalin...@oracle.com wrote:
 Is there some additional code that got added to the GWT logic or maybe some 
 additional code on the JRuby side that would explain the slowdown?

The current logic is 2x faster on the 5/13 build (b141ish maybe?) of
macosx port than on MLVM as of Tuesday. I'll update to latest MLVM and
see what the situation is.

I also can get asm dumps from that 5/13 build, which I'll post
(alongside current asm dumps) in the next 24hrs or so.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Charles Oliver Nutter
On Thu, May 26, 2011 at 4:47 AM, Charles Oliver Nutter
head...@headius.com wrote:
 The current logic is 2x faster on the 5/13 build (b141ish maybe?) of

By current logic I mean the logic in JRuby master right now. Same
JRuby logic, compared between an earlier macosx build and current mlvm
build, and the macosx build is always faster.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Charles Oliver Nutter
On Thu, May 26, 2011 at 5:25 AM, Rémi Forax fo...@univ-mlv.fr wrote:
 Charles,
 why do you use IRubyObject exactly ?
 why not using Object instead ?

 I use Object in PHP.reboot and see no problem.

Legacy, mostly... JRuby still supports Java 5 and 6, which limits how
specialized we can make call paths. Requiring that everything be
IRubyObject means we always have only a single invokeinterface to
retrieve metaclass, access instance variables, and so on.

JRuby designed from scratch today would certainly use Object.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Rémi Forax
On 05/26/2011 12:33 PM, Charles Oliver Nutter wrote:
 On Thu, May 26, 2011 at 5:25 AM, Rémi Foraxfo...@univ-mlv.fr  wrote:
 Charles,
 why do you use IRubyObject exactly ?
 why not using Object instead ?

 I use Object in PHP.reboot and see no problem.
 Legacy, mostly... JRuby still supports Java 5 and 6, which limits how
 specialized we can make call paths. Requiring that everything be
 IRubyObject means we always have only a single invokeinterface to
 retrieve metaclass, access instance variables, and so on.

 JRuby designed from scratch today would certainly use Object.

For the metaclass, the idea is to not retrieve the metaclass in the fast 
path,
if you have one Java class for one ruby class and a 1 to 1 relation between
a class and a metaclass, checking the class and the SwitchPoint of the 
metaclass
is sufficient (because you can bind the switch point).

 - Charlie

Rémi

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Charles Oliver Nutter
On Thu, May 26, 2011 at 6:18 AM, Rémi Forax fo...@univ-mlv.fr wrote:
 For the metaclass, the idea is to not retrieve the metaclass in the fast
 path,
 if you have one Java class for one ruby class and a 1 to 1 relation between
 a class and a metaclass, checking the class and the SwitchPoint of the
 metaclass
 is sufficient (because you can bind the switch point).

Right, this would work if all classes defined in Ruby were 1:1 with a
class in Java. This is not the case in JRuby; user-defined classes are
rarely backed by a new JVM class, and usually just use RubyObject as
their JVM representation.

I will likely use this strategy for core JRuby classes, however, by
making sure that RubyObject, RubyString, etc are not used as the
backing store for user-defined classes, and instead having
RubyInheritedObject, RubyInheritedString (or something) for that
purpose.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Tom Rodriguez
There's something deeply bizarro about that assembly.  It looks like x64 
assembly decoded by the 32 assembler.  I think your hsdis wasn't built right.

tom

On May 26, 2011, at 2:13 PM, Charles Oliver Nutter wrote:

 On Thu, May 26, 2011 at 1:58 AM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 Great to hear this is in! I'll look for an MLVM patch and until then
 see if I can figure out why the classic GWT logic isn't as fast as
 it was a couple weeks ago. IF I'm feeling saucy I may try to pull MLVM
 + bsd-port from two weeks ago (or find a macosx port build from that
 timeframe) to compare assembly output. Perhaps illustrative for us
 all!
 
 I think that would be informative.
 
 Ok, I've attached two assembly dumps, one from mostly-current MLVM
 (build early this morning) and one from the macosx port from 5/13.
 This is just fib_ruby. I haven't started to explore the differences
 yet.
 
 - Charlie
 asm_logs.zip___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread John Rose
On May 26, 2011, at 2:43 PM, Charles Oliver Nutter wrote:

 Hmm, I'll look into it. I used the amd64 build from the hsdis project

I posted the bsd/amd64 libdis when I was desperate for something that would at 
least link and partially decode instructions.  But that hsdis binary doesn't 
really know about x64 disassembly.  (The base project doesn't support x64.  The 
gnu binutils disassembler is needed.)  The download should have a warning 
label; sorry.

-- John


___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Charles Oliver Nutter
I see someone tagged that build as broken now on Kenai. Can someone
with the appropriate skills do a non-broken build? I've never build
the hsdis plugin (but I'm attempting to now.

- Charlie

On Thu, May 26, 2011 at 4:43 PM, Charles Oliver Nutter
head...@headius.com wrote:
 Hmm, I'll look into it. I used the amd64 build from the hsdis project
 on kenai...

 On Thu, May 26, 2011 at 4:27 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 There's something deeply bizarro about that assembly.  It looks like x64 
 assembly decoded by the 32 assembler.  I think your hsdis wasn't built right.

 tom

 On May 26, 2011, at 2:13 PM, Charles Oliver Nutter wrote:

 On Thu, May 26, 2011 at 1:58 AM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 Great to hear this is in! I'll look for an MLVM patch and until then
 see if I can figure out why the classic GWT logic isn't as fast as
 it was a couple weeks ago. IF I'm feeling saucy I may try to pull MLVM
 + bsd-port from two weeks ago (or find a macosx port build from that
 timeframe) to compare assembly output. Perhaps illustrative for us
 all!

 I think that would be informative.

 Ok, I've attached two assembly dumps, one from mostly-current MLVM
 (build early this morning) and one from the macosx port from 5/13.
 This is just fib_ruby. I haven't started to explore the differences
 yet.

 - Charlie
 asm_logs.zip___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread John Rose
On May 26, 2011, at 3:45 PM, Charles Oliver Nutter wrote:

 Ahh...well is there another binary you can point me at? Otherwise,
 I'll just do i386 asm for now.

I tagged it.  It's broken in the sense that it's a brutal 64-bit build of code 
designed only for 32-bit disassembly.

I suggest doing at least some of these investigations with 32-bit builds.  
(But, we do need a binutils-based hsdis!)

-- John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Tom Rodriguez
I'll see what I can do on my mac.

tom

On May 26, 2011, at 3:45 PM, Charles Oliver Nutter wrote:

 On Thu, May 26, 2011 at 5:16 PM, John Rose john.r.r...@oracle.com wrote:
 I posted the bsd/amd64 libdis when I was desperate for something that would 
 at least link and partially decode instructions.  But that hsdis binary 
 doesn't really know about x64 disassembly.  (The base project doesn't 
 support x64.  The gnu binutils disassembler is needed.)  The download should 
 have a warning label; sorry.
 
 Ahh...well is there another binary you can point me at? Otherwise,
 I'll just do i386 asm for now.
 
 - Charlie
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-26 Thread Charles Oliver Nutter
Oh, forgot to include dynopt numbers under that older macosx build:

~/projects/jruby ➔ jruby -Xcompile.dynopt=true
-Xinvokedynamic.invocation=false --server -J-d32
bench/bench_fib_recursive.rb 5 35
9227465
  1.366000   0.00   1.366000 (  1.293000)
9227465
  0.949000   0.00   0.949000 (  0.948000)
9227465
  0.965000   0.00   0.965000 (  0.965000)
9227465
  0.932000   0.00   0.932000 (  0.932000)
9227465
  0.924000   0.00   0.924000 (  0.924000)

As mentioned before, the old build was just a hair slower than dynopt.

- Charlie

On Thu, May 26, 2011 at 6:36 PM, Charles Oliver Nutter
head...@headius.com wrote:
 On Thu, May 26, 2011 at 6:21 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 No worries...I'll repost 32-bit asm in a moment.

 Attached. This is just fib again. 64-bit is definitely faster than
 32-bit for this benchmark, but the older macosx build is also still
 faster than the current mlvm:

 old macosx:

 9227465
  1.235000   0.00   1.235000 (  1.154000)
 9227465
  1.012000   0.00   1.012000 (  1.012000)
 9227465
  0.969000   0.00   0.969000 (  0.969000)
 9227465
  0.962000   0.00   0.962000 (  0.962000)
 9227465
  0.953000   0.00   0.953000 (  0.953000)

 current mlvm:

 9227465
  1.505000   0.00   1.505000 (  1.406000)
 9227465
  1.175000   0.00   1.175000 (  1.174000)
 9227465
  1.77   0.00   1.77 (  1.771000)
 9227465
  1.255000   0.00   1.255000 (  1.255000)
 9227465
  1.14   0.00   1.14 (  1.141000)

 - Charlie

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-25 Thread Christian Thalinger
On May 25, 2011, at 5:58 AM, Ola Bini wrote:
 Hi,
 
 There are at least three problems that are still there. They might be
 connected, or not.
 (I will tell you how to reproduce these at the end)
 
 I just built a new JVM:
 openjdk version 1.7.0-internal
 OpenJDK Runtime Environment (build
 1.7.0-internal-olabini_2011_05_25_08_06-b00)
 OpenJDK Server VM (build 21.0-b09, mixed mode)
 
 I get that weird missing class error when running my test suite:
 java.lang.NoClassDefFoundError: seph/lang/SephObject
 java.lang.RuntimeException: java.lang.NoClassDefFoundError:
 seph/lang/SephObject
   at seph.lang.Runtime.evaluateStream(Runtime.java:132)
   at seph.lang.Runtime.evaluateString(Runtime.java:146)
   at
 seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:214)
 Caused by: java.lang.NoClassDefFoundError: seph/lang/SephObject
   at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java)
   at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java)
   at seph$gen$abstraction$41$even?.argument_0_0(eval:7)
   at seph.lang.compiler.Bootstrap.intrinsic_if(Bootstrap.java:654)
   at seph$gen$abstraction$41$even?.even?(eval:7)
   at seph$gen$abstraction$39$toplevel.toplevel(eval:25)
   at seph.lang.Runtime.evaluateStream(Runtime.java:125)
 
 This is obviously a complete blocker.
 
 Turning off the ricochet frames hits an NYI error when running the test
 suite.
 
 I'm still seeing a crash in ricochet frames:
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0xa) at pc=0x0104884f, pid=85043, tid=2953850880
 #
 # JRE version: 7.0
 # Java VM: OpenJDK Client VM (21.0-b09 mixed mode bsd-x86 )
 # Problematic frame:
 # V  [libjvm.dylib+0x4884f]  MethodHandles::ricochet_frame_oops_do(frame
 const, OopClosure*, RegisterMap const*)+0x12f
 #
 # Failed to write core dump. Core dumps have been disabled. To enable
 core dumping, try ulimit -c unlimited before starting Java again
 #
 # An error report file with more information is saved as:
 # /Users/olabini/workspace/seph/hs_err_pid85043.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #
 
 I've attached the error file.
 
 The third error (in frame.cpp) only happens on master of seph, but there
 are some other problems with master that means you get lots of weird output.
 
 In order to reproduce problem 1 and 2:
 git clone git://github.com/seph-lang/seph.git
 git checkout 9075c0f4ffe1adac0657057aee2193f16ad12a43
 (build.xml: remove lines with jvmarg value=-Xint/)
 
 problem 1:
 ant
   This should show you one entry of the missing class problem

I tried this on solaris-i586 and it works:

BUILD SUCCESSFUL
Total time: 29 seconds

 
 problem 2:
 ant jar-notest
 bin/seph bench/bench_arithmetic.sp
   This should show you problem 2.
   If you for reference want to see the benchmark run correctly, do
 bin/seph -J-Xint bench/bench_arithmetic.sp

Works too (with server compiler on b143).

 
 problem 3:
 git checkout master
 git checkout 99c8f2609d468835390e39b68c73f21cc78e5ab5
 ant clean jar-notest
 bin/seph bench/bench_arithmetic.sp
   This should show you problem 3. You will also see loads of other
 exceptions, since that point generates slightly bad bytecode. That
 shouldn't make the JVM crash though, I assume - and I've seen the
 frame.cpp should not reach here without seeing those exceptions.

I think I see the exception you are talking about but there is so much output 
that I'm not sure.

I curious about where the problem lies, either it's a HotSpot fix that's not in 
the mlvm repository yet or it's a JDK bug in one of the mlvm patches which 
hasn't been pushed yet into the main repository.  At least that's my best guess.

May I suggest that you guys also try with an official build (like JDK 7 b143) 
from here (assuming everyone has some kind of access to a Linux box too):

http://jdk7.java.net/download.html

-- Christian

 
 All of these require that JAVA_HOME points to the Java 7 you want to use
 
 Cheers
 
 
 tom
 
 On May 23, 2011, at 7:33 PM, Ola Bini wrote:
 
 Hi,
 
 I'm happy to see that the performance degradation is getting
 addressed. I would like to point out that there is still a serious
 crash in the machinery too... Have you seen any reason why that
 happens?
 
 Cheers
 
 
 On 2011-05-24 06.11, John Rose wrote:
 On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote:
 
 I'd *love* for intermediate static Java snippits like this
 to inline straight through, even if invokeExact would be
 megamorphic under normal circumstances...but I don't think
 that's the case right now, right?
 
 I haven't been following 292 that closely but it used to be a
 piece of code that did precisely that.
 
 +private Object invoke_L0() throws Throwable { + if
 ((boolean) test.invokeExact()) +return 
 target.invokeExact(); +return
 fallback.invokeExact(); + }
 
 It 

Re: Good news, bad news

2011-05-25 Thread Ola Bini
Hi,

Just to clarify, my builds are against the current patchset in the MLVM
repository, so that might explain why you're not seeing these problems.

Cheers

On 2011-05-25 14.29, Christian Thalinger wrote:
 On May 25, 2011, at 5:58 AM, Ola Bini wrote:
 Hi,

 There are at least three problems that are still there. They might be
 connected, or not.
 (I will tell you how to reproduce these at the end)

 I just built a new JVM:
 openjdk version 1.7.0-internal
 OpenJDK Runtime Environment (build
 1.7.0-internal-olabini_2011_05_25_08_06-b00)
 OpenJDK Server VM (build 21.0-b09, mixed mode)

 I get that weird missing class error when running my test suite:
 java.lang.NoClassDefFoundError: seph/lang/SephObject
 java.lang.RuntimeException: java.lang.NoClassDefFoundError:
 seph/lang/SephObject
  at seph.lang.Runtime.evaluateStream(Runtime.java:132)
  at seph.lang.Runtime.evaluateString(Runtime.java:146)
  at
 seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:214)
 Caused by: java.lang.NoClassDefFoundError: seph/lang/SephObject
  at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java)
  at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java)
  at seph$gen$abstraction$41$even?.argument_0_0(eval:7)
  at seph.lang.compiler.Bootstrap.intrinsic_if(Bootstrap.java:654)
  at seph$gen$abstraction$41$even?.even?(eval:7)
  at seph$gen$abstraction$39$toplevel.toplevel(eval:25)
  at seph.lang.Runtime.evaluateStream(Runtime.java:125)

 This is obviously a complete blocker.

 Turning off the ricochet frames hits an NYI error when running the test
 suite.

 I'm still seeing a crash in ricochet frames:
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0xa) at pc=0x0104884f, pid=85043, tid=2953850880
 #
 # JRE version: 7.0
 # Java VM: OpenJDK Client VM (21.0-b09 mixed mode bsd-x86 )
 # Problematic frame:
 # V  [libjvm.dylib+0x4884f]  MethodHandles::ricochet_frame_oops_do(frame
 const, OopClosure*, RegisterMap const*)+0x12f
 #
 # Failed to write core dump. Core dumps have been disabled. To enable
 core dumping, try ulimit -c unlimited before starting Java again
 #
 # An error report file with more information is saved as:
 # /Users/olabini/workspace/seph/hs_err_pid85043.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #

 I've attached the error file.

 The third error (in frame.cpp) only happens on master of seph, but there
 are some other problems with master that means you get lots of weird output.

 In order to reproduce problem 1 and 2:
 git clone git://github.com/seph-lang/seph.git
 git checkout 9075c0f4ffe1adac0657057aee2193f16ad12a43
 (build.xml: remove lines with jvmarg value=-Xint/)

 problem 1:
 ant
   This should show you one entry of the missing class problem
 
 I tried this on solaris-i586 and it works:
 
 BUILD SUCCESSFUL
 Total time: 29 seconds
 

 problem 2:
 ant jar-notest
 bin/seph bench/bench_arithmetic.sp
   This should show you problem 2.
   If you for reference want to see the benchmark run correctly, do
 bin/seph -J-Xint bench/bench_arithmetic.sp
 
 Works too (with server compiler on b143).
 

 problem 3:
 git checkout master
 git checkout 99c8f2609d468835390e39b68c73f21cc78e5ab5
 ant clean jar-notest
 bin/seph bench/bench_arithmetic.sp
   This should show you problem 3. You will also see loads of other
 exceptions, since that point generates slightly bad bytecode. That
 shouldn't make the JVM crash though, I assume - and I've seen the
 frame.cpp should not reach here without seeing those exceptions.
 
 I think I see the exception you are talking about but there is so much output 
 that I'm not sure.
 
 I curious about where the problem lies, either it's a HotSpot fix that's not 
 in the mlvm repository yet or it's a JDK bug in one of the mlvm patches which 
 hasn't been pushed yet into the main repository.  At least that's my best 
 guess.
 
 May I suggest that you guys also try with an official build (like JDK 7 b143) 
 from here (assuming everyone has some kind of access to a Linux box too):
 
 http://jdk7.java.net/download.html
 
 -- Christian
 

 All of these require that JAVA_HOME points to the Java 7 you want to use

 Cheers


 tom

 On May 23, 2011, at 7:33 PM, Ola Bini wrote:

 Hi,

 I'm happy to see that the performance degradation is getting
 addressed. I would like to point out that there is still a serious
 crash in the machinery too... Have you seen any reason why that
 happens?

 Cheers


 On 2011-05-24 06.11, John Rose wrote:
 On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote:

 I'd *love* for intermediate static Java snippits like this
 to inline straight through, even if invokeExact would be
 megamorphic under normal circumstances...but I don't think
 that's the case right now, right?

 I haven't been following 292 that closely but it used to be a
 piece of code that did precisely that.


Re: Good news, bad news

2011-05-25 Thread Christian Thalinger
On May 25, 2011, at 12:20 PM, Ola Bini wrote:
 Hi,
 
 Just to clarify, my builds are against the current patchset in the MLVM
 repository, so that might explain why you're not seeing these problems.

I know that.  That's why we need to find out where the problem is (some hints 
below).  Can someone provide a jar file of the current MLVM JSR 292 classes 
(like John's meth.jar)?

-- Christian

 
 Cheers
 
 On 2011-05-25 14.29, Christian Thalinger wrote:
 On May 25, 2011, at 5:58 AM, Ola Bini wrote:
 Hi,
 
 There are at least three problems that are still there. They might be
 connected, or not.
 (I will tell you how to reproduce these at the end)
 
 I just built a new JVM:
 openjdk version 1.7.0-internal
 OpenJDK Runtime Environment (build
 1.7.0-internal-olabini_2011_05_25_08_06-b00)
 OpenJDK Server VM (build 21.0-b09, mixed mode)
 
 I get that weird missing class error when running my test suite:
 java.lang.NoClassDefFoundError: seph/lang/SephObject
 java.lang.RuntimeException: java.lang.NoClassDefFoundError:
 seph/lang/SephObject
 at seph.lang.Runtime.evaluateStream(Runtime.java:132)
 at seph.lang.Runtime.evaluateString(Runtime.java:146)
 at
 seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:214)
 Caused by: java.lang.NoClassDefFoundError: seph/lang/SephObject
 at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java)
 at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java)
 at seph$gen$abstraction$41$even?.argument_0_0(eval:7)
 at seph.lang.compiler.Bootstrap.intrinsic_if(Bootstrap.java:654)
 at seph$gen$abstraction$41$even?.even?(eval:7)
 at seph$gen$abstraction$39$toplevel.toplevel(eval:25)
 at seph.lang.Runtime.evaluateStream(Runtime.java:125)
 
 This is obviously a complete blocker.
 
 Turning off the ricochet frames hits an NYI error when running the test
 suite.
 
 I'm still seeing a crash in ricochet frames:
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0xa) at pc=0x0104884f, pid=85043, tid=2953850880
 #
 # JRE version: 7.0
 # Java VM: OpenJDK Client VM (21.0-b09 mixed mode bsd-x86 )
 # Problematic frame:
 # V  [libjvm.dylib+0x4884f]  MethodHandles::ricochet_frame_oops_do(frame
 const, OopClosure*, RegisterMap const*)+0x12f
 #
 # Failed to write core dump. Core dumps have been disabled. To enable
 core dumping, try ulimit -c unlimited before starting Java again
 #
 # An error report file with more information is saved as:
 # /Users/olabini/workspace/seph/hs_err_pid85043.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #
 
 I've attached the error file.
 
 The third error (in frame.cpp) only happens on master of seph, but there
 are some other problems with master that means you get lots of weird output.
 
 In order to reproduce problem 1 and 2:
 git clone git://github.com/seph-lang/seph.git
 git checkout 9075c0f4ffe1adac0657057aee2193f16ad12a43
 (build.xml: remove lines with jvmarg value=-Xint/)
 
 problem 1:
 ant
  This should show you one entry of the missing class problem
 
 I tried this on solaris-i586 and it works:
 
 BUILD SUCCESSFUL
 Total time: 29 seconds
 
 
 problem 2:
 ant jar-notest
 bin/seph bench/bench_arithmetic.sp
  This should show you problem 2.
  If you for reference want to see the benchmark run correctly, do
 bin/seph -J-Xint bench/bench_arithmetic.sp
 
 Works too (with server compiler on b143).
 
 
 problem 3:
 git checkout master
 git checkout 99c8f2609d468835390e39b68c73f21cc78e5ab5
 ant clean jar-notest
 bin/seph bench/bench_arithmetic.sp
  This should show you problem 3. You will also see loads of other
 exceptions, since that point generates slightly bad bytecode. That
 shouldn't make the JVM crash though, I assume - and I've seen the
 frame.cpp should not reach here without seeing those exceptions.
 
 I think I see the exception you are talking about but there is so much 
 output that I'm not sure.
 
 I curious about where the problem lies, either it's a HotSpot fix that's not 
 in the mlvm repository yet or it's a JDK bug in one of the mlvm patches 
 which hasn't been pushed yet into the main repository.  At least that's my 
 best guess.
 
 May I suggest that you guys also try with an official build (like JDK 7 
 b143) from here (assuming everyone has some kind of access to a Linux box 
 too):
 
 http://jdk7.java.net/download.html
 
 -- Christian
 
 
 All of these require that JAVA_HOME points to the Java 7 you want to use
 
 Cheers
 
 
 tom
 
 On May 23, 2011, at 7:33 PM, Ola Bini wrote:
 
 Hi,
 
 I'm happy to see that the performance degradation is getting
 addressed. I would like to point out that there is still a serious
 crash in the machinery too... Have you seen any reason why that
 happens?
 
 Cheers
 
 
 On 2011-05-24 06.11, John Rose wrote:
 On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote:
 
 I'd *love* for intermediate static Java snippits like 

Re: Good news, bad news

2011-05-25 Thread Ola Bini
Well, on my linux box with a JDK built this morning I don't see any of
these problems actually. (However, there are things on my master that
only works with the mlvm patches. Type conversions specifically.)

Problem 3 also seems to not be there - you would notice if it were since
it's also a total JVM crash. I do not see it on Linux with this version:
openjdk version 1.7.0-internal
OpenJDK Runtime Environment (build
1.7.0-internal-obini2_2011_05_25_03_33-b00)
OpenJDK 64-Bit Server VM (build 21.0-b13, mixed mode)

I built it from http://hg.openjdk.java.net/jdk7/jdk7 so it's not an
official drop, but it's not bsdport with mlvm either.

Cheers

On 2011-05-25 14.29, Christian Thalinger wrote:
 On May 25, 2011, at 5:58 AM, Ola Bini wrote:
 Hi,

 There are at least three problems that are still there. They might be
 connected, or not.
 (I will tell you how to reproduce these at the end)

 I just built a new JVM:
 openjdk version 1.7.0-internal
 OpenJDK Runtime Environment (build
 1.7.0-internal-olabini_2011_05_25_08_06-b00)
 OpenJDK Server VM (build 21.0-b09, mixed mode)

 I get that weird missing class error when running my test suite:
 java.lang.NoClassDefFoundError: seph/lang/SephObject
 java.lang.RuntimeException: java.lang.NoClassDefFoundError:
 seph/lang/SephObject
  at seph.lang.Runtime.evaluateStream(Runtime.java:132)
  at seph.lang.Runtime.evaluateString(Runtime.java:146)
  at
 seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:214)
 Caused by: java.lang.NoClassDefFoundError: seph/lang/SephObject
  at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java)
  at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java)
  at seph$gen$abstraction$41$even?.argument_0_0(eval:7)
  at seph.lang.compiler.Bootstrap.intrinsic_if(Bootstrap.java:654)
  at seph$gen$abstraction$41$even?.even?(eval:7)
  at seph$gen$abstraction$39$toplevel.toplevel(eval:25)
  at seph.lang.Runtime.evaluateStream(Runtime.java:125)

 This is obviously a complete blocker.

 Turning off the ricochet frames hits an NYI error when running the test
 suite.

 I'm still seeing a crash in ricochet frames:
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0xa) at pc=0x0104884f, pid=85043, tid=2953850880
 #
 # JRE version: 7.0
 # Java VM: OpenJDK Client VM (21.0-b09 mixed mode bsd-x86 )
 # Problematic frame:
 # V  [libjvm.dylib+0x4884f]  MethodHandles::ricochet_frame_oops_do(frame
 const, OopClosure*, RegisterMap const*)+0x12f
 #
 # Failed to write core dump. Core dumps have been disabled. To enable
 core dumping, try ulimit -c unlimited before starting Java again
 #
 # An error report file with more information is saved as:
 # /Users/olabini/workspace/seph/hs_err_pid85043.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #

 I've attached the error file.

 The third error (in frame.cpp) only happens on master of seph, but there
 are some other problems with master that means you get lots of weird output.

 In order to reproduce problem 1 and 2:
 git clone git://github.com/seph-lang/seph.git
 git checkout 9075c0f4ffe1adac0657057aee2193f16ad12a43
 (build.xml: remove lines with jvmarg value=-Xint/)

 problem 1:
 ant
   This should show you one entry of the missing class problem
 
 I tried this on solaris-i586 and it works:
 
 BUILD SUCCESSFUL
 Total time: 29 seconds
 

 problem 2:
 ant jar-notest
 bin/seph bench/bench_arithmetic.sp
   This should show you problem 2.
   If you for reference want to see the benchmark run correctly, do
 bin/seph -J-Xint bench/bench_arithmetic.sp
 
 Works too (with server compiler on b143).
 

 problem 3:
 git checkout master
 git checkout 99c8f2609d468835390e39b68c73f21cc78e5ab5
 ant clean jar-notest
 bin/seph bench/bench_arithmetic.sp
   This should show you problem 3. You will also see loads of other
 exceptions, since that point generates slightly bad bytecode. That
 shouldn't make the JVM crash though, I assume - and I've seen the
 frame.cpp should not reach here without seeing those exceptions.
 
 I think I see the exception you are talking about but there is so much output 
 that I'm not sure.
 
 I curious about where the problem lies, either it's a HotSpot fix that's not 
 in the mlvm repository yet or it's a JDK bug in one of the mlvm patches which 
 hasn't been pushed yet into the main repository.  At least that's my best 
 guess.
 
 May I suggest that you guys also try with an official build (like JDK 7 b143) 
 from here (assuming everyone has some kind of access to a Linux box too):
 
 http://jdk7.java.net/download.html
 
 -- Christian
 

 All of these require that JAVA_HOME points to the Java 7 you want to use

 Cheers


 tom

 On May 23, 2011, at 7:33 PM, Ola Bini wrote:

 Hi,

 I'm happy to see that the performance degradation is getting
 addressed. I would like to point out that there is still a serious
 

Re: Good news, bad news

2011-05-25 Thread John Rose
On May 25, 2011, at 3:23 AM, Christian Thalinger wrote:

 I know that.  That's why we need to find out where the problem is (some hints 
 below).  Can someone provide a jar file of the current MLVM JSR 292 classes 
 (like John's meth.jar)?

I just posted it with the javadoc:
  http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/
  http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/meth.jar
  http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/test-classes.jar

Also, here's my setup for running the test classes:

$ hg clone http://hg.openjdk.java.net/mlvm/mlvm /tmp/mlvm
$ hg clone ssh://hg.kenai.com/ninja~indify-repo /tmp/indify
$ ant test -Dproject.indify=/tmp/indify -Dplatforms.JDK_1.7.home=$JAVA7X_BUILD 
-Dsrc.special-config-test-src.dir=''

(It's a twisty path.  Your mileage will vary.)

-- John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-25 Thread Charles Oliver Nutter
Actually now that I look at it, this isn't all that solid performance.
It's just *slightly* faster than using JRuby's inline cache, which
almost never inlines...

Inline cache only, via CachingCallSite.

~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false --server
bench/bench_fib_recursive.rb 5 359227465
  1.23   0.00   1.23 (  1.114000)
9227465
  1.024000   0.00   1.024000 (  1.024000)
9227465
  1.016000   0.00   1.016000 (  1.016000)
9227465
  0.993000   0.00   0.993000 (  0.993000)
9227465
  1.003000   0.00   1.003000 (  1.003000)

So invokedynamic dispatch is better, but only by a tiny margin. :(

I'll start browsing logs and asm dumps later today to see if I can
figure out why it's not as fast as before.

- Charlie

On Wed, May 25, 2011 at 12:00 PM, Charles Oliver Nutter
head...@headius.com wrote:
 On Wed, May 25, 2011 at 11:31 AM, Charles Oliver Nutter
 head...@headius.com wrote:
 Build in progress! I'll let you know how it goes. How low is low in low 
 arity?

 Ok, we're definitely back to having solid performance, but it's not as
 good as it was before. I'm not sure if it's related to the inlining
 thresholds, since bumping them up the same way we used to doesn't get
 all the perf back. Here's the numbers I have. It's now around 3x
 faster than it was a few days ago, but not as fast as JRuby's dynopt
 mode (which inserts a guarded invokevirtual next to a normal IC hit).
 The invokedynamic logic was nearly as fast as dynopt before, which was
 breathtaking to behold.

 Indy, indy with thresholds bumped, dynopt:

 ~/projects/jruby ➔ jruby --server bench/bench_fib_recursive.rb 5 359227465
  1.215000   0.00   1.215000 (  1.155000)
 9227465
  0.985000   0.00   0.985000 (  0.985000)
 9227465
  0.977000   0.00   0.977000 (  0.977000)
 9227465
  0.981000   0.00   0.981000 (  0.981000)
 9227465
  0.976000   0.00   0.976000 (  0.976000)

 ~/projects/jruby ➔ jruby -J-XX:MaxInlineSize=150
 -J-XX:InlineSmallCode=3000 --server bench/bench_fib_recursive.rb 5 35
 9227465
  1.209000   0.00   1.209000 (  1.132000)
 9227465
  0.981000   0.00   0.981000 (  0.981000)
 9227465
  0.974000   0.00   0.974000 (  0.974000)
 9227465
  0.971000   0.00   0.971000 (  0.971000)
 9227465
  0.969000   0.00   0.969000 (  0.969000)

 ~/projects/jruby ➔ jruby -Xcompile.dynopt=true
 -Xcompile.invokedynamic=false --server bench/bench_fib_recursive.rb 5
 35
 9227465
  1.078000   0.00   1.078000 (  1.013000)
 9227465
  0.706000   0.00   0.706000 (  0.706000)
 9227465
  0.697000   0.00   0.697000 (  0.697000)
 9227465
  0.699000   0.00   0.699000 (  0.699000)
 9227465
  0.695000   0.00   0.695000 (  0.695000)

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-25 Thread John Rose
On May 25, 2011, at 9:31 AM, Charles Oliver Nutter wrote:

 Build in progress! I'll let you know how it goes. How low is low in low 
 arity?

About 0-9.

Let's find the next bottleneck!

-- John

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-25 Thread Charles Oliver Nutter
On Wed, May 25, 2011 at 2:14 PM, John Rose john.r.r...@oracle.com wrote:
 On May 25, 2011, at 9:31 AM, Charles Oliver Nutter wrote:

 Build in progress! I'll let you know how it goes. How low is low in low 
 arity?

 About 0-9.

 Let's find the next bottleneck!

That's the spirit! We can do it!

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-25 Thread Tom Rodriguez
I've got the bimorphic method handle inline working.  I was doing something 
stupid at the beginning which is why it wasn't as easy as I expected to be and 
then I encountered a stupid API which wasted a little more time.  I was able to 
leverage the existing  invokedynamic call site cache code generation and just 
generate a different guard.  Ignoring the flag soup below, the first is without 
invokedynamic, second is with and third is with bimorphic inline for 
selectAlternative.

( setenv _JAVA_OPTIONS -XX:-PrintCompilation 
-Xbootclasspath/p:/export/ws/twisti/meth.jar -XX:+UseNewCode -Xbatch 
-XX:-UseNewCode2 -XX:-LogCompilation -XX:-PrintInlining ; setenv JAVA_HOME 
/export/ws/indy/i586/jdk1.7.0 ; ./bin/jruby --server 
-Xcompile.invokedynamic=false bench/bench_fib_recursive.rb 5 35 )   

  
  2.006000   0.00   2.006000 (  1.856000)
  1.391000   0.00   1.391000 (  1.391000)
  1.386000   0.00   1.386000 (  1.386000)
  1.386000   0.00   1.386000 (  1.386000)
  1.385000   0.00   1.385000 (  1.385000)

( setenv _JAVA_OPTIONS -XX:-PrintCompilation 
-Xbootclasspath/p:/export/ws/twisti/meth.jar -XX:+UseNewCode -Xbatch 
-XX:-UseNewCode2 -XX:-LogCompilation -XX:-PrintInlining ; setenv JAVA_HOME 
/export/ws/indy/i586/jdk1.7.0 ; ./bin/jruby --server 
-Xcompile.invokedynamic=true bench/bench_fib_recursive.rb 5 35 )
  4.235000   0.00   4.235000 (  4.086000)
  3.689000   0.00   3.689000 (  3.689000)
  3.687000   0.00   3.687000 (  3.688000)
  3.68   0.00   3.68 (  3.68)
  3.68   0.00   3.68 (  3.68)

( setenv _JAVA_OPTIONS -XX:-PrintCompilation 
-Xbootclasspath/p:/export/ws/twisti/meth.jar -XX:+UseNewCode -Xbatch 
-XX:+UseNewCode2 -XX:-LogCompilation -XX:-PrintInlining ; setenv JAVA_HOME 
/export/ws/indy/i586/jdk1.7.0 ; ./bin/jruby --server 
-Xcompile.invokedynamic=true bench/bench_fib_recursive.rb 5 35 )Picked up 
  2.103000   0.00   2.103000 (  1.95)
  1.281000   0.00   1.281000 (  1.281000)
  1.275000   0.00   1.275000 (  1.274000)
  1.279000   0.00   1.279000 (  1.279000)
  1.288000   0.00   1.288000 (  1.288000)

The changes are at http://javaweb.sfbay.sun.com/~never/webrev/bim.  Maybe John 
can put together an mlvm patch for it.  I'm not quite sure how to do so.

There are still some minor improvements that can be made.  The test part ends 
up looking like:

  if ((test ? Boolean.TRUE : Boolean.FALSE).value())

We need to treat more of those values as constants and split the loads up to 
fold it away.

tom

On May 25, 2011, at 12:14 PM, John Rose wrote:

 On May 25, 2011, at 9:31 AM, Charles Oliver Nutter wrote:
 
 Build in progress! I'll let you know how it goes. How low is low in low 
 arity?
 
 About 0-9.
 
 Let's find the next bottleneck!
 
 -- John
 
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-25 Thread Charles Oliver Nutter
On Wed, May 25, 2011 at 5:12 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote:
 I've got the bimorphic method handle inline working.  I was doing something 
 stupid at the beginning which is why it wasn't as easy as I expected to be 
 and then I encountered a stupid API which wasted a little more time.  I was 
 able to leverage the existing  invokedynamic call site cache code generation 
 and just generate a different guard.  Ignoring the flag soup below, the first 
 is without invokedynamic, second is with and third is with bimorphic inline 
 for selectAlternative.

Here's a dumb question...couldn't the bimorphic logic have simply been
generated the second way you indicated, rather than the (x ? a :
b).whatever form?

Great to hear this is in! I'll look for an MLVM patch and until then
see if I can figure out why the classic GWT logic isn't as fast as
it was a couple weeks ago. IF I'm feeling saucy I may try to pull MLVM
+ bsd-port from two weeks ago (or find a macosx port build from that
timeframe) to compare assembly output. Perhaps illustrative for us
all!

 The changes are at http://javaweb.sfbay.sun.com/~never/webrev/bim.  Maybe 
 John can put together an mlvm patch for it.  I'm not quite sure how to do so.

 There are still some minor improvements that can be made.  The test part ends 
 up looking like:

  if ((test ? Boolean.TRUE : Boolean.FALSE).value())

 We need to treat more of those values as constants and split the loads up to 
 fold it away.

I'm keen to have a look at the assembly resulting from the bimorphic
inlining and see if there's any way to improve it further. My test
could potentially be improved too.

I'm also pondering ways I can go mostly guard-free for certain core
JRuby types, possibly reducing the guard to an exact type check.
There's definitely runway both on your end and my end.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-24 Thread Tom Rodriguez
I haven't been able to reproduce a crash that looks like either of the issues 
you reported, the ShouldNotReachHere in frame.cpp and the crash in 
FastScanClosure.  Do you have a test that can reproduce it and are you running 
against the latest version of the code?

tom

On May 23, 2011, at 7:33 PM, Ola Bini wrote:

 Hi,
 
 I'm happy to see that the performance degradation is getting addressed.
 I would like to point out that there is still a serious crash in the
 machinery too... Have you seen any reason why that happens?
 
 Cheers
 
 
 On 2011-05-24 06.11, John Rose wrote:
 On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote:
 
 I'd *love* for intermediate static Java snippits like this to
 inline straight through, even if invokeExact would be megamorphic
 under normal circumstances...but I don't think that's the case
 right now, right?
 
 I haven't been following 292 that closely but it used to be a piece
 of code that did precisely that.
 
 +private Object invoke_L0() throws Throwable { +
 if ((boolean) test.invokeExact()) +return
 target.invokeExact(); +return fallback.invokeExact(); +
 }
 
 It looks like it was reworked at some point to use
 selectAlternative but the optimizer was never updated to deal with
 it properly.  It's not particularly hard, we just need to generate
 code like we would for a bimorphic call site.  The current code
 expects to either get a constant or something else and doesn't
 inline if it's something else.  In this case we have a Phi of two
 constants which we just need to split.
 
 Yes, it would be a quasi-bimorphic call site keyed from a phi of two
 constants, instead of a profile of two receiver classes.
 
 I'm still unclear why you couldn't write your own variant of
 guardWithTest and have it work but my knowledge of what's really
 allowed is somewhat limited.
 
 Those snippets will inline (I think), but at some point Charlie will
 want to fetch a bit of constant stuff out of an instance variable.
 That will look non-constant to the optimizer, even if the instance
 variable is final and the enclosing object is a constant reference.
 We made sure this happens for java.lang.invoke classes, but we
 haven't extended it yet to user code, in part because it would have
 its own bug tail to work through.
 
 -- John ___ mlvm-dev
 mailing list mlvm-dev@openjdk.java.net 
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
 
 
 
 -- 
 Ola Bini (http://olabini.com)
  Ioke - JRuby - ThoughtWorks
 
 Yields falsehood when quined yields falsehood when quined.
 
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-24 Thread John Rose
On May 23, 2011, at 5:43 PM, Charles Oliver Nutter wrote:

 Well I have a prototype GWT in place, but I'm having trouble getting
 invokeExact to work. As far as I can tell, incoming arguments should
 match the handle type just fine, but I'm getting this error:

Hmm...  I don't see the bug in your diff.  Looks like someone is trying to 
invoke a MH in varargs style (Object...).  The error says 
InvokeDynamicSupport.java:934:in `gwt': but I don't know where this might be.

Anyway, I put GWT back the way it was, for (a) low arity and (b) method types 
without primitives.

(For high arity and/or types with primitives, it still uses ricochet frames, 
for which Tom found an optimization opportunity.)

See if it makes things better for you...

-- John___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-24 Thread Ola Bini
On 2011-05-25 10.50, John Rose wrote:
 On May 23, 2011, at 7:33 PM, Ola Bini wrote:
 
 Have you seen any reason why that happens?
 
 There are some known problems (crashers even) with method handle
 compilation.
 
 The permuteArguments transform has a bug when you mix long/doubles
 and other argument types.
 
 I have a crasher in one of my dispatch pattern examples that is
 similar to some of the code you have been talking about.
 
 Yes, there is at least one gremlin left in there.
 
 Thanks for posting the detailed instructions for reproducing your
 errors!

Thanks!

I don't use permuteArguments anywhere in my code, actually, but maybe
one of the other combinators is defined in terms of it?

Cheers
-- 
 Ola Bini (http://olabini.com)
  Ioke - JRuby - ThoughtWorks

 Yields falsehood when quined yields falsehood when quined.

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-24 Thread John Rose
On May 24, 2011, at 10:25 PM, Ola Bini wrote:

 I don't use permuteArguments anywhere in my code, actually, but maybe
 one of the other combinators is defined in terms of it?

No, it's stand-alone.  Factoid: All the other combinators are order-stable with 
respect to arguments.
I toyed with using it to normalize complex calling sequences (prim/ref mixes) 
but never got anywhere.

-- John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Christian Thalinger
On May 21, 2011, at 1:31 AM, Charles Oliver Nutter wrote:
 I am still using the specialized math callsites because they inline well and 
 an Indy-based approach was still quite a bit slower last time I tried.
 
 I am back to being worried about the inlining thresholds. With no MHs and 
 runtime profiled direct calls, I was able to get this all to inline. So 
 basically the fact that I am using Indy and MHs now penalizes me over not 
 just profiled direct calls, but over a totally generic inline cache that 
 usually doesn't inline at all. That's pretty bad :)

I was on vacation for a couple of days but now I'm back.

I looked at the inlining tree for fib_recursive and I can't see the fib_ruby 
method getting compiled.  But instead I see something I've never seen before:

   2949  105 org.jruby.ast.CallOneArgNode::interpret (34 bytes)

or

   2997  108   ! org.jruby.evaluator.ASTInterpreter::INTERPRET_METHOD 
(126 bytes)

Does that mean JRuby doesn't compile these methods but interpret them?

-- Christian

 
 I can try fiddling with the thresholds later, but do we agree that an 
 invokedynamic bound to a GWT plus some intermediate adapter handles ending in 
 a direct handle should inline as well as e.g. an invokevirtual?
 
 - Charlie (mobile)
 
 On May 20, 2011, at 17:35, Rémi Forax fo...@univ-mlv.fr wrote:
 
 On 05/20/2011 11:08 PM, Tom Rodriguez wrote:
 Is it the inlining policy?  I played a little with this and the current 
 default policy is still not as good as the big tweaks to MaxInlineSize and 
 InlineSmallCode.
 
 tom
 
 I tend to agree,
 things like isFixnumReopened and getCacheToken seem to be not inlined.
 
 Charles, do you have tested if the inline flags are tweaked ?
 
 Also I do not understand why you have JRuby's callsites like MinusCallSite
 in the stacktrace. You should not use it if invokedynamic is enabled.
 
 Rémi
 
 On May 20, 2011, at 1:21 PM, Charles Oliver Nutter wrote:
 
 Meh, false alarm. This is just the default empty Object[] assigned to
 all objects. I still can't explain the perf degradation.
 
 Here's the full assembly for fib_ruby: https://gist.github.com/983721
 
 It seems like stuff is inlining, so I have no idea why performance is
 so terrible.
 
 - Charlie
 
 On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter
 head...@headius.com  wrote:
 Another update...I managed to turn off the ricochet frames and have it
 still work (???). The with-ricochet mode is faster, but still 2-3x
 slower than no indy at all (using JRuby's IC), and 4-5x slower than
 invokedynamic was recently...
 
 HOWEVER...an assembly dump I just did contains some very odd code that
 might indicate a problem in my code. Hold the presses on digging into
 it on your end until I can explain this. (this = apparently standing
 up an instance variable table for Fixnum objects...*definitely* should
 not be happening).
 
 0x02868f2a: mov [eax+0x1C], ebx  ;*invokespecialinit
   ; -
 org.jruby.RubyBasicObject::init@1 (line 218)
   ; -
 org.jruby.RubyObject::init@2 (line 116)
   ; -
 org.jruby.RubyNumeric::init@2 (line 114)
   ; -
 org.jruby.RubyInteger::init@2 (line 95)
   ; -
 org.jruby.RubyFixnum::init@5 (line 112)
   ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
   ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
   ; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
 0x02868f2d: mov   ebx, [esp+0x14]
 0x02868f31: mov   ecx, [ebx+0xBC]
 0x02868f37: mov   edx, eax
 0x02868f39: shr   edx, 9
 0x02868f3c: mov   [edx+0x394000], 0x00  ;*putfield varTable
   ; -
 org.jruby.RubyBasicObject::init@8 (line 84)
   ; -
 org.jruby.RubyObject::init@2 (line 116)
   ; -
 org.jruby.RubyNumeric::init@2 (line 114)
   ; -
 org.jruby.RubyInteger::init@2 (line 95)
   ; -
 org.jruby.RubyFixnum::init@5 (line 112)
   ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
   ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
   ; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
 0x02868f43: mov   edx, a 'java/lang/Class' = 
 'org/jruby/RubyBasicObject'
   ;   {oop(a 'java/lang/Class' =
 'org/jruby/RubyBasicObject')}
 0x02868f48: mov   ebx, [edx+0x68]  ;*getstatic NULL_OBJECT_ARRAY
   ; -
 org.jruby.RubyBasicObject::init@5 (line 84)
   ; -
 org.jruby.RubyObject::init@2 (line 116)
  

Re: Good news, bad news

2011-05-23 Thread Charles Oliver Nutter
On Mon, May 23, 2011 at 11:05 AM, Christian Thalinger
christian.thalin...@oracle.com wrote:
 I was on vacation for a couple of days but now I'm back.

 I looked at the inlining tree for fib_recursive and I can't see the fib_ruby 
 method getting compiled.  But instead I see something I've never seen before:

   2949  105             org.jruby.ast.CallOneArgNode::interpret (34 bytes)

 or

   2997  108   !         org.jruby.evaluator.ASTInterpreter::INTERPRET_METHOD 
 (126 bytes)

 Does that mean JRuby doesn't compile these methods but interpret them?

Hmm, I wonder if there's a problem somewhere preventing JIT now. Run
JRuby with -d -X+C to debug logging and force compilation. I will look
into it today myself.

I did see fib_ruby compile and was able to get assembly dumps out, but
it performed terribly. I'll try to make extra sure everything's
jitting.

Ahh the joys of a mixed-mode runtime, eh?

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread John Rose
On May 23, 2011, at 11:45 AM, Charles Oliver Nutter wrote:

 I did see fib_ruby compile and was able to get assembly dumps out, but
 it performed terribly. I'll try to make extra sure everything's
 jitting.
 
 Ahh the joys of a mixed-mode runtime, eh?

Indeed.  Anyway, thanks for pushing on this.

Sometimes you can get your dumps faster by using -XX:-DisplayVMOutput along 
with -XX:+PrintAssembly.  The output goes to hotspot.log anyway.

Also, -XX:CompileCommand=print,foo::bar will focus assembly printing on the 
method(s) you care about.

-- John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Christian Thalinger
On May 23, 2011, at 8:45 PM, Charles Oliver Nutter wrote:
 On Mon, May 23, 2011 at 11:05 AM, Christian Thalinger
 christian.thalin...@oracle.com wrote:
 I was on vacation for a couple of days but now I'm back.
 
 I looked at the inlining tree for fib_recursive and I can't see the fib_ruby 
 method getting compiled.  But instead I see something I've never seen before:
 
   2949  105 org.jruby.ast.CallOneArgNode::interpret (34 bytes)
 
 or
 
   2997  108   ! org.jruby.evaluator.ASTInterpreter::INTERPRET_METHOD 
 (126 bytes)
 
 Does that mean JRuby doesn't compile these methods but interpret them?
 
 Hmm, I wonder if there's a problem somewhere preventing JIT now. Run
 JRuby with -d -X+C to debug logging and force compilation. I will look
 into it today myself.

I didn't use -X+C but it should compile without it too, right?  And, btw, I 
used JDK 7 b143.  I try again tomorrow.

-- Christian

 
 I did see fib_ruby compile and was able to get assembly dumps out, but
 it performed terribly. I'll try to make extra sure everything's
 jitting.
 
 Ahh the joys of a mixed-mode runtime, eh?
 
 - Charlie
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread John Rose
On May 23, 2011, at 2:19 PM, Charles Oliver Nutter wrote:

 I'm working up a set of files that show JRuby compilation output, but
 I noticed a couple things that might be interesting right now.

That looks promising.  It seems something got bumped in our inlining logic.  -- 
John

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Charles Oliver Nutter
fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
inlined: https://gist.github.com/f2b665ad3c97ba622ebf

Can anyone suggest other flags I can try to adjust to get things to
inline better?

FWIW, the handle chain in question that's not inlining is pretty simple:

* DMH pointing back at fib_ruby
* permute args
* GWT

- Charlie

On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter
head...@headius.com wrote:
 I'm working up a set of files that show JRuby compilation output, but
 I noticed a couple things that might be interesting right now.

 First off, fairly early in the assembly output for fib, I see this:

  0x02876d1f: call      0x0282d0e0      ; OopMap{[96]=Oop [100]=Oop
 [28]=Oop [40]=Oop [48]=Oop off=644}
                                        ;*invokespecial invokeExact
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@63
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@23
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7)
                                        ;   {optimized virtual_call}

 For fib, the only invokedynamic is the recursive call to fib, so that
 would indicate that fib_ruby is not inlining into itself at all here.
 And I can't see it inlining into itself anywhere in the assembly
 output.

 Later in the same output:

  0x0287703f: call      0x0282dba0      ; OopMap{ebp=Oop off=1444}
                                        ;*checkcast
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@40
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
                                        ;   {runtime_call}
  0x02877044: call      0x0105a9d0      ;*checkcast
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@40
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
                                        ;   {runtime_call}

 These appear repeatedly near the invokedynamic invocation above. If
 I'm reading this right, neither the recursive call nor logic involved
 in that particular handle is inlining. Am I right?

 Here's the complete assembly dump (i386) for the fib_ruby method:
 https://gist.github.com/987640

 In other news, MaxInlineSize=150 with InlineSmallCode=3000 does not
 appear to improve performance. I also tried bumping up
 MaxRecursiveInlineLevel and MaxInlineLevel with no effect.

 - Charlie

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Tom Rodriguez
If there were to be a recursive inline in there, where would it occur?  I can't 
tell from the names where in that inline tree where the recursive call occurs.

tom

On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:

 fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
 inlined: https://gist.github.com/f2b665ad3c97ba622ebf
 
 Can anyone suggest other flags I can try to adjust to get things to
 inline better?
 
 FWIW, the handle chain in question that's not inlining is pretty simple:
 
 * DMH pointing back at fib_ruby
 * permute args
 * GWT
 
 - Charlie
 
 On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 I'm working up a set of files that show JRuby compilation output, but
 I noticed a couple things that might be interesting right now.
 
 First off, fairly early in the assembly output for fib, I see this:
 
  0x02876d1f: call  0x0282d0e0  ; OopMap{[96]=Oop [100]=Oop
 [28]=Oop [40]=Oop [48]=Oop off=644}
;*invokespecial invokeExact
; -
 java.lang.invoke.MethodHandle::invokeExact@63
; -
 java.lang.invoke.MethodHandle::invokeExact@23
; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7)
;   {optimized virtual_call}
 
 For fib, the only invokedynamic is the recursive call to fib, so that
 would indicate that fib_ruby is not inlining into itself at all here.
 And I can't see it inlining into itself anywhere in the assembly
 output.
 
 Later in the same output:
 
  0x0287703f: call  0x0282dba0  ; OopMap{ebp=Oop off=1444}
;*checkcast
; -
 java.lang.invoke.MethodHandle::invokeExact@40
; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
;   {runtime_call}
  0x02877044: call  0x0105a9d0  ;*checkcast
; -
 java.lang.invoke.MethodHandle::invokeExact@40
; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
;   {runtime_call}
 
 These appear repeatedly near the invokedynamic invocation above. If
 I'm reading this right, neither the recursive call nor logic involved
 in that particular handle is inlining. Am I right?
 
 Here's the complete assembly dump (i386) for the fib_ruby method:
 https://gist.github.com/987640
 
 In other news, MaxInlineSize=150 with InlineSmallCode=3000 does not
 appear to improve performance. I also tried bumping up
 MaxRecursiveInlineLevel and MaxInlineLevel with no effect.
 
 - Charlie
 
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread John Rose
On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:

 * GWT

GWT changed, but in a way that should make it easier (not harder) to inline.  
That could be the problem.

The new format (using ricochet frames) is:

  GWT(p,t,f)(*a) := ( selectAlternative(p(*a), t, f) ).invokeExact(*a)
where selectAlternative(z,a,b) := z?a:b

Or:
  GWT(p,t,f) := foldArguments(tailcall, folder)
where
  tailcall := exactInvoker(t.type())
  chooser := selectAlternative.insertArguments(1,t,f)
  folder := filterReturnValue(p, chooser)

The new MHW code should allow the foldArguments node to be inlined.  But that 
may be the weakest link.

-- John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Charles Oliver Nutter
The following chunk should be the invokedynamic call to fib, via a
GWT, an arg permuter, and perhaps one convert:

@ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
@ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes)
  @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
  @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes)
@ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test
(20 bytes)
  @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
  @ 8 org.jruby.RubyModule::getCacheToken (5 bytes)
  @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
  @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes)
@ 1 java.lang.Boolean::valueOf (14 bytes)
@ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
@ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes)
  @ 11 java.lang.Boolean::booleanValue (5 bytes)
  @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes)
@ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
  @ 37 sun.invoke.util.ValueConversions::identity (2 bytes)

This seems to only be the test logic; the actual fib invocation
doesn't appear to show up in the inlining graph at all. Am I right?

I see two of these in the LogCompilation output and nothing else
around them. I'd expect to see them do the invocation of fib_ruby
somewhere in there. It's like the success branch of GWT is not even
being considered for inlining.

- Charlie

On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote:
 If there were to be a recursive inline in there, where would it occur?  I 
 can't tell from the names where in that inline tree where the recursive call 
 occurs.

 tom

 On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:

 fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
 inlined: https://gist.github.com/f2b665ad3c97ba622ebf

 Can anyone suggest other flags I can try to adjust to get things to
 inline better?

 FWIW, the handle chain in question that's not inlining is pretty simple:

 * DMH pointing back at fib_ruby
 * permute args
 * GWT

 - Charlie

 On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 I'm working up a set of files that show JRuby compilation output, but
 I noticed a couple things that might be interesting right now.

 First off, fairly early in the assembly output for fib, I see this:

  0x02876d1f: call      0x0282d0e0      ; OopMap{[96]=Oop [100]=Oop
 [28]=Oop [40]=Oop [48]=Oop off=644}
                                        ;*invokespecial invokeExact
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@63
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@23
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7)
                                        ;   {optimized virtual_call}

 For fib, the only invokedynamic is the recursive call to fib, so that
 would indicate that fib_ruby is not inlining into itself at all here.
 And I can't see it inlining into itself anywhere in the assembly
 output.

 Later in the same output:

  0x0287703f: call      0x0282dba0      ; OopMap{ebp=Oop off=1444}
                                        ;*checkcast
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@40
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
                                        ;   {runtime_call}
  0x02877044: call      0x0105a9d0      ;*checkcast
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@40
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
                                        ;   {runtime_call}

 These appear repeatedly near the invokedynamic invocation above. If
 I'm reading this right, neither the recursive call nor logic involved
 in that particular handle is inlining. Am I right?

 Here's the complete assembly dump (i386) for the fib_ruby method:
 https://gist.github.com/987640

 In other news, MaxInlineSize=150 with InlineSmallCode=3000 does not
 appear to improve performance. I also tried bumping up
 MaxRecursiveInlineLevel and MaxInlineLevel with no effect.

 - Charlie

 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Charles Oliver Nutter
Also, fwiw...after these two chunks in LogCompilation output, I see
nothing else inlined into fib_ruby, including a monomorphic call path
through PlusCallSite ending at RubyFixnum#op_plus (the integer +
operation). That would also affect performance.

I also do not see any indication *why* nothing inlines past this
point. Usually it would say too big or something.

I do see MinusCallSite inline earlier.

- Charlie

On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter
head...@headius.com wrote:
 The following chunk should be the invokedynamic call to fib, via a
 GWT, an arg permuter, and perhaps one convert:

    @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
    @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes)
      @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes)
        @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test
 (20 bytes)
          @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
          @ 8 org.jruby.RubyModule::getCacheToken (5 bytes)
      @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes)
        @ 1 java.lang.Boolean::valueOf (14 bytes)
        @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
        @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes)
          @ 11 java.lang.Boolean::booleanValue (5 bytes)
          @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes)
        @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 37 sun.invoke.util.ValueConversions::identity (2 bytes)

 This seems to only be the test logic; the actual fib invocation
 doesn't appear to show up in the inlining graph at all. Am I right?

 I see two of these in the LogCompilation output and nothing else
 around them. I'd expect to see them do the invocation of fib_ruby
 somewhere in there. It's like the success branch of GWT is not even
 being considered for inlining.

 - Charlie

 On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 If there were to be a recursive inline in there, where would it occur?  I 
 can't tell from the names where in that inline tree where the recursive call 
 occurs.

 tom

 On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:

 fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
 inlined: https://gist.github.com/f2b665ad3c97ba622ebf

 Can anyone suggest other flags I can try to adjust to get things to
 inline better?

 FWIW, the handle chain in question that's not inlining is pretty simple:

 * DMH pointing back at fib_ruby
 * permute args
 * GWT

 - Charlie

 On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 I'm working up a set of files that show JRuby compilation output, but
 I noticed a couple things that might be interesting right now.

 First off, fairly early in the assembly output for fib, I see this:

  0x02876d1f: call      0x0282d0e0      ; OopMap{[96]=Oop [100]=Oop
 [28]=Oop [40]=Oop [48]=Oop off=644}
                                        ;*invokespecial invokeExact
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@63
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@23
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7)
                                        ;   {optimized virtual_call}

 For fib, the only invokedynamic is the recursive call to fib, so that
 would indicate that fib_ruby is not inlining into itself at all here.
 And I can't see it inlining into itself anywhere in the assembly
 output.

 Later in the same output:

  0x0287703f: call      0x0282dba0      ; OopMap{ebp=Oop off=1444}
                                        ;*checkcast
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@40
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
                                        ;   {runtime_call}
  0x02877044: call      0x0105a9d0      ;*checkcast
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@40
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
                                        ;   {runtime_call}

 These appear repeatedly near the invokedynamic invocation above. If
 I'm reading this right, neither the recursive call nor logic involved
 in that particular handle is inlining. Am I right?

 Here's the complete assembly dump (i386) for the fib_ruby method:
 https://gist.github.com/987640

 In other news, MaxInlineSize=150 with InlineSmallCode=3000 does not
 appear to improve performance. I also tried bumping up
 MaxRecursiveInlineLevel and MaxInlineLevel with no effect.

 - Charlie

 

Re: Good news, bad news

2011-05-23 Thread Charles Oliver Nutter
Another example, running bench/language/bench_method_dispatch_only,
which runs a 1m iteration loop that invokes an empty foo method five
times:

https://gist.github.com/9008f94fc677f3fe98e7

Note again that it seems like only the test logic and maybe some of
the logic wrapping the foo call inline...the foo calls themselves do
not appear in logc inlining graph at all.

- Charlie

On Mon, May 23, 2011 at 4:50 PM, Charles Oliver Nutter
head...@headius.com wrote:
 Also, fwiw...after these two chunks in LogCompilation output, I see
 nothing else inlined into fib_ruby, including a monomorphic call path
 through PlusCallSite ending at RubyFixnum#op_plus (the integer +
 operation). That would also affect performance.

 I also do not see any indication *why* nothing inlines past this
 point. Usually it would say too big or something.

 I do see MinusCallSite inline earlier.

 - Charlie

 On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 The following chunk should be the invokedynamic call to fib, via a
 GWT, an arg permuter, and perhaps one convert:

    @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
    @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes)
      @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes)
        @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test
 (20 bytes)
          @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
          @ 8 org.jruby.RubyModule::getCacheToken (5 bytes)
      @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes)
        @ 1 java.lang.Boolean::valueOf (14 bytes)
        @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
        @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes)
          @ 11 java.lang.Boolean::booleanValue (5 bytes)
          @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes)
        @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 37 sun.invoke.util.ValueConversions::identity (2 bytes)

 This seems to only be the test logic; the actual fib invocation
 doesn't appear to show up in the inlining graph at all. Am I right?

 I see two of these in the LogCompilation output and nothing else
 around them. I'd expect to see them do the invocation of fib_ruby
 somewhere in there. It's like the success branch of GWT is not even
 being considered for inlining.

 - Charlie

 On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 If there were to be a recursive inline in there, where would it occur?  I 
 can't tell from the names where in that inline tree where the recursive 
 call occurs.

 tom

 On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:

 fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
 inlined: https://gist.github.com/f2b665ad3c97ba622ebf

 Can anyone suggest other flags I can try to adjust to get things to
 inline better?

 FWIW, the handle chain in question that's not inlining is pretty simple:

 * DMH pointing back at fib_ruby
 * permute args
 * GWT

 - Charlie

 On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 I'm working up a set of files that show JRuby compilation output, but
 I noticed a couple things that might be interesting right now.

 First off, fairly early in the assembly output for fib, I see this:

  0x02876d1f: call      0x0282d0e0      ; OopMap{[96]=Oop [100]=Oop
 [28]=Oop [40]=Oop [48]=Oop off=644}
                                        ;*invokespecial invokeExact
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@63
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@23
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7)
                                        ;   {optimized virtual_call}

 For fib, the only invokedynamic is the recursive call to fib, so that
 would indicate that fib_ruby is not inlining into itself at all here.
 And I can't see it inlining into itself anywhere in the assembly
 output.

 Later in the same output:

  0x0287703f: call      0x0282dba0      ; OopMap{ebp=Oop off=1444}
                                        ;*checkcast
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@40
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
                                        ;   {runtime_call}
  0x02877044: call      0x0105a9d0      ;*checkcast
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@40
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
                                        ;   {runtime_call}

 These appear repeatedly near the invokedynamic invocation 

Re: Good news, bad news

2011-05-23 Thread Charles Oliver Nutter
FWIW, perf with indy versus monomorphic inline caching on that
bench_method_dispatch_only benchmark:

~/projects/jruby ➔ jruby --server -X+C
bench/language/bench_method_dispatch_only.rbTest ruby method: 1000k
loops calling self's foo 10 times
  1.129000   0.00   1.129000 (  0.662000)
  0.409000   0.00   0.409000 (  0.409000)
  0.455000   0.00   0.455000 (  0.455000)
  0.428000   0.00   0.428000 (  0.428000)
  0.474000   0.00   0.474000 (  0.474000)
  0.47   0.00   0.47 (  0.47)
  0.458000   0.00   0.458000 (  0.458000)
  0.495000   0.00   0.495000 (  0.495000)
  0.46   0.00   0.46 (  0.46)
  0.508000   0.00   0.508000 (  0.508000)

~/projects/jruby ➔ jruby --server -Xcompile.invokedynamic=false -X+C
bench/language/bench_method_dispatch_only.rb
Test ruby method: 1000k loops calling self's foo 10 times
  0.377000   0.00   0.377000 (  0.315000)
  0.211000   0.00   0.211000 (  0.207000)
  0.132000   0.00   0.132000 (  0.132000)
  0.128000   0.00   0.128000 (  0.128000)
  0.135000   0.00   0.135000 (  0.135000)
  0.14   0.00   0.14 (  0.14)
  0.122000   0.00   0.122000 (  0.122000)
  0.122000   0.00   0.122000 (  0.122000)
  0.122000   0.00   0.122000 (  0.122000)
  0.122000   0.00   0.122000 (  0.122000)

Previously, invokedynamic version clocked in *much* faster than the
MIC version...like an order of magnitude faster.

- Charlie

On Mon, May 23, 2011 at 4:56 PM, Charles Oliver Nutter
head...@headius.com wrote:
 Another example, running bench/language/bench_method_dispatch_only,
 which runs a 1m iteration loop that invokes an empty foo method five
 times:

 https://gist.github.com/9008f94fc677f3fe98e7

 Note again that it seems like only the test logic and maybe some of
 the logic wrapping the foo call inline...the foo calls themselves do
 not appear in logc inlining graph at all.

 - Charlie

 On Mon, May 23, 2011 at 4:50 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 Also, fwiw...after these two chunks in LogCompilation output, I see
 nothing else inlined into fib_ruby, including a monomorphic call path
 through PlusCallSite ending at RubyFixnum#op_plus (the integer +
 operation). That would also affect performance.

 I also do not see any indication *why* nothing inlines past this
 point. Usually it would say too big or something.

 I do see MinusCallSite inline earlier.

 - Charlie

 On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 The following chunk should be the invokedynamic call to fib, via a
 GWT, an arg permuter, and perhaps one convert:

    @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
    @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes)
      @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes)
        @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test
 (20 bytes)
          @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
          @ 8 org.jruby.RubyModule::getCacheToken (5 bytes)
      @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes)
        @ 1 java.lang.Boolean::valueOf (14 bytes)
        @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
        @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes)
          @ 11 java.lang.Boolean::booleanValue (5 bytes)
          @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 
 bytes)
        @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 37 sun.invoke.util.ValueConversions::identity (2 bytes)

 This seems to only be the test logic; the actual fib invocation
 doesn't appear to show up in the inlining graph at all. Am I right?

 I see two of these in the LogCompilation output and nothing else
 around them. I'd expect to see them do the invocation of fib_ruby
 somewhere in there. It's like the success branch of GWT is not even
 being considered for inlining.

 - Charlie

 On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 If there were to be a recursive inline in there, where would it occur?  I 
 can't tell from the names where in that inline tree where the recursive 
 call occurs.

 tom

 On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:

 fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
 inlined: https://gist.github.com/f2b665ad3c97ba622ebf

 Can anyone suggest other flags I can try to adjust to get things to
 inline better?

 FWIW, the handle chain in question that's not inlining is pretty simple:

 * DMH pointing back at fib_ruby
 * permute args
 * GWT

 - Charlie

 On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 I'm working up a set of files that show JRuby compilation output, but
 I noticed a couple things that might be interesting right now.

 First off, fairly early in the 

Re: Good news, bad news

2011-05-23 Thread Tom Rodriguez

On May 23, 2011, at 2:50 PM, Charles Oliver Nutter wrote:

 Also, fwiw...after these two chunks in LogCompilation output, I see
 nothing else inlined into fib_ruby, including a monomorphic call path
 through PlusCallSite ending at RubyFixnum#op_plus (the integer +
 operation). That would also affect performance.
 
 I also do not see any indication *why* nothing inlines past this
 point. Usually it would say too big or something.

If there's no message then it wasn't even a candidate for inlining, either 
because the path was trimmed away by the optimizer or there wasn't enough 
information to bind the callsite to a particular method.  It's probably the 
latter in this case.  Actually my guess is that the selectAlternative machinery 
is what's screwing it up.  That code gets emitted as:

(test ? a : b).invokeExact();

instead of

if (test) a.invokeExact(); else b.invokeExact();

Since the first one only uses a single call site we can't statically inline 
even if a and b are constants.  The optimizer could do that optimization but it 
currently doesn't.  Can you build the guard differently?  We'll need to build a 
new call site optimization to deal with the selectAlternative logic.

tom

 
 I do see MinusCallSite inline earlier.
 
 - Charlie
 
 On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 The following chunk should be the invokedynamic call to fib, via a
 GWT, an arg permuter, and perhaps one convert:
 
@ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
@ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes)
  @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
  @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes)
@ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test
 (20 bytes)
  @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
  @ 8 org.jruby.RubyModule::getCacheToken (5 bytes)
  @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
  @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes)
@ 1 java.lang.Boolean::valueOf (14 bytes)
@ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
@ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes)
  @ 11 java.lang.Boolean::booleanValue (5 bytes)
  @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes)
@ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
  @ 37 sun.invoke.util.ValueConversions::identity (2 bytes)
 
 This seems to only be the test logic; the actual fib invocation
 doesn't appear to show up in the inlining graph at all. Am I right?
 
 I see two of these in the LogCompilation output and nothing else
 around them. I'd expect to see them do the invocation of fib_ruby
 somewhere in there. It's like the success branch of GWT is not even
 being considered for inlining.
 
 - Charlie
 
 On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 If there were to be a recursive inline in there, where would it occur?  I 
 can't tell from the names where in that inline tree where the recursive 
 call occurs.
 
 tom
 
 On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:
 
 fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
 inlined: https://gist.github.com/f2b665ad3c97ba622ebf
 
 Can anyone suggest other flags I can try to adjust to get things to
 inline better?
 
 FWIW, the handle chain in question that's not inlining is pretty simple:
 
 * DMH pointing back at fib_ruby
 * permute args
 * GWT
 
 - Charlie
 
 On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 I'm working up a set of files that show JRuby compilation output, but
 I noticed a couple things that might be interesting right now.
 
 First off, fairly early in the assembly output for fib, I see this:
 
  0x02876d1f: call  0x0282d0e0  ; OopMap{[96]=Oop [100]=Oop
 [28]=Oop [40]=Oop [48]=Oop off=644}
;*invokespecial invokeExact
; -
 java.lang.invoke.MethodHandle::invokeExact@63
; -
 java.lang.invoke.MethodHandle::invokeExact@23
; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7)
;   {optimized virtual_call}
 
 For fib, the only invokedynamic is the recursive call to fib, so that
 would indicate that fib_ruby is not inlining into itself at all here.
 And I can't see it inlining into itself anywhere in the assembly
 output.
 
 Later in the same output:
 
  0x0287703f: call  0x0282dba0  ; OopMap{ebp=Oop off=1444}
;*checkcast
; -
 java.lang.invoke.MethodHandle::invokeExact@40
; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
; 

Re: Good news, bad news

2011-05-23 Thread Charles Oliver Nutter
On Mon, May 23, 2011 at 5:20 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote:

 On May 23, 2011, at 2:50 PM, Charles Oliver Nutter wrote:
 I also do not see any indication *why* nothing inlines past this
 point. Usually it would say too big or something.

 If there's no message then it wasn't even a candidate for inlining, either 
 because the path was trimmed away by the optimizer or there wasn't enough 
 information to bind the callsite to a particular method.  It's probably the 
 latter in this case.

Good to know I'm reading this output right :) Is there another logging
flag that will log these situations too? I'd like to have that not
just for invokedynamic opto.

 Actually my guess is that the selectAlternative machinery is what's screwing 
 it up.  That code gets emitted as:

 (test ? a : b).invokeExact();

 instead of

 if (test) a.invokeExact(); else b.invokeExact();

 Since the first one only uses a single call site we can't statically inline 
 even if a and b are constants.  The optimizer could do that optimization but 
 it currently doesn't.  Can you build the guard differently?  We'll need to 
 build a new call site optimization to deal with the selectAlternative logic.

Is this a question for me? If so...

Build the guard differently as in don't use
MethodHandles.guardWithTest? I probably could, but it would require
inserting normal Java code into the call path, which should in
theory be megamorphic (since all invokedynamic everywhere will go
through that same piece of code). Something like:

IRubyObject myGuardWithTest(MethodHandle dmh, args...) {
  if (test(args) {
return dmh.invokeExact(args);
  } else {
inline cache
  }
}

I'd *love* for intermediate static Java snippits like this to inline
straight through, even if invokeExact would be megamorphic under
normal circumstances...but I don't think that's the case right now,
right?

- Charlie


 tom


 I do see MinusCallSite inline earlier.

 - Charlie

 On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 The following chunk should be the invokedynamic call to fib, via a
 GWT, an arg permuter, and perhaps one convert:

    @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
    @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes)
      @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes)
        @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test
 (20 bytes)
          @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
          @ 8 org.jruby.RubyModule::getCacheToken (5 bytes)
      @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes)
        @ 1 java.lang.Boolean::valueOf (14 bytes)
        @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
        @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes)
          @ 11 java.lang.Boolean::booleanValue (5 bytes)
          @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 
 bytes)
        @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
      @ 37 sun.invoke.util.ValueConversions::identity (2 bytes)

 This seems to only be the test logic; the actual fib invocation
 doesn't appear to show up in the inlining graph at all. Am I right?

 I see two of these in the LogCompilation output and nothing else
 around them. I'd expect to see them do the invocation of fib_ruby
 somewhere in there. It's like the success branch of GWT is not even
 being considered for inlining.

 - Charlie

 On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 If there were to be a recursive inline in there, where would it occur?  I 
 can't tell from the names where in that inline tree where the recursive 
 call occurs.

 tom

 On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:

 fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
 inlined: https://gist.github.com/f2b665ad3c97ba622ebf

 Can anyone suggest other flags I can try to adjust to get things to
 inline better?

 FWIW, the handle chain in question that's not inlining is pretty simple:

 * DMH pointing back at fib_ruby
 * permute args
 * GWT

 - Charlie

 On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 I'm working up a set of files that show JRuby compilation output, but
 I noticed a couple things that might be interesting right now.

 First off, fairly early in the assembly output for fib, I see this:

  0x02876d1f: call      0x0282d0e0      ; OopMap{[96]=Oop [100]=Oop
 [28]=Oop [40]=Oop [48]=Oop off=644}
                                        ;*invokespecial invokeExact
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@63
                                        ; -
 java.lang.invoke.MethodHandle::invokeExact@23
                                        ; -
 bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 

Re: Good news, bad news

2011-05-23 Thread Charles Oliver Nutter
 On Mon, May 23, 2011 at 5:20 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 Actually my guess is that the selectAlternative machinery is what's screwing 
 it up.  That code gets emitted as:

Same benchmark but instead of binding a direct handle via a GWT, I
force it to fail over to an inline cache immediately (still behind
invokedynamic):

~/projects/jruby ➔ jruby --server -X+C -Xinvokedynamic.maxfail=0
bench/language/bench_method_dispatch_only.rb
Test ruby method: 1000k loops calling self's foo 10 times
  0.515000   0.00   0.515000 (  0.403000)
  0.174000   0.00   0.174000 (  0.174000)
  0.136000   0.00   0.136000 (  0.136000)
  0.138000   0.00   0.138000 (  0.138000)
  0.164000   0.00   0.164000 (  0.164000)
  0.156000   0.00   0.156000 (  0.157000)
  0.15   0.00   0.15 (  0.15)
  0.145000   0.00   0.145000 (  0.145000)
  0.13   0.00   0.13 (  0.13)
  0.124000   0.00   0.124000 (  0.124000)

maxfail here indicates how many GWT it will attempt to chain for a
polymorphic call site before bailing out to a simple inline caching
target. That inline caching target is bound *directly* into the
invokedynamic call site, with no GWT in front of it. And as expected,
performance is basically the same as JRuby with inline caching only.
So it seems GWT may be the culprit.

Inlining graph for the above run, showing that the inline cache logic
(fail methods in InvokeDynamicSupport in JRuby) all inline into the
invokedynamic call site:

1%   bench.language.bench_method_dispatch_only::method__1$RUBY$invoking
@ 18 (177 bytes)
@ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
@ 23 java.lang.invoke.MethodHandle::invokeExact (13 bytes)
  @ 6 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::fail (97 bytes)
@ 2 
org.jruby.runtime.invokedynamic.InvokeDynamicSupport::pollAndGetClass
(13 bytes)
  @ 1 org.jruby.runtime.ThreadContext::callThreadPoll (23 bytes)
@ 19 org.jruby.runtime.ThreadContext::pollThreadEvents
executed  MinInliningThreshold times
  @ 5 org.jruby.runtime.builtin.IRubyObject::getMetaClass (0 bytes)
   type profile org/jruby/runtime/builtin/IRubyObject -
org/jruby/RubyObject (100%)
  @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
@ 17 org.jruby.runtime.callsite.CacheEntry::typeOk (6 bytes)
  @ 2 org.jruby.runtime.callsite.CacheEntry::typeOk (17 bytes)
@ 5 org.jruby.RubyModule::getCacheToken (5 bytes)
@ 34 org.jruby.internal.runtime.methods.DynamicMethod::call (13 bytes)
 type profile org/jruby/internal/runtime/methods/DynamicMethod
- bench$language$bench_method_dispatch_only$method__0$RUBY$foo (100%)
@ 34 bench$language$bench_method_dispatch_only$method__0$RUBY$foo::call
(13 bytes)
  @ 9 bench$language$bench_method_dispatch_only$method__0$RUBY$foo::call
(21 bytes)
@ 17
bench.language.bench_method_dispatch_only::method__0$RUBY$foo (2
bytes)
@ 42 org.jruby.RubyModule::searchWithCache call site not reached
@ 50 
org.jruby.runtime.invokedynamic.InvokeDynamicSupport$JRubyCallSite::callType
(5 bytes)
@ 56 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::methodMissing
executed  MinInliningThreshold times
@ 65 
org.jruby.runtime.invokedynamic.InvokeDynamicSupport$JRubyCallSite::callType
(5 bytes)
@ 72 
org.jruby.runtime.invokedynamic.InvokeDynamicSupport::callMethodMissing
never executed
@ 93 org.jruby.internal.runtime.methods.DynamicMethod::call (13 bytes)

Note that foo is shown inlining here as well, but I think that's
only because this particular inline-caching call path has only seen
one target at this point.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Tom Rodriguez

On May 23, 2011, at 3:27 PM, Charles Oliver Nutter wrote:

 On Mon, May 23, 2011 at 5:20 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 
 On May 23, 2011, at 2:50 PM, Charles Oliver Nutter wrote:
 I also do not see any indication *why* nothing inlines past this
 point. Usually it would say too big or something.
 
 If there's no message then it wasn't even a candidate for inlining, either 
 because the path was trimmed away by the optimizer or there wasn't enough 
 information to bind the callsite to a particular method.  It's probably the 
 latter in this case.
 
 Good to know I'm reading this output right :) Is there another logging
 flag that will log these situations too? I'd like to have that not
 just for invokedynamic opto.
 
 Actually my guess is that the selectAlternative machinery is what's screwing 
 it up.  That code gets emitted as:
 
 (test ? a : b).invokeExact();
 
 instead of
 
 if (test) a.invokeExact(); else b.invokeExact();
 
 Since the first one only uses a single call site we can't statically inline 
 even if a and b are constants.  The optimizer could do that optimization but 
 it currently doesn't.  Can you build the guard differently?  We'll need to 
 build a new call site optimization to deal with the selectAlternative logic.
 
 Is this a question for me? If so...
 
 Build the guard differently as in don't use
 MethodHandles.guardWithTest? I probably could, but it would require
 inserting normal Java code into the call path, which should in
 theory be megamorphic (since all invokedynamic everywhere will go
 through that same piece of code). Something like:
 
 IRubyObject myGuardWithTest(MethodHandle dmh, args...) {
  if (test(args) {
return dmh.invokeExact(args);
  } else {
inline cache
  }
 }
 
 I'd *love* for intermediate static Java snippits like this to inline
 straight through, even if invokeExact would be megamorphic under
 normal circumstances...but I don't think that's the case right now,
 right?

I haven't been following 292 that closely but it used to be a piece of code 
that did precisely that.

+private Object invoke_L0() throws Throwable {
+if ((boolean) test.invokeExact())
+return target.invokeExact();
+return fallback.invokeExact();
+}

It looks like it was reworked at some point to use selectAlternative but the 
optimizer was never updated to deal with it properly.  It's not particularly 
hard, we just need to generate code like we would for a bimorphic call site.  
The current code expects to either get a constant or something else and doesn't 
inline if it's something else.  In this case we have a Phi of two constants 
which we just need to split.

I'm still unclear why you couldn't write your own variant of guardWithTest and 
have it work but my knowledge of what's really allowed is somewhat limited.

tom

 
 - Charlie
 
 
 tom
 
 
 I do see MinusCallSite inline earlier.
 
 - Charlie
 
 On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 The following chunk should be the invokedynamic call to fib, via a
 GWT, an arg permuter, and perhaps one convert:
 
@ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
@ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes)
  @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
  @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes)
@ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test
 (20 bytes)
  @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
  @ 8 org.jruby.RubyModule::getCacheToken (5 bytes)
  @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
  @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes)
@ 1 java.lang.Boolean::valueOf (14 bytes)
@ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
@ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes)
  @ 11 java.lang.Boolean::booleanValue (5 bytes)
  @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 
 bytes)
@ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
  @ 37 sun.invoke.util.ValueConversions::identity (2 bytes)
 
 This seems to only be the test logic; the actual fib invocation
 doesn't appear to show up in the inlining graph at all. Am I right?
 
 I see two of these in the LogCompilation output and nothing else
 around them. I'd expect to see them do the invocation of fib_ruby
 somewhere in there. It's like the success branch of GWT is not even
 being considered for inlining.
 
 - Charlie
 
 On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez tom.rodrig...@oracle.com 
 wrote:
 If there were to be a recursive inline in there, where would it occur?  I 
 can't tell from the names where in that inline tree where the recursive 
 call occurs.
 
 tom
 
 On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:
 
 fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
 inlined: 

Re: Good news, bad news

2011-05-23 Thread Rémi Forax
On 05/24/2011 12:44 AM, Tom Rodriguez wrote:
 On May 23, 2011, at 3:27 PM, Charles Oliver Nutter wrote:

 On Mon, May 23, 2011 at 5:20 PM, Tom Rodrigueztom.rodrig...@oracle.com  
 wrote:
 On May 23, 2011, at 2:50 PM, Charles Oliver Nutter wrote:
 I also do not see any indication *why* nothing inlines past this
 point. Usually it would say too big or something.
 If there's no message then it wasn't even a candidate for inlining, either 
 because the path was trimmed away by the optimizer or there wasn't enough 
 information to bind the callsite to a particular method.  It's probably the 
 latter in this case.
 Good to know I'm reading this output right :) Is there another logging
 flag that will log these situations too? I'd like to have that not
 just for invokedynamic opto.

 Actually my guess is that the selectAlternative machinery is what's 
 screwing it up.  That code gets emitted as:

 (test ? a : b).invokeExact();

 instead of

 if (test) a.invokeExact(); else b.invokeExact();

 Since the first one only uses a single call site we can't statically inline 
 even if a and b are constants.  The optimizer could do that optimization 
 but it currently doesn't.  Can you build the guard differently?  We'll need 
 to build a new call site optimization to deal with the selectAlternative 
 logic.
 Is this a question for me? If so...

 Build the guard differently as in don't use
 MethodHandles.guardWithTest? I probably could, but it would require
 inserting normal Java code into the call path, which should in
 theory be megamorphic (since all invokedynamic everywhere will go
 through that same piece of code). Something like:

 IRubyObject myGuardWithTest(MethodHandle dmh, args...) {
   if (test(args) {
 return dmh.invokeExact(args);
   } else {
 inline cache
   }
 }

 I'd *love* for intermediate static Java snippits like this to inline
 straight through, even if invokeExact would be megamorphic under
 normal circumstances...but I don't think that's the case right now,
 right?
 I haven't been following 292 that closely but it used to be a piece of code 
 that did precisely that.

 +private Object invoke_L0() throws Throwable {
 +if ((boolean) test.invokeExact())
 +return target.invokeExact();
 +return fallback.invokeExact();
 +}

 It looks like it was reworked at some point to use selectAlternative but the 
 optimizer was never updated to deal with it properly.  It's not particularly 
 hard, we just need to generate code like we would for a bimorphic call site.  
 The current code expects to either get a constant or something else and 
 doesn't inline if it's something else.  In this case we have a Phi of two 
 constants which we just need to split.

 I'm still unclear why you couldn't write your own variant of guardWithTest 
 and have it work but my knowledge of what's really allowed is somewhat 
 limited.

The main problem of writing your own variant is that you will have to 
write 256 * (number_of_primitive_type + 1) variants.
The main idea of GWT (and all other method handle combiners) is that 
they provide a way to get such polymorphic variant
without coding all combinations.

The benefit is also that these combiners are known by the VM that can 
provide
dedicated path that are more efficient,
by ex, the arguments of GWT is pushed only once and used by the test 
*and* the target (or fallback)
or better guarantee about how trees of combiners are inlined.

 tom

Rémi

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Tom Rodriguez

On May 23, 2011, at 5:13 PM, Charles Oliver Nutter wrote:

 On Mon, May 23, 2011 at 6:17 PM, Rémi Forax fo...@univ-mlv.fr wrote:
 On 05/24/2011 12:44 AM, Tom Rodriguez wrote:
 I haven't been following 292 that closely but it used to be a piece of code 
 that did precisely that.
 
 +private Object invoke_L0() throws Throwable {
 +if ((boolean) test.invokeExact())
 +return target.invokeExact();
 +return fallback.invokeExact();
 +}
 
 It looks like it was reworked at some point to use selectAlternative but 
 the optimizer was never updated to deal with it properly.  It's not 
 particularly hard, we just need to generate code like we would for a 
 bimorphic call site.  The current code expects to either get a constant or 
 something else and doesn't inline if it's something else.  In this case we 
 have a Phi of two constants which we just need to split.
 
 I'll give it a shot. I'll put together a benchmark that produces
 effectively monomorphic calls at the invokedynamic call site, but
 which uses an intermediate hand-written piece of Java code (that ends
 up polymorphic) in place of GWT.
 
 Obviously GWT should not fail this badly in final release, but if this
 works it could mean many, many other patterns will too. And it will
 definitively answer the question about whether such a case inlines in
 current logic.

You can also just wait for a fix.  I've started writing the logic for it and 
just need to duplicate the guard.  We should be able to have this fixed for 
JDK7.

tom

 
 The main problem of writing your own variant is that you will have to
 write 256 * (number_of_primitive_type + 1) variants.
 The main idea of GWT (and all other method handle combiners) is that
 they provide a way to get such polymorphic variant
 without coding all combinations.
 
 The benefit is also that these combiners are known by the VM that can
 provide
 dedicated path that are more efficient,
 by ex, the arguments of GWT is pushed only once and used by the test
 *and* the target (or fallback)
 or better guarantee about how trees of combiners are inlined.
 
 I am not currently specializing any call paths for primitives, but
 that will come soon. For experiment's sake, I only need to provide one
 such GWT replacement to test fib (no block passed, single IRubyObject
 argument) and one more for the dispatch bench (no block, no args).
 We'll see how it looks.
 
 - Charlie
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread John Rose
On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote:

 I'd *love* for intermediate static Java snippits like this to inline
 straight through, even if invokeExact would be megamorphic under
 normal circumstances...but I don't think that's the case right now,
 right?
 
 I haven't been following 292 that closely but it used to be a piece of code 
 that did precisely that.
 
 +private Object invoke_L0() throws Throwable {
 +if ((boolean) test.invokeExact())
 +return target.invokeExact();
 +return fallback.invokeExact();
 +}
 
 It looks like it was reworked at some point to use selectAlternative but the 
 optimizer was never updated to deal with it properly.  It's not particularly 
 hard, we just need to generate code like we would for a bimorphic call site.  
 The current code expects to either get a constant or something else and 
 doesn't inline if it's something else.  In this case we have a Phi of two 
 constants which we just need to split.

Yes, it would be a quasi-bimorphic call site keyed from a phi of two constants, 
instead of a profile of two receiver classes.

 I'm still unclear why you couldn't write your own variant of guardWithTest 
 and have it work but my knowledge of what's really allowed is somewhat 
 limited.

Those snippets will inline (I think), but at some point Charlie will want to 
fetch a bit of constant stuff out of an instance variable.  That will look 
non-constant to the optimizer, even if the instance variable is final and the 
enclosing object is a constant reference.  We made sure this happens for 
java.lang.invoke classes, but we haven't extended it yet to user code, in part 
because it would have its own bug tail to work through.

-- John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Charles Oliver Nutter
Well I have a prototype GWT in place, but I'm having trouble getting
invokeExact to work. As far as I can tell, incoming arguments should
match the handle type just fine, but I'm getting this error:

InvokeDynamicSupport.java:934:in `gwt':
java.lang.invoke.WrongMethodTypeException:
(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Ljava/lang/String;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
cannot be called with a different arity as
([Ljava/lang/Object;)Ljava/lang/Object;

The diff as it stands right now is here: https://gist.github.com/987939

- Charlie

On Mon, May 23, 2011 at 7:23 PM, Tom Rodriguez tom.rodrig...@oracle.com wrote:

 On May 23, 2011, at 5:13 PM, Charles Oliver Nutter wrote:

 On Mon, May 23, 2011 at 6:17 PM, Rémi Forax fo...@univ-mlv.fr wrote:
 On 05/24/2011 12:44 AM, Tom Rodriguez wrote:
 I haven't been following 292 that closely but it used to be a piece of 
 code that did precisely that.

 +        private Object invoke_L0() throws Throwable {
 +            if ((boolean) test.invokeExact())
 +                return target.invokeExact();
 +            return fallback.invokeExact();
 +        }

 It looks like it was reworked at some point to use selectAlternative but 
 the optimizer was never updated to deal with it properly.  It's not 
 particularly hard, we just need to generate code like we would for a 
 bimorphic call site.  The current code expects to either get a constant or 
 something else and doesn't inline if it's something else.  In this case we 
 have a Phi of two constants which we just need to split.

 I'll give it a shot. I'll put together a benchmark that produces
 effectively monomorphic calls at the invokedynamic call site, but
 which uses an intermediate hand-written piece of Java code (that ends
 up polymorphic) in place of GWT.

 Obviously GWT should not fail this badly in final release, but if this
 works it could mean many, many other patterns will too. And it will
 definitively answer the question about whether such a case inlines in
 current logic.

 You can also just wait for a fix.  I've started writing the logic for it and 
 just need to duplicate the guard.  We should be able to have this fixed for 
 JDK7.

 tom


 The main problem of writing your own variant is that you will have to
 write 256 * (number_of_primitive_type + 1) variants.
 The main idea of GWT (and all other method handle combiners) is that
 they provide a way to get such polymorphic variant
 without coding all combinations.

 The benefit is also that these combiners are known by the VM that can
 provide
 dedicated path that are more efficient,
 by ex, the arguments of GWT is pushed only once and used by the test
 *and* the target (or fallback)
 or better guarantee about how trees of combiners are inlined.

 I am not currently specializing any call paths for primitives, but
 that will come soon. For experiment's sake, I only need to provide one
 such GWT replacement to test fib (no block passed, single IRubyObject
 argument) and one more for the dispatch bench (no block, no args).
 We'll see how it looks.

 - Charlie
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-23 Thread Ola Bini
Hi,

I'm happy to see that the performance degradation is getting addressed.
I would like to point out that there is still a serious crash in the
machinery too... Have you seen any reason why that happens?

Cheers


On 2011-05-24 06.11, John Rose wrote:
 On May 23, 2011, at 3:44 PM, Tom Rodriguez wrote:
 
 I'd *love* for intermediate static Java snippits like this to
 inline straight through, even if invokeExact would be megamorphic
 under normal circumstances...but I don't think that's the case
 right now, right?
 
 I haven't been following 292 that closely but it used to be a piece
 of code that did precisely that.
 
 +private Object invoke_L0() throws Throwable { +
 if ((boolean) test.invokeExact()) +return
 target.invokeExact(); +return fallback.invokeExact(); +
 }
 
 It looks like it was reworked at some point to use
 selectAlternative but the optimizer was never updated to deal with
 it properly.  It's not particularly hard, we just need to generate
 code like we would for a bimorphic call site.  The current code
 expects to either get a constant or something else and doesn't
 inline if it's something else.  In this case we have a Phi of two
 constants which we just need to split.
 
 Yes, it would be a quasi-bimorphic call site keyed from a phi of two
 constants, instead of a profile of two receiver classes.
 
 I'm still unclear why you couldn't write your own variant of
 guardWithTest and have it work but my knowledge of what's really
 allowed is somewhat limited.
 
 Those snippets will inline (I think), but at some point Charlie will
 want to fetch a bit of constant stuff out of an instance variable.
 That will look non-constant to the optimizer, even if the instance
 variable is final and the enclosing object is a constant reference.
 We made sure this happens for java.lang.invoke classes, but we
 haven't extended it yet to user code, in part because it would have
 its own bug tail to work through.
 
 -- John ___ mlvm-dev
 mailing list mlvm-dev@openjdk.java.net 
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
 


-- 
 Ola Bini (http://olabini.com)
  Ioke - JRuby - ThoughtWorks

 Yields falsehood when quined yields falsehood when quined.

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-20 Thread Ola Bini
On 2011-05-20 11.21, John Rose wrote:
 On May 19, 2011, at 10:06 PM, Charles Oliver Nutter wrote:
 
 I've confimed that the i386 build performs just as poorly. Have not
 investigate at inlining or assembly level yet...and it's late, so I'm
 going to get some sleep.
 
 It's likely that method handle inlining is failing.
 
 Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing.
 
 If you can find a microbenchmark which shows differential slowdown, we can 
 debug it.

When trying the above all my tests fail with this:
  java.lang.RuntimeException: java.lang.InternalError: NYI
at
seph.lang.interpreter.MessageInterpreter.evaluateFully(MessageInterpreter.java:107)
at seph.lang.Runtime.evaluateStream(Runtime.java:115)
at seph.lang.Runtime.evaluateString(Runtime.java:127)
at
seph.lang.code.BasicSanityTest.recursive_odd_and_even_that_should_blow_the_stack(BasicSanityTest.java:215)
Caused by: java.lang.InternalError: NYI
at
java.lang.invoke.AdapterMethodHandle.makeReturnConversion(AdapterMethodHandle.java:250)
at
java.lang.invoke.AdapterMethodHandle.makePairwiseConvert(AdapterMethodHandle.java:206)
at
java.lang.invoke.MethodHandleImpl.convertArguments(MethodHandleImpl.java:749)
at
java.lang.invoke.MethodHandleImpl$GuardWithTest.make(MethodHandleImpl.java:926)
at
java.lang.invoke.MethodHandleImpl.makeGuardWithTest(MethodHandleImpl.java:1066)
at
java.lang.invoke.MethodHandles.guardWithTest(MethodHandles.java:2072)
at java.lang.invoke.SwitchPoint.guardWithTest(SwitchPoint.java:145)
at
seph.lang.compiler.Bootstrap.initialSetup_intrinsic_if(Bootstrap.java:1298)
at seph$gen$abstraction$11.activateWith(Unknown Source)
at seph$gen$abstraction$11.activateWith(Unknown Source)
at seph.lang.ast.NamedMessage.go(NamedMessage.java:162)
at seph.lang.ast.NamedMessage.sendTo(NamedMessage.java:143)
at
seph.lang.interpreter.MessageInterpreter.evaluateFully(MessageInterpreter.java:92)


=(

Cheers
-- 
 Ola Bini (http://olabini.com)
  Ioke - JRuby - ThoughtWorks

 Yields falsehood when quined yields falsehood when quined.

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-20 Thread Charles Oliver Nutter
On Fri, May 20, 2011 at 1:07 AM, Ola Bini ola.b...@gmail.com wrote:
 Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing.

 If you can find a microbenchmark which shows differential slowdown, we can 
 debug it.

 When trying the above all my tests fail with this:
  java.lang.RuntimeException: java.lang.InternalError: NYI

Same here.

Reproducing it is pretty easy...

* Clone JRuby: git clone https://github.com/jruby/jruby.git
* Build JRuby: cd jruby ; ant
* Run benchmarks: bin/jruby bench/bench_fib_recursive.rb 10 35

There are many benchmarks under the bench dir, almost all of which
show this degradation. You can disable invokedynamic use in JRuby by
passing -Xcompile.invokedynamic=false, as seen here:


~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false
bench/bench_fib_recursive.rb 5 35
WARNING: Both jruby-complete.jar and jruby.jar are present in the
'lib' directory. Will use jruby.jar
9227465
  1.698000   0.00   1.698000 (  1.645000)
9227465
  1.352000   0.00   1.352000 (  1.352000)
9227465
  1.364000   0.00   1.364000 (  1.364000)
9227465
  1.344000   0.00   1.344000 (  1.344000)
9227465
  1.336000   0.00   1.336000 (  1.336000)

~/projects/jruby ➔ jruby -Xcompile.invokedynamic=true
bench/bench_fib_recursive.rb 5 35
WARNING: Both jruby-complete.jar and jruby.jar are present in the
'lib' directory. Will use jruby.jar
9227465
  3.182000   0.00   3.182000 (  3.12)
9227465
  3.019000   0.00   3.019000 (  3.019000)
9227465
  3.195000   0.00   3.195000 (  3.195000)
9227465
  3.163000   0.00   3.163000 (  3.163000)
9227465
  3.656000   0.00   3.656000 (  3.656000)

I'm sure something's just broken...this benchmark was easily almost 2x
faster than the non-invokedynamic version just a few days ago, and
it's now 2-3x slower than non-indy logic.

I'm going to be online all day, standing by to help in any way I can.
I did do a quick LogCompilation last night and the main thing I
noticed was that while it seemed like handles are inlining, the
eventual DMH does not inline. That would explain some of the perf
degradation, I think...but hopefully not all of it.

Whatever you need me to run or whatever information...let me know.

ALSO...I got the same error as Ola when running JRuby with a
standalone jar file:

~/projects/jruby ➔ java -jar lib/jruby-complete.jar bench/bench_fib_recursive.rb
MethodHandle.java:-1:in `invokeExact': java.lang.NoClassDefFoundError:
org/jruby/runtime/ThreadContext
from MethodHandle.java:-1:in `invokeExact'
from bench/bench_fib_recursive.rb:7:in `method__0$RUBY$fib_ruby'
from bench/bench_fib_recursive.rb:7:in `method__0$RUBY$fib_ruby'
from bench/bench_fib_recursive.rb:7:in `method__0$RUBY$fib_ruby'
from bench/bench_fib_recursive.rb:7:in `method__0$RUBY$fib_ruby'

As in Ola's case, ThreadContext is a core JRuby class. I do *not* get
this error when JRuby is loaded from the bootstrap classloader, so I
imagine there's some mix-up with generated bytecode and the
classloaders that that bytecode gets loaded into.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Good news, bad news

2011-05-20 Thread Charles Oliver Nutter
Another update...I managed to turn off the ricochet frames and have it
still work (???). The with-ricochet mode is faster, but still 2-3x
slower than no indy at all (using JRuby's IC), and 4-5x slower than
invokedynamic was recently...

HOWEVER...an assembly dump I just did contains some very odd code that
might indicate a problem in my code. Hold the presses on digging into
it on your end until I can explain this. (this = apparently standing
up an instance variable table for Fixnum objects...*definitely* should
not be happening).

0x02868f2a: mov [eax+0x1C], ebx  ;*invokespecial init
; -
org.jruby.RubyBasicObject::init@1 (line 218)
; -
org.jruby.RubyObject::init@2 (line 116)
; -
org.jruby.RubyNumeric::init@2 (line 114)
; -
org.jruby.RubyInteger::init@2 (line 95)
; -
org.jruby.RubyFixnum::init@5 (line 112)
; -
org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f2d: mov   ebx, [esp+0x14]
  0x02868f31: mov   ecx, [ebx+0xBC]
  0x02868f37: mov   edx, eax
  0x02868f39: shr   edx, 9
  0x02868f3c: mov   [edx+0x394000], 0x00  ;*putfield varTable
; -
org.jruby.RubyBasicObject::init@8 (line 84)
; -
org.jruby.RubyObject::init@2 (line 116)
; -
org.jruby.RubyNumeric::init@2 (line 114)
; -
org.jruby.RubyInteger::init@2 (line 95)
; -
org.jruby.RubyFixnum::init@5 (line 112)
; -
org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f43: mov   edx, a 'java/lang/Class' = 'org/jruby/RubyBasicObject'
;   {oop(a 'java/lang/Class' =
'org/jruby/RubyBasicObject')}
  0x02868f48: mov   ebx, [edx+0x68]  ;*getstatic NULL_OBJECT_ARRAY
; -
org.jruby.RubyBasicObject::init@5 (line 84)
; -
org.jruby.RubyObject::init@2 (line 116)
; -
org.jruby.RubyNumeric::init@2 (line 114)
; -
org.jruby.RubyInteger::init@2 (line 95)
; -
org.jruby.RubyFixnum::init@5 (line 112)
; -
org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f4b: mov   [eax+0x10], ebx
  0x02868f4e: lock add  [esp], 0x00 ;*putfield varTable
; -
org.jruby.RubyBasicObject::init@8 (line 84)
; -
org.jruby.RubyObject::init@2 (line 116)
; -
org.jruby.RubyNumeric::init@2 (line 114)
; -
org.jruby.RubyInteger::init@2 (line 95)
; -
org.jruby.RubyFixnum::init@5 (line 112)
; -
org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)

On Fri, May 20, 2011 at 8:10 AM, Charles Oliver Nutter
head...@headius.com wrote:
 On Fri, May 20, 2011 at 1:07 AM, Ola Bini ola.b...@gmail.com wrote:
 Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing.

 If you can find a microbenchmark which shows differential slowdown, we can 
 debug it.

 When trying the above all my tests fail with this:
  java.lang.RuntimeException: java.lang.InternalError: NYI

 Same here.

 Reproducing it is pretty easy...

 * Clone JRuby: git clone https://github.com/jruby/jruby.git
 * Build JRuby: cd jruby ; ant
 * Run benchmarks: bin/jruby bench/bench_fib_recursive.rb 10 35

 There are many benchmarks under the bench dir, almost all of which
 show this degradation. You can disable invokedynamic use in JRuby by
 passing -Xcompile.invokedynamic=false, as seen here:


 ~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false
 bench/bench_fib_recursive.rb 5 35
 WARNING: Both jruby-complete.jar and jruby.jar are present in the
 'lib' directory. Will use 

Re: Good news, bad news

2011-05-20 Thread Charles Oliver Nutter
Meh, false alarm. This is just the default empty Object[] assigned to
all objects. I still can't explain the perf degradation.

Here's the full assembly for fib_ruby: https://gist.github.com/983721

It seems like stuff is inlining, so I have no idea why performance is
so terrible.

- Charlie

On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter
head...@headius.com wrote:
 Another update...I managed to turn off the ricochet frames and have it
 still work (???). The with-ricochet mode is faster, but still 2-3x
 slower than no indy at all (using JRuby's IC), and 4-5x slower than
 invokedynamic was recently...

 HOWEVER...an assembly dump I just did contains some very odd code that
 might indicate a problem in my code. Hold the presses on digging into
 it on your end until I can explain this. (this = apparently standing
 up an instance variable table for Fixnum objects...*definitely* should
 not be happening).

 0x02868f2a: mov [eax+0x1C], ebx  ;*invokespecial init
                                        ; -
 org.jruby.RubyBasicObject::init@1 (line 218)
                                        ; -
 org.jruby.RubyObject::init@2 (line 116)
                                        ; -
 org.jruby.RubyNumeric::init@2 (line 114)
                                        ; -
 org.jruby.RubyInteger::init@2 (line 95)
                                        ; -
 org.jruby.RubyFixnum::init@5 (line 112)
                                        ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
                                        ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
                                        ; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f2d: mov       ebx, [esp+0x14]
  0x02868f31: mov       ecx, [ebx+0xBC]
  0x02868f37: mov       edx, eax
  0x02868f39: shr       edx, 9
  0x02868f3c: mov       [edx+0x394000], 0x00  ;*putfield varTable
                                        ; -
 org.jruby.RubyBasicObject::init@8 (line 84)
                                        ; -
 org.jruby.RubyObject::init@2 (line 116)
                                        ; -
 org.jruby.RubyNumeric::init@2 (line 114)
                                        ; -
 org.jruby.RubyInteger::init@2 (line 95)
                                        ; -
 org.jruby.RubyFixnum::init@5 (line 112)
                                        ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
                                        ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
                                        ; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f43: mov       edx, a 'java/lang/Class' = 'org/jruby/RubyBasicObject'
                                        ;   {oop(a 'java/lang/Class' =
 'org/jruby/RubyBasicObject')}
  0x02868f48: mov       ebx, [edx+0x68]  ;*getstatic NULL_OBJECT_ARRAY
                                        ; -
 org.jruby.RubyBasicObject::init@5 (line 84)
                                        ; -
 org.jruby.RubyObject::init@2 (line 116)
                                        ; -
 org.jruby.RubyNumeric::init@2 (line 114)
                                        ; -
 org.jruby.RubyInteger::init@2 (line 95)
                                        ; -
 org.jruby.RubyFixnum::init@5 (line 112)
                                        ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
                                        ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
                                        ; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f4b: mov       [eax+0x10], ebx
  0x02868f4e: lock add  [esp], 0x00     ;*putfield varTable
                                        ; -
 org.jruby.RubyBasicObject::init@8 (line 84)
                                        ; -
 org.jruby.RubyObject::init@2 (line 116)
                                        ; -
 org.jruby.RubyNumeric::init@2 (line 114)
                                        ; -
 org.jruby.RubyInteger::init@2 (line 95)
                                        ; -
 org.jruby.RubyFixnum::init@5 (line 112)
                                        ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
                                        ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
                                        ; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)

 On Fri, May 20, 2011 at 8:10 AM, Charles Oliver Nutter
 head...@headius.com wrote:
 On Fri, May 20, 2011 at 1:07 AM, Ola Bini ola.b...@gmail.com wrote:
 Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing.

 If you can find a microbenchmark which shows differential slowdown, we can 
 debug it.

 When trying the above all my tests fail with this:
  java.lang.RuntimeException: java.lang.InternalError: NYI

 Same here.

 Reproducing it is pretty easy...

 * Clone JRuby: git clone https://github.com/jruby/jruby.git
 * Build JRuby: cd jruby ; ant
 * Run benchmarks: 

Re: Good news, bad news

2011-05-20 Thread Tom Rodriguez
Is it the inlining policy?  I played a little with this and the current default 
policy is still not as good as the big tweaks to MaxInlineSize and 
InlineSmallCode.

tom

On May 20, 2011, at 1:21 PM, Charles Oliver Nutter wrote:

 Meh, false alarm. This is just the default empty Object[] assigned to
 all objects. I still can't explain the perf degradation.
 
 Here's the full assembly for fib_ruby: https://gist.github.com/983721
 
 It seems like stuff is inlining, so I have no idea why performance is
 so terrible.
 
 - Charlie
 
 On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter
 head...@headius.com wrote:
 Another update...I managed to turn off the ricochet frames and have it
 still work (???). The with-ricochet mode is faster, but still 2-3x
 slower than no indy at all (using JRuby's IC), and 4-5x slower than
 invokedynamic was recently...
 
 HOWEVER...an assembly dump I just did contains some very odd code that
 might indicate a problem in my code. Hold the presses on digging into
 it on your end until I can explain this. (this = apparently standing
 up an instance variable table for Fixnum objects...*definitely* should
 not be happening).
 
 0x02868f2a: mov [eax+0x1C], ebx  ;*invokespecial init
; -
 org.jruby.RubyBasicObject::init@1 (line 218)
; -
 org.jruby.RubyObject::init@2 (line 116)
; -
 org.jruby.RubyNumeric::init@2 (line 114)
; -
 org.jruby.RubyInteger::init@2 (line 95)
; -
 org.jruby.RubyFixnum::init@5 (line 112)
; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f2d: mov   ebx, [esp+0x14]
  0x02868f31: mov   ecx, [ebx+0xBC]
  0x02868f37: mov   edx, eax
  0x02868f39: shr   edx, 9
  0x02868f3c: mov   [edx+0x394000], 0x00  ;*putfield varTable
; -
 org.jruby.RubyBasicObject::init@8 (line 84)
; -
 org.jruby.RubyObject::init@2 (line 116)
; -
 org.jruby.RubyNumeric::init@2 (line 114)
; -
 org.jruby.RubyInteger::init@2 (line 95)
; -
 org.jruby.RubyFixnum::init@5 (line 112)
; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f43: mov   edx, a 'java/lang/Class' = 'org/jruby/RubyBasicObject'
;   {oop(a 'java/lang/Class' =
 'org/jruby/RubyBasicObject')}
  0x02868f48: mov   ebx, [edx+0x68]  ;*getstatic NULL_OBJECT_ARRAY
; -
 org.jruby.RubyBasicObject::init@5 (line 84)
; -
 org.jruby.RubyObject::init@2 (line 116)
; -
 org.jruby.RubyNumeric::init@2 (line 114)
; -
 org.jruby.RubyInteger::init@2 (line 95)
; -
 org.jruby.RubyFixnum::init@5 (line 112)
; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f4b: mov   [eax+0x10], ebx
  0x02868f4e: lock add  [esp], 0x00 ;*putfield varTable
; -
 org.jruby.RubyBasicObject::init@8 (line 84)
; -
 org.jruby.RubyObject::init@2 (line 116)
; -
 org.jruby.RubyNumeric::init@2 (line 114)
; -
 org.jruby.RubyInteger::init@2 (line 95)
; -
 org.jruby.RubyFixnum::init@5 (line 112)
; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
 
 On Fri, May 20, 2011 at 8:10 AM, Charles Oliver Nutter
 head...@headius.com wrote:
 On Fri, May 20, 2011 at 1:07 AM, Ola Bini ola.b...@gmail.com wrote:
 Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing.
 
 If you can find a microbenchmark which shows differential slowdown, we 
 can debug it.
 
 When trying the above all my 

Re: Good news, bad news

2011-05-20 Thread Rémi Forax
On 05/20/2011 11:08 PM, Tom Rodriguez wrote:
 Is it the inlining policy?  I played a little with this and the current 
 default policy is still not as good as the big tweaks to MaxInlineSize and 
 InlineSmallCode.

 tom

I tend to agree,
things like isFixnumReopened and getCacheToken seem to be not inlined.

Charles, do you have tested if the inline flags are tweaked ?

Also I do not understand why you have JRuby's callsites like MinusCallSite
in the stacktrace. You should not use it if invokedynamic is enabled.

Rémi

 On May 20, 2011, at 1:21 PM, Charles Oliver Nutter wrote:

 Meh, false alarm. This is just the default empty Object[] assigned to
 all objects. I still can't explain the perf degradation.

 Here's the full assembly for fib_ruby: https://gist.github.com/983721

 It seems like stuff is inlining, so I have no idea why performance is
 so terrible.

 - Charlie

 On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter
 head...@headius.com  wrote:
 Another update...I managed to turn off the ricochet frames and have it
 still work (???). The with-ricochet mode is faster, but still 2-3x
 slower than no indy at all (using JRuby's IC), and 4-5x slower than
 invokedynamic was recently...

 HOWEVER...an assembly dump I just did contains some very odd code that
 might indicate a problem in my code. Hold the presses on digging into
 it on your end until I can explain this. (this = apparently standing
 up an instance variable table for Fixnum objects...*definitely* should
 not be happening).

 0x02868f2a: mov [eax+0x1C], ebx  ;*invokespecialinit
 ; -
 org.jruby.RubyBasicObject::init@1 (line 218)
 ; -
 org.jruby.RubyObject::init@2 (line 116)
 ; -
 org.jruby.RubyNumeric::init@2 (line 114)
 ; -
 org.jruby.RubyInteger::init@2 (line 95)
 ; -
 org.jruby.RubyFixnum::init@5 (line 112)
 ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
 ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
 ; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
   0x02868f2d: mov   ebx, [esp+0x14]
   0x02868f31: mov   ecx, [ebx+0xBC]
   0x02868f37: mov   edx, eax
   0x02868f39: shr   edx, 9
   0x02868f3c: mov   [edx+0x394000], 0x00  ;*putfield varTable
 ; -
 org.jruby.RubyBasicObject::init@8 (line 84)
 ; -
 org.jruby.RubyObject::init@2 (line 116)
 ; -
 org.jruby.RubyNumeric::init@2 (line 114)
 ; -
 org.jruby.RubyInteger::init@2 (line 95)
 ; -
 org.jruby.RubyFixnum::init@5 (line 112)
 ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
 ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
 ; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
   0x02868f43: mov   edx, a 'java/lang/Class' = 
 'org/jruby/RubyBasicObject'
 ;   {oop(a 'java/lang/Class' =
 'org/jruby/RubyBasicObject')}
   0x02868f48: mov   ebx, [edx+0x68]  ;*getstatic NULL_OBJECT_ARRAY
 ; -
 org.jruby.RubyBasicObject::init@5 (line 84)
 ; -
 org.jruby.RubyObject::init@2 (line 116)
 ; -
 org.jruby.RubyNumeric::init@2 (line 114)
 ; -
 org.jruby.RubyInteger::init@2 (line 95)
 ; -
 org.jruby.RubyFixnum::init@5 (line 112)
 ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
 ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
 ; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
   0x02868f4b: mov   [eax+0x10], ebx
   0x02868f4e: lock add  [esp], 0x00 ;*putfield varTable
 ; -
 org.jruby.RubyBasicObject::init@8 (line 84)
 ; -
 org.jruby.RubyObject::init@2 (line 116)
 ; -
 org.jruby.RubyNumeric::init@2 (line 114)
 ; -
 org.jruby.RubyInteger::init@2 (line 95)
 ; -
 org.jruby.RubyFixnum::init@5 (line 112)
 ; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
 ; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
 ; -
 

Re: Good news, bad news

2011-05-20 Thread Charles Oliver Nutter
I am still using the specialized math callsites because they inline well and an 
Indy-based approach was still quite a bit slower last time I tried.

 I am back to being worried about the inlining thresholds. With no MHs and 
runtime profiled direct calls, I was able to get this all to inline. So 
basically the fact that I am using Indy and MHs now penalizes me over not just 
profiled direct calls, but over a totally generic inline cache that usually 
doesn't inline at all. That's pretty bad :)

I can try fiddling with the thresholds later, but do we agree that an 
invokedynamic bound to a GWT plus some intermediate adapter handles ending in a 
direct handle should inline as well as e.g. an invokevirtual?

- Charlie (mobile)

On May 20, 2011, at 17:35, Rémi Forax fo...@univ-mlv.fr wrote:

 On 05/20/2011 11:08 PM, Tom Rodriguez wrote:
 Is it the inlining policy?  I played a little with this and the current 
 default policy is still not as good as the big tweaks to MaxInlineSize and 
 InlineSmallCode.
 
 tom
 
 I tend to agree,
 things like isFixnumReopened and getCacheToken seem to be not inlined.
 
 Charles, do you have tested if the inline flags are tweaked ?
 
 Also I do not understand why you have JRuby's callsites like MinusCallSite
 in the stacktrace. You should not use it if invokedynamic is enabled.
 
 Rémi
 
 On May 20, 2011, at 1:21 PM, Charles Oliver Nutter wrote:
 
 Meh, false alarm. This is just the default empty Object[] assigned to
 all objects. I still can't explain the perf degradation.
 
 Here's the full assembly for fib_ruby: https://gist.github.com/983721
 
 It seems like stuff is inlining, so I have no idea why performance is
 so terrible.
 
 - Charlie
 
 On Fri, May 20, 2011 at 2:59 PM, Charles Oliver Nutter
 head...@headius.com  wrote:
 Another update...I managed to turn off the ricochet frames and have it
 still work (???). The with-ricochet mode is faster, but still 2-3x
 slower than no indy at all (using JRuby's IC), and 4-5x slower than
 invokedynamic was recently...
 
 HOWEVER...an assembly dump I just did contains some very odd code that
 might indicate a problem in my code. Hold the presses on digging into
 it on your end until I can explain this. (this = apparently standing
 up an instance variable table for Fixnum objects...*definitely* should
 not be happening).
 
 0x02868f2a: mov [eax+0x1C], ebx  ;*invokespecialinit
; -
 org.jruby.RubyBasicObject::init@1 (line 218)
; -
 org.jruby.RubyObject::init@2 (line 116)
; -
 org.jruby.RubyNumeric::init@2 (line 114)
; -
 org.jruby.RubyInteger::init@2 (line 95)
; -
 org.jruby.RubyFixnum::init@5 (line 112)
; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f2d: mov   ebx, [esp+0x14]
  0x02868f31: mov   ecx, [ebx+0xBC]
  0x02868f37: mov   edx, eax
  0x02868f39: shr   edx, 9
  0x02868f3c: mov   [edx+0x394000], 0x00  ;*putfield varTable
; -
 org.jruby.RubyBasicObject::init@8 (line 84)
; -
 org.jruby.RubyObject::init@2 (line 116)
; -
 org.jruby.RubyNumeric::init@2 (line 114)
; -
 org.jruby.RubyInteger::init@2 (line 95)
; -
 org.jruby.RubyFixnum::init@5 (line 112)
; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 (line 16)
  0x02868f43: mov   edx, a 'java/lang/Class' = 
 'org/jruby/RubyBasicObject'
;   {oop(a 'java/lang/Class' =
 'org/jruby/RubyBasicObject')}
  0x02868f48: mov   ebx, [edx+0x68]  ;*getstatic NULL_OBJECT_ARRAY
; -
 org.jruby.RubyBasicObject::init@5 (line 84)
; -
 org.jruby.RubyObject::init@2 (line 116)
; -
 org.jruby.RubyNumeric::init@2 (line 114)
; -
 org.jruby.RubyInteger::init@2 (line 95)
; -
 org.jruby.RubyFixnum::init@5 (line 112)
; -
 org.jruby.RubyFixnum::newFixnum@25 (line 173)
; -
 org.jruby.RubyFixnum::op_minus@34 (line 395)
; -
 org.jruby.runtime.callsite.MinusCallSite::call@24 

Re: Good news, bad news

2011-05-19 Thread John Rose
On May 19, 2011, at 10:06 PM, Charles Oliver Nutter wrote:

 I've confimed that the i386 build performs just as poorly. Have not
 investigate at inlining or assembly level yet...and it's late, so I'm
 going to get some sleep.

It's likely that method handle inlining is failing.

Try -XX:+UnlockDiag* -XX:-UseRicochetFrames for differential testing.

If you can find a microbenchmark which shows differential slowdown, we can 
debug it.

-- John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev