Re: Latest experiments...happiness and sadness

2012-10-17 Thread Ben Evans
Hi Charlie,

Can you send us a decent link or two once it actually does drop. I'm
not much of a Ruby head generally, but would like to see the numbers
(and, of course, take a quick look at their testing / benching
methodology).

Thanks,

Ben

On Wed, Oct 17, 2012 at 1:53 AM, Charles Oliver Nutter
head...@headius.com wrote:
 Hello all!

 I've recently been informed that a new Ruby implementation is about to
 be announced that puts JRuby's numeric perf to shame. Boo hoo.

 It's not like I expected us to retain the numeric crown since we're
 still allocating objects for every number in the system, but hopefully
 we can get that crown back at some point.

 In an effort to start getting back to indy + perf work (with JRuby 1.7
 almost released, finally), I bring you today's benchmark:

 50.times { puts Benchmark.measure { f = 20.5; i = 0; while i <
 2_000_000; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f
 += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f
 -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; i
 += 1; end } }

 So we have a 2M fixnum loop with ten float adds and ten float
 subtracts. Other variations of this have more iterations and fewer
 float operations or put the whole loop inside a times{} block. This
 version runs in about 0.34s on hotspot-comp + Christian's patches,
 which beats Java 7 at 0.39s. If I remove some rarely-followed boolean
 logic in the creation of all Ruby objects (including floats) I can get
 this down to 0.29s. This is many times faster than almost all the
 current Ruby implementations.

 However, this new Ruby impl runs the same code in around 0.1s, so even
 with everything inlining JRuby + indy + hotspot-comp + patches is
 still 3x slower. I suspect Float allocation is the main bottleneck
 here.
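
 For illustration, a minimal Java sketch of the suspected allocation-per-operation
 pattern -- the names and signatures are invented stand-ins, not JRuby's actual
 classes: every float add or subtract boxes its result in a fresh heap object.

    public class FloatAllocDemo {
        // Stand-in for a boxed Ruby float: the result of every arithmetic
        // operation is a brand-new heap object, mirroring the
        // op_plus -> newFloat -> <init> chain in the log below.
        static final class RFloat {
            final double value;
            RFloat(double value) { this.value = value; }
            RFloat opPlus(RFloat other)  { return new RFloat(value + other.value); }
            RFloat opMinus(RFloat other) { return new RFloat(value - other.value); }
        }

        public static void main(String[] args) {
            RFloat f = new RFloat(20.5);
            RFloat d = new RFloat(0.1);
            for (int i = 0; i < 2_000_000; i++) {
                // two ops -> two allocations per iteration, unless the JIT
                // can scalar-replace them via escape analysis
                f = f.opPlus(d);
                f = f.opMinus(d);
            }
            System.out.println(f.value);
        }
    }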

 Here's logc output for one of the adds:

  @ 251   java.lang.invoke.LambdaForm$MH::linkToCallSite (18 bytes)
    @ 1   java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)
      @ 4   java.lang.invoke.MutableCallSite::getTarget (5 bytes)
    @ 14   java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
    @ 14   java.lang.invoke.LambdaForm$BMH::reinvoke (32 bytes)
      @ 13   java.lang.invoke.BoundMethodHandle$Species_LD::reinvokerTarget (8 bytes)
      @ 28   java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
      @ 28   java.lang.invoke.LambdaForm$DMH::invokeStatic_LLLD_L (20 bytes)
        @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)
        @ 16   java.lang.invoke.MethodHandle::linkToStatic (0 bytes)
        @ 16   org.jruby.runtime.invokedynamic.MathLinker::float_op_plus (10 bytes)
          @ 6   org.jruby.RubyFloat::op_plus (14 bytes)
            @ 1   org.jruby.RubyBasicObject::getRuntime (8 bytes)
              @ 1   org.jruby.RubyBasicObject::getMetaClass (5 bytes)
              @ 4   org.jruby.RubyClass::getClassRuntime (5 bytes)
            @ 10   org.jruby.RubyFloat::newFloat (10 bytes)
              @ 6   org.jruby.RubyFloat::<init> (15 bytes)
                @ 3   org.jruby.Ruby::getFloat (5 bytes)
                @ 6   org.jruby.RubyNumeric::<init> (7 bytes)
                  @ 3   org.jruby.RubyObject::<init> (7 bytes)
                    @ 3   org.jruby.RubyBasicObject::<init> (30 bytes)
                      @ 1   java.lang.Object::<init> (1 bytes)

 This is *great*. We're getting all paths inlined, and allocation
 inlines all the way up to Object::<init>, so in theory escape analysis
 could get rid of this...RIGHT? WRONG!!!
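
 To make the failure mode concrete, here is a tiny, self-contained Java example
 (all names invented): because HotSpot's escape analysis is not partial, an
 allocation that escapes only on a rarely taken path still has to be
 materialized on the hot path.

    public class RareEscapeDemo {
        static final class Box {
            final double value;
            Box(double value) { this.value = value; }
        }

        static volatile Box sink;   // the rare path publishes the object here

        static double add(double a, double b, boolean rareFailure) {
            Box result = new Box(a + b);
            if (rareFailure) {
                // Practically never taken, but a store to a static field is a
                // global escape, so C2 must keep the allocation above even on
                // the hot path -- no scalar replacement.
                sink = result;
            }
            return result.value;
        }

        public static void main(String[] args) {
            double f = 20.5;
            for (int i = 0; i < 2_000_000; i++) {
                f = add(f, 0.1, false);
            }
            System.out.println(f);
        }
    }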

 logc appears to be missing some output (either the tool or the
 LogCompilation flag is dropping information). The same block of code
 from PrintInlining:

  @ 207   java.lang.invoke.LambdaForm$MH/1942422426::linkToCallSite (18 bytes)   inline (hot)
    @ 1   java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)   inline (hot)
      @ 4   java.lang.invoke.MutableCallSite::getTarget (5 bytes)   inline (hot)
    @ 14   java.lang.invoke.LambdaForm$MH/1896635336::guard (80 bytes)   inline (hot)
      @ 12   java.lang.Class::cast (27 bytes)   inline (hot)
        @ 6   java.lang.Class::isInstance (0 bytes)   (intrinsic)
      @ 17   java.lang.invoke.LambdaForm$BMH/1650319731::reinvoke (30 bytes)   inline (hot)
        @ 13   java.lang.invoke.BoundMethodHandle$Species_LL::reinvokerTarget (8 bytes)   inline (hot)
        @ 26   java.lang.invoke.LambdaForm$DMH/842171382::invokeStatic_LL_I (15 bytes)   inline (hot)
          @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
          @ 11   org.jruby.runtime.invokedynamic.MathLinker::floatTest (20 bytes)   inline (hot)
            @ 8

Re: Latest experiments...happiness and sadness

2012-10-17 Thread Remi Forax
On 10/17/2012 02:53 AM, Charles Oliver Nutter wrote:
 Hello all!

 I've recently been informed that a new Ruby implementation is about to
 be announced that puts JRuby's numeric perf to shame. Boo hoo.

 It's not like I expected us to retain the numeric crown since we're
 still allocating objects for every number in the system, but hopefully
 we can get that crown back at some point.

 In an effort to start getting back to indy + perf work (with JRuby 1.7
 almost released, finally), I bring you today's benchmark:

 50.times { puts Benchmark.measure { f = 20.5; i = 0; while i <
 2_000_000; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f
 += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f
 -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; i
 += 1; end } }

 So we have a 2M fixnum loop with ten float adds and ten float
 subtracts. Other variations of this have more iterations and fewer
 float operations or put the whole loop inside a times{} block. This
 version runs in about 0.34s on hotspot-comp + Christian's patches,
 which beats Java 7 at 0.39s. If I remove some rarely-followed boolean
 logic in the creation of all Ruby objects (including floats) I can get
 this down to 0.29s. This is many times faster than almost all the
 current Ruby implementations.

 However, this new Ruby impl runs the same code in around 0.1s, so even
 with everything inlining JRuby + indy + hotspot-comp + patches is
 still 3x slower. I suspect Float allocation is the main bottleneck
 here.

 Here's logc output for one of the adds:

  @ 251   java.lang.invoke.LambdaForm$MH::linkToCallSite (18 bytes)
    @ 1   java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)
      @ 4   java.lang.invoke.MutableCallSite::getTarget (5 bytes)
    @ 14   java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
    @ 14   java.lang.invoke.LambdaForm$BMH::reinvoke (32 bytes)
      @ 13   java.lang.invoke.BoundMethodHandle$Species_LD::reinvokerTarget (8 bytes)
      @ 28   java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
      @ 28   java.lang.invoke.LambdaForm$DMH::invokeStatic_LLLD_L (20 bytes)
        @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)
        @ 16   java.lang.invoke.MethodHandle::linkToStatic (0 bytes)
        @ 16   org.jruby.runtime.invokedynamic.MathLinker::float_op_plus (10 bytes)
          @ 6   org.jruby.RubyFloat::op_plus (14 bytes)
            @ 1   org.jruby.RubyBasicObject::getRuntime (8 bytes)
              @ 1   org.jruby.RubyBasicObject::getMetaClass (5 bytes)
              @ 4   org.jruby.RubyClass::getClassRuntime (5 bytes)
            @ 10   org.jruby.RubyFloat::newFloat (10 bytes)
              @ 6   org.jruby.RubyFloat::<init> (15 bytes)
                @ 3   org.jruby.Ruby::getFloat (5 bytes)
                @ 6   org.jruby.RubyNumeric::<init> (7 bytes)
                  @ 3   org.jruby.RubyObject::<init> (7 bytes)
                    @ 3   org.jruby.RubyBasicObject::<init> (30 bytes)
                      @ 1   java.lang.Object::<init> (1 bytes)

 This is *great*. We're getting all paths inlined, and allocation
 inlines all the way up to Object::<init>, so in theory escape analysis
 could get rid of this...RIGHT? WRONG!!!

 logc appears to be missing some output (either the tool or the
 LogCompilation flag is dropping information). The same block of code
 from PrintInlining:

  @ 207   java.lang.invoke.LambdaForm$MH/1942422426::linkToCallSite (18 bytes)   inline (hot)
    @ 1   java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)   inline (hot)
      @ 4   java.lang.invoke.MutableCallSite::getTarget (5 bytes)   inline (hot)
    @ 14   java.lang.invoke.LambdaForm$MH/1896635336::guard (80 bytes)   inline (hot)
      @ 12   java.lang.Class::cast (27 bytes)   inline (hot)
        @ 6   java.lang.Class::isInstance (0 bytes)   (intrinsic)
      @ 17   java.lang.invoke.LambdaForm$BMH/1650319731::reinvoke (30 bytes)   inline (hot)
        @ 13   java.lang.invoke.BoundMethodHandle$Species_LL::reinvokerTarget (8 bytes)   inline (hot)
        @ 26   java.lang.invoke.LambdaForm$DMH/842171382::invokeStatic_LL_I (15 bytes)   inline (hot)
          @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
          @ 11   org.jruby.runtime.invokedynamic.MathLinker::floatTest (20 bytes)   inline (hot)
            @ 8   org.jruby.Ruby::isFloatReopened (5 bytes)   inline (hot)
      @ 50   java.lang.invoke.LambdaForm$DMH/952682386::invokeSpecial__L (20 bytes)   inline (hot)
        @ 1

Re: Latest experiments...happiness and sadness

2012-10-17 Thread Charles Oliver Nutter
I will indeed! Just preparing ahead of time for the hype machine to go
into overdrive. Regardless of initial speed, there's an incredibly
long tail to any Ruby implementation, and new ones won't be useful
until months or years after they're first released.

- Charlie

On Wed, Oct 17, 2012 at 3:03 AM, Ben Evans
benjamin.john.ev...@gmail.com wrote:
 Hi Charlie,

 Can you send us a decent link or two once it actually does drop. I'm
 not much of a Ruby head generally, but would like to see the numbers
 (and, of course, take a quick look at their testing / benching
 methodology).

 Thanks,

 Ben

 On Wed, Oct 17, 2012 at 1:53 AM, Charles Oliver Nutter
 head...@headius.com wrote:
 Hello all!

 I've recently been informed that a new Ruby implementation is about to
 be announced that puts JRuby's numeric perf to shame. Boo hoo.

 It's not like I expected us to retain the numeric crown since we're
 still allocating objects for every number in the system, but hopefully
 we can get that crown back at some point.

 In an effort to start getting back to indy + perf work (with JRuby 1.7
 almost released, finally), I bring you today's benchmark:

 50.times { puts Benchmark.measure { f = 20.5; i = 0; while i <
 2_000_000; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f
 += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f
 -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; i
 += 1; end } }

 So we have a 2M fixnum loop with ten float adds and ten float
 subtracts. Other variations of this have more iterations and fewer
 float operations or put the whole loop inside a times{} block. This
 version runs in about 0.34s on hotspot-comp + Christian's patches,
 which beats Java 7 at 0.39s. If I remove some rarely-followed boolean
 logic in the creation of all Ruby objects (including floats) I can get
 this down to 0.29s. This is many times faster than almost all the
 current Ruby implementations.

 However, this new Ruby impl runs the same code in around 0.1s, so even
 with everything inlining JRuby + indy + hotspot-comp + patches is
 still 3x slower. I suspect Float allocation is the main bottleneck
 here.

 Here's logc output for one of the adds:

  @ 251   java.lang.invoke.LambdaForm$MH::linkToCallSite (18 bytes)
    @ 1   java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)
      @ 4   java.lang.invoke.MutableCallSite::getTarget (5 bytes)
    @ 14   java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
    @ 14   java.lang.invoke.LambdaForm$BMH::reinvoke (32 bytes)
      @ 13   java.lang.invoke.BoundMethodHandle$Species_LD::reinvokerTarget (8 bytes)
      @ 28   java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
      @ 28   java.lang.invoke.LambdaForm$DMH::invokeStatic_LLLD_L (20 bytes)
        @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)
        @ 16   java.lang.invoke.MethodHandle::linkToStatic (0 bytes)
        @ 16   org.jruby.runtime.invokedynamic.MathLinker::float_op_plus (10 bytes)
          @ 6   org.jruby.RubyFloat::op_plus (14 bytes)
            @ 1   org.jruby.RubyBasicObject::getRuntime (8 bytes)
              @ 1   org.jruby.RubyBasicObject::getMetaClass (5 bytes)
              @ 4   org.jruby.RubyClass::getClassRuntime (5 bytes)
            @ 10   org.jruby.RubyFloat::newFloat (10 bytes)
              @ 6   org.jruby.RubyFloat::<init> (15 bytes)
                @ 3   org.jruby.Ruby::getFloat (5 bytes)
                @ 6   org.jruby.RubyNumeric::<init> (7 bytes)
                  @ 3   org.jruby.RubyObject::<init> (7 bytes)
                    @ 3   org.jruby.RubyBasicObject::<init> (30 bytes)
                      @ 1   java.lang.Object::<init> (1 bytes)

 This is *great*. We're getting all paths inlined, and allocation
 inlines all the way up to Object::<init>, so in theory escape analysis
 could get rid of this...RIGHT? WRONG!!!

 logc appears to be missing some output (either the tool or the
 LogCompilation flag is dropping information). The same block of code
 from PrintInlining:

  @ 207   java.lang.invoke.LambdaForm$MH/1942422426::linkToCallSite (18 bytes)   inline (hot)
    @ 1   java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)   inline (hot)
      @ 4   java.lang.invoke.MutableCallSite::getTarget (5 bytes)   inline (hot)
    @ 14   java.lang.invoke.LambdaForm$MH/1896635336::guard (80 bytes)   inline (hot)
      @ 12   java.lang.Class::cast (27 bytes)   inline (hot)
        @ 6   java.lang.Class::isInstance (0 bytes)   (intrinsic)
      @ 17   java.lang.invoke.LambdaForm$BMH/1650319731::reinvoke (30 bytes)   inline (hot)
        @ 13   java.lang.invoke.BoundMethodHandle$Species_LL::reinvokerTarget (8 bytes)   inline (hot)
        @ 26

Re: Latest experiments...happiness and sadness

2012-10-17 Thread Ben Evans
On Wed, Oct 17, 2012 at 2:54 PM, Charles Oliver Nutter
head...@headius.com wrote:
 I will indeed! Just preparing ahead of time for the hype machine to go
 into overdrive. Regardless of initial speed, there's an incredibly
 long tail to any Ruby implementation, and new ones won't be useful
 until months or years after they're first released.

It was ever thus. People seem to have this amazing cognitive bias for
the behaviour of a paper tiger over the real thing.

I've sometimes wondered if it's not a side-effect of the golden-path
thinking that many (most?) developers seem to get taught.

Ben, quite unable to come up with a decent joke about tigers and long
tails at short notice.


Re: OS X OpenJDK 8 hotspot-comp + perf patches

2012-10-17 Thread Charles Oliver Nutter
This is a product build. I can run a fastdebug build if you need it
(and really I need it too, since PrintAssembly is still broken with
OpenJDK8).

- Charlie

On Wed, Oct 17, 2012 at 11:54 AM, Mark Roos mr...@roos.com wrote:
 Thanks for the build Charles.

 For my rtalk benchmarks
  jdk7         10.1 secs
  jdk8         12.8 secs
  your build   10.7 secs

 Looking good.   Also no evidence of the class not found error.

 Is this a fast debug build?

 mark



Re: hg: mlvm/mlvm/hotspot: value-obj: first cut

2012-10-17 Thread Remi Forax
On 10/17/2012 05:23 PM, David Chase wrote:
 On 2012-10-16, at 5:14 AM, Remi Forax fo...@univ-mlv.fr wrote:

 Frozen/locked is a runtime property, not a type property, so it's harder
 than that.
 You have to do a frozen check at the beginning of the method and pray
 that people will only use it with frozen objects and not unfrozen ones,
 because in that case you have to de-optimize.
 Maybe you can have two versions of the same method, one with the frozen
 semantics and one with the boxed semantics (this is what I have done in JDart).
 I'm still coming up to speed on this, but I thought that the entire point of 
 having value objects
 is so that we would have a non-standard interface for all methods dealing 
 with value objects.
 Complex, boxed, is received as a single pointer to an object with headers 
 and fields.
 Complex, unboxed, is received as a pair of double.  The frozen check is 
 punted to the caller,
 who in turn may have punted it to his caller, etc, potentially removing the 
 need for all tests.

 Or did I read this wrong?

 The only place I see a need for a frozen check is when we are interoperating 
 with legacy code
 that is not playing the frozen-object game, and that we want to run with 
 complete legacy compatibility.
 In that case, the slow-and-boxed path also includes a frozen check -- if
 frozen, unbox the object,
 and head for the fast path, otherwise, stay slow.

 From the notes (value-obj.txt) I see:

 38 - the reference returned from the (unsafe) marking primitive must be used 
 for all future accesses
 39 - any previous references (including the one passed to the marking 
 primitive) must be unused
 40  - in practice, this means you must mark an object locked immediately 
 after constructing it

 So, allocation of a value-object becomes something along the lines of

new java/lang/Integer
dup
iload ...
invokespecial java/lang/Integer.<init>(I)V
markingPrimitive

 But we can't rely on this, hence it is not a true type property.  But we 
 could make it be as-if.
 I think I have to assume some sort of a marker class (implements 
 PermanentlyLockable).

A bit in the class header (equivalent to implementing
PermanentlyLockable) means you now have two classes, the one with the
old semantics and the one with the new semantics.
If you can have them both at runtime, you make your inlining cache less
efficient; it's a problem I've had with PHP.reboot.
Marking the instance seems a better idea.

 Then in bytecode version N+1, the verifier enforces this for all types 
 implementing PL, and
 all methods trucking in PL-implementing objects will by default generate
 unboxed entrypoints.

 Except when dealing with legacy code, it's as good as a type.

100% of the produced code until now is what you call 'legacy' :)


 For legacy code, I think we have options.  Simplest is just to box at the 
 boundaries, with lazy
 compilation of boxed versions of PL-handling methods in modern bytecodes.  
 I'm trying to decide
 if we can do better with flow analysis; I think it has to be non-publishing 
 in the PL types, in addition
 to the other properties.

You have to box and unbox at boundaries, and because Java allows
overriding, an interface can have two methods, one implemented with
boxing semantics and another which uses the frozen semantics.
So you need stub code in front of methods, similar to verified/unverified
entry points.
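
As a rough illustration of the boxed/unboxed entry-point split -- all names are
hypothetical, and a scratch array stands in for multiple return values, since
this is plain Java source rather than generated stubs:

    public class EntryPointSketch {
        // Boxed ("legacy") representation of a value-like complex number.
        static final class Complex {
            final double re, im;
            Complex(double re, double im) { this.re = re; this.im = im; }
        }

        // Unboxed ("modern") entry point: components passed as raw doubles,
        // result returned through a caller-supplied scratch array.
        static void addUnboxed(double re1, double im1, double re2, double im2, double[] out) {
            out[0] = re1 + re2;
            out[1] = im1 + im2;
        }

        // Boxed stub for legacy callers: unbox, call the unboxed entry point,
        // re-box the result at the boundary.
        static Complex addBoxed(Complex a, Complex b) {
            double[] out = new double[2];
            addUnboxed(a.re, a.im, b.re, b.im, out);
            return new Complex(out[0], out[1]);
        }

        public static void main(String[] args) {
            Complex c = addBoxed(new Complex(1, 2), new Complex(3, 4));
            System.out.println(c.re + " + " + c.im + "i");
        }
    }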


 David

Rémi



Re: Latest experiments...happiness and sadness

2012-10-17 Thread Christian Thalinger

On Oct 17, 2012, at 8:33 AM, David Chase david.r.ch...@oracle.com wrote:

 
 On 2012-10-16, at 8:53 PM, Charles Oliver Nutter head...@headius.com wrote:
 
 So *almost* everything is inlining, but one path (I believe it's the
 failure path from GWT after talking with Christian) is not reached.
 Because Hotspot's EA can't do partial EA, any unfollowed paths that
 would receive the allocated object have to be considered escapes, and
 so anywhere we're doing guarded logic (either in indy or in Java code,
 like Fixnum overflow checks) the unfollowed paths prevent EA from
 happening. Boo-hoo.
 
 Thoughts?
 
 I'm very new to this (have not even looked at the source code to Hotspot 
 yet), but is it possible
 to push the allocation/boxing to paths that are believed to be rarely taken?

That's what partial EA does.  I'm trying to get Vladimir to work on it and it 
seems I'm successful.

-- Chris

   This is not unlike
 region-based register allocation, where register allocation is limited to
 what are believed to be the hot regions, worrying about region exits later
 -- if necessary, you can always spill there.

 
 David
 



Re: Latest experiments...happiness and sadness

2012-10-17 Thread Charles Oliver Nutter
On Wed, Oct 17, 2012 at 2:07 PM, Christian Thalinger
christian.thalin...@oracle.com wrote:
 On Oct 17, 2012, at 8:33 AM, David Chase david.r.ch...@oracle.com wrote:

 I'm very new to this (have not even looked at the source code to Hotspot 
 yet), but is it possible
 to push the allocation/boxing to paths that are believed to be rarely taken?

 That's what partial EA does.  I'm trying to get Vladimir to work on it and it 
 seems I'm successful.

I started reading a bit about partial EA last night, specifically
looking at how PyPy does it.

In PyPy, the JIT treats accesses and calls against an object as acting
against a virtual object. I did not see if they actually allocate
stack space for this, but my guess is that it's virtual in that the
data moves are still unoptimized, unemitted operations in the IR
representation. If at some point the code takes a branch that would
need to see the actual object, they reconstitute it based on the
actual values at that point.

The concerns some have brought up about construction seem like
non-issues here; if the constructor chain is simple and just does
field updates (and doesn't allow the object to escape) then the
inlined version of the constructor can be treated as acting against
the virtual object (again, perhaps against a stack-allocated object, or
just represented as object accesses in IR), so it still runs when
it's supposed to. The object reconstitution that happens later just
copies the current virtual object contents into new memory, and
proceeds from there.
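
Written out by hand in Java (illustrative names only), the rewrite that partial
EA / PyPy-style virtualization performs looks roughly like this: the object
stays virtual on the hot path and is only reconstituted inside the rare branch
that needs it.

    public class PartialEASketch {
        static final class Box {
            final double value;
            Box(double value) { this.value = value; }
        }

        static volatile Box sink;

        // Before: the allocation sits on the hot path, and the rare branch
        // makes it escape, so plain (non-partial) EA cannot remove it.
        static double before(double a, double b, boolean rare) {
            Box result = new Box(a + b);
            if (rare) sink = result;
            return result.value;
        }

        // After (what partial EA would produce): the object stays "virtual"
        // (just a double in a local) on the hot path, and is only
        // materialized from the live values inside the rare branch.
        static double after(double a, double b, boolean rare) {
            double value = a + b;             // virtual object: field held in a local
            if (rare) sink = new Box(value);  // materialize only where it escapes
            return value;
        }

        public static void main(String[] args) {
            System.out.println(before(1.0, 2.0, false) + " " + after(1.0, 2.0, false));
        }
    }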

I know very little about the current EA implementation in Hotspot. Was
it designed to be able to eventually support partial EA?

- Charlie


Re: hg: mlvm/mlvm/hotspot: value-obj: first cut

2012-10-17 Thread David Chase

On 2012-10-17, at 2:12 PM, Remi Forax fo...@univ-mlv.fr wrote:
 But we can't rely on this, hence it is not a true type property.  But we 
 could make it be as-if.
 I think I have to assume some sort of a marker class (implements 
 PermanentlyLockable).
 
 A bit in the class header (equivalent to implementing 
 PermanentLyLockable) means
 you have now two classses, the one with the old semantics and the one 
 with the new semantics.
 If you can have them both at runtime, you make your inlining cache less 
 efficient,
 it's a problem I've had with PHP.reboot.
 Marking the instance seems a better idea.

I'm not sure I follow this -- if j/l/Integer implements PermanentlyLockable, 
that's just one class.
You end up with possibly two versions of each entrypoint that handle any 
Plockable, true, but this seems like a necessary consequence of supporting both 
legacy (boxed-only) and modern (unboxed) implementations of Plockable types.  
The entrypoints are different interfaces at the machine level; I don't see how 
you can avoid having two.  But many of the entrypoints might be mere 
stubs/wrappers.

I've been trying to figure out (Bharadwaj Yadavilli stopped by, we talked about 
this) whether the per-instance Plockable bit needs to exist or not.

Here are some assumptions I'm working from.  If any of these are wrong, that 
would be useful to know:

- we want value types in the future.

- we want value types passed and returned in unboxed form

- we want value types stored in arrays in unboxed form

- we can upcast an array of value-elements to an array of reference-elements

- we will sometimes box value types -- Object o = someInteger

- we must support legacy code

- we can use different compilation strategies for code depending on its 
bytecode version number.


So, a strawman implementation might be the following:

Use of values that implement Plockable in modern bytecodes is guaranteed to 
conform to the various value-friendly restrictions.
There's no extra bit, no extra call at allocation.
They compile as value types; an occurrence of new-dup-loadargs-<init> is replaced
with running the constructor on the args in local memory.
The only exception is when they are upcast to a reference supertype.

In legacy bytecodes, none of this happens; it's just like today.  Mentions of
Plockable types are compiled as if they were boxed.

Compilation of any method that mentions a Plockable type in its signature 
depends on legacy/modern.
In modern, the default implementation is for unboxed, but a boxed stub is 
provided (perhaps lazily) for references from legacy code.
In legacy, the default implementation is for boxed, but an unboxed stub is 
provided (perhaps lazily) for references from modern code.

Arrays are nasty.
In both modern and legacy code, arrays themselves are reference types, but 
arrays of Plockable elements store the elements as value types.
In both modern and legacy code, loads from arrays of a reference type (in 
legacy code, Plockable is a reference type) with a Plockable subtype call a 
static factory method of the Plockable type that can create a boxed object 
given an array address and an index.  This can require an element-type check 
before loads.
Stores work in reverse, with the same assignment of responsibility to a method
of the Plockable type.
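
A hypothetical sketch of that array scheme in plain Java, with a flat double[]
standing in for the value-element storage and invented factory/store methods
on the element type:

    public class FlatArraySketch {
        // Boxed view of a two-field value type.
        static final class Complex {
            final double re, im;
            Complex(double re, double im) { this.re = re; this.im = im; }

            // Static factory of the kind described above: build a boxed
            // object from flat backing storage plus an element index.
            static Complex readFrom(double[] storage, int index) {
                return new Complex(storage[2 * index], storage[2 * index + 1]);
            }

            // The reverse responsibility, for stores.
            void writeTo(double[] storage, int index) {
                storage[2 * index] = re;
                storage[2 * index + 1] = im;
            }
        }

        public static void main(String[] args) {
            double[] flat = new double[2 * 4];          // Complex[4], stored unboxed
            new Complex(1.0, 2.0).writeTo(flat, 3);     // store element 3
            Complex c = Complex.readFrom(flat, 3);      // boxing load
            System.out.println(c.re + " + " + c.im + "i");
        }
    }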

Similarly, field loads/stores across the legacy/modern boundary box/unbox as 
necessary to obtain expected behavior.

Optimizations:
In legacy code, use-def webs of Plockable that are free of identity-uses can be 
unboxed.  Inlining of unboxing stubs from modern code might help here.
In modern code, use-def webs that connect to calls to legacy methods can be
boxed, since the value representation will give no savings there.

I assume I am missing something, because I think this is simpler than John's 
proposal.  Am I skipping ahead straight to value types too quickly?

David



Re: Latest experiments...happiness and sadness

2012-10-17 Thread Remi Forax
On 10/17/2012 09:07 PM, Christian Thalinger wrote:
 On Oct 17, 2012, at 8:33 AM, David Chase david.r.ch...@oracle.com wrote:

 On 2012-10-16, at 8:53 PM, Charles Oliver Nutter head...@headius.com wrote:
 So *almost* everything is inlining, but one path (I believe it's the
 failure path from GWT after talking with Christian) is not reached.
 Because Hotspot's EA can't do partial EA, any unfollowed paths that
 would receive the allocated object have to be considered escapes, and
 so anywhere we're doing guarded logic (either in indy or in Java code,
 like Fixnum overflow checks) the unfollowed paths prevent EA from
 happening. Boo-hoo.
 Thoughts?
 I'm very new to this (have not even looked at the source code to Hotspot 
 yet), but is it possible
 to push the allocation/boxing to paths that are believed to be rarely taken?
 That's what partial EA does.  I'm trying to get Vladimir to work on it and it 
 seems I'm successful.

Graal also does partial EA; the code is available and readable.


 -- Chris

Rémi



hg: mlvm/mlvm/jdk: meth-aclone.patch: point fix for bug reported by Remi

2012-10-17 Thread john . r . rose
Changeset: d925ea8227c0
Author:jrose
Date:  2012-10-17 21:02 -0700
URL:   http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/d925ea8227c0

meth-aclone.patch: point fix for bug reported by Remi

+ meth-aclone-8001105.patch



hg: mlvm/mlvm/jdk: meth-lfi: refactor LF.Template to IBG.CodePattern and do cleanups; also assign some bug numbers

2012-10-17 Thread john . r . rose
Changeset: 51b63e67f83e
Author:jrose
Date:  2012-10-17 21:25 -0700
URL:   http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/51b63e67f83e

meth-lfi: refactor LF.Template to IBG.CodePattern and do cleanups; also assign 
some bug numbers

+ anno-stable-8001107.patch
- anno-stable.patch
+ meth-lfi-8001106.patch
- meth-lfi.patch
! meth.patch
! series



hg: mlvm/mlvm/hotspot: assign some bug numbers

2012-10-17 Thread john . r . rose
Changeset: b6f0babd7cf1
Author:jrose
Date:  2012-10-17 21:46 -0700
URL:   http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/b6f0babd7cf1

assign some bug numbers

! anno-stable-8001107.patch  anno-stable.patch
! series
+ value-obj-800.patch
+ value-obj-800.txt
- value-obj.patch
- value-obj.txt
