Re: hg: mlvm/mlvm/hotspot: value-obj: first cut

2012-10-19 Thread David Chase

On 2012-10-18, at 6:36 PM, Remi Forax fo...@univ-mlv.fr wrote:
 Offhand, I don't know of any library code that manipulates Integers as 
 Integers and makes any sort of promises about their reference identity, 
 except for the nonsense about small-value interns (which we probably 
 replicate, because it is easy to do so).  ArrayLists and HashMaps store them 
 in Object-typed containers, so they'll retain their reference identity 
 there. (But there's a lot of library code, and I don't know it now as well 
 as I used to).
 
 IdentityHashMap<Integer, X>,
 https://duckduckgo.com/?q=IdentityHashMap%3CInteger

But it was my understanding that as yet, and for quite some time, generics are 
NOT reified, and that even when they are available, we're going to be careful 
at first about how we use them.

So at the bytecode level, that's all Objects, right?  I'm looking at this all 
from the point-of-view of bytecodes.
For the case of IdentityHashMap<Integer>, from old or new code, the underlying 
IdentityHashMap is really handling Objects, and so even though it is recompiled 
into new code, because the static type is Object at the bytecode level, the 
data will flow as refs, and not as Integers, and the identity will not be lost.
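
A rough illustration of the erasure point, as a sketch only (the class and method names are made up for the example, and nothing here comes from the value-obj patch): once generics are erased, an IdentityHashMap<Integer, String> is just an IdentityHashMap of Objects, and keys are compared by reference. The deprecated new Integer(int) constructor is used only to force two distinct boxes.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class; names are illustrative only.
class ErasedIdentityDemo {
    // Puts two equal-valued but distinct Integer boxes into an identity map
    // and returns the resulting size.
    @SuppressWarnings({"deprecation", "removal"})
    static int distinctBoxes() {
        Map<Integer, String> map = new IdentityHashMap<>();
        Integer a = new Integer(1000); // explicit allocation, its own identity
        Integer b = new Integer(1000); // equals(a), but a different reference
        map.put(a, "first");
        map.put(b, "second");
        return map.size();
    }

    // At run time the type arguments are gone: both maps share one class.
    static boolean sameRuntimeClass() {
        Map<Integer, String> typed = new IdentityHashMap<>();
        Map<Object, Object> untyped = new IdentityHashMap<>();
        return typed.getClass() == untyped.getClass();
    }
}
```

If the keys were silently unboxed and reboxed on the way in, the two entries could no longer be told apart by identity, which is the hazard being discussed.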

Am I relying on an unshared assumption?
I assume that value-typed values will be boxed whenever they are passed into a 
place where their static type is a reference type.
This means that many of the hoped-for benefits of value types won't arrive 
until generics are reified.

(I gather I need to look at Jim Laskey's tagged arrays.  From a superficial 
look, I'm not sure this is a huge problem; in any case, these are not yet a 
massive chunk of installed base.)

Maybe we need to think about both transitions (value types and reified 
generics) at the same time, I am not sure.
It looks to me like value safety has to come first and become widespread, but 
maybe I have that wrong.

 I disagree, at least for wrapper, if the code uses valueOf(), it means 
 you don't care about the identity.

Right, but I took care not to use valueOf in my example.  I took care to use 
methods that were declared to either return the same thing (toString) or a 
new thing (substring).  That code I wrote up there is designed to pass both 
assertions under the current (old) semantics.

 Given that because of the overriding you can have a mix of old code and 
 new code in the very same method (with inlining),
 I don't think that the version of the code is something useful here. And 
 as I said earlier, it will not be backward compatible,
 i.e. old code compiled with the new version will behave differently.

I think, if we define value types properly at the language level, that old code 
that would be sensitive to value-type-identity will fail to recompile with a 
new javac, so it will behave differently, but it won't be a silent change, 
nor will it be one that the VM has to worry about.  Maybe this is too big a 
leap for one revision, maybe we include some sort of a compatibility flag that 
allows the class to be tagged for old behavior.  If we're ever going to get to 
well-behaved value types, we're going to need to ban some currently-legal 
idioms, so one day, there will be some old source code that fails to 
recompile.

I was under the impression that inlining already kept track of the 
source/flavor of code; for example, strictfp is handled properly in the face of 
inlining, so this is not new hair.

 Ugh, a somewhat more annoying question, inspired by trying to find an 
 example.
 What happens if we have a volatile Complex (using either strategy)?
 Assume, for the sake of amusement, that Complex is implemented with a pair 
 of doubles, hence is 128 bits at minimum in its value representation.  
 (Possible implementation -- as a value if there's a native CAS of the right 
 size, otherwise as a reference.)
 
 good question, you can hope that the current CPU understands an 
 instruction like CMPXCHG16 or the JIT will not be able to unbox the 
 Complex value, I suppose.

I think it's worse than this (I've been thinking about this on and off 
yesterday/today).  Suppose you are CASing a type that happens to be a value 
type, and happens to unbox to more bits than are supported by your native 
atomic operations.  So it has to be boxed (and we'll do whatever it takes to 
make sure it gets boxed).  But, in the intervening code, which is ordinary 
Java that just happens to be bracketed by a LOAD-CAS, that value might get 
unboxed and reboxed, and that-would-be-bad (the CAS would fail, gratuitously, 
because the pointer would change).  Maybe we take the approach that in local 
variables, if a ref is the source for a value-typed value, then it is 
represented as a ref+value pair (John pointed this out), and if one or the 
other happens to go dead, that's okay.
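
The gratuitous CAS failure can be reproduced today with ordinary references. This is a minimal sketch under assumptions: Complex is a hypothetical two-double class, and the "unbox and rebox" step is simulated by allocating a fresh object with the same fields.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical 128-bit value-like class: two doubles, nothing else.
final class Complex {
    final double re;
    final double im;
    Complex(double re, double im) { this.re = re; this.im = im; }
}

class ReboxCasDemo {
    // LOAD ... CAS where the loaded reference is kept intact: succeeds.
    static boolean casWithOriginalRef() {
        AtomicReference<Complex> cell = new AtomicReference<>(new Complex(1.0, 2.0));
        Complex seen = cell.get();
        return cell.compareAndSet(seen, new Complex(seen.re + 1.0, seen.im));
    }

    // LOAD ... CAS where the value was unboxed and reboxed in between:
    // the expected reference is a fresh object, so the CAS fails even
    // though no other thread touched the cell.
    static boolean casAfterRebox() {
        AtomicReference<Complex> cell = new AtomicReference<>(new Complex(1.0, 2.0));
        Complex seen = cell.get();
        Complex reboxed = new Complex(seen.re, seen.im); // simulates unbox + rebox
        return cell.compareAndSet(reboxed, new Complex(seen.re + 1.0, seen.im));
    }
}
```

Carrying the original ref alongside the unboxed value, as in the ref+value pair idea, corresponds to the first method: the CAS still sees the pointer that was loaded.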

Doing this in general would change the wrapper strategy somewhat; rather than 
blindly calling the preferred-unboxed version in the new code, the ref would 
also 

Re: hg: mlvm/mlvm/hotspot: value-obj: first cut

2012-10-18 Thread Remi Forax
On 10/17/2012 10:41 PM, David Chase wrote:
 On 2012-10-17, at 2:12 PM, Remi Forax fo...@univ-mlv.fr wrote:
 But we can't rely on this, hence it is not a true type property.  But we 
 could make it be as-if.
 I think I have to assume some sort of a marker class (implements 
 PermanentlyLockable).
 A bit in the class header (equivalent to implementing
 PermanentlyLockable) means
 you now have two classes, the one with the old semantics and the one
 with the new semantics.
 If you can have them both at runtime, you make your inlining cache less
 efficient,
 it's a problem I've had with PHP.reboot.
 Marking the instance seems a better idea.
 I'm not sure I follow this -- if j/l/Integer implements PermanentlyLockable, 
 that's just one class.

You can't forget backward compatibility. In Java 9, a program that was 
written for Java 1.0
should still work. This means either you have 2 different classes of 
Integer (one of value type and one of boxed type)
or you are in mixed mode and you allow Integers to be flagged as value 
type or not.

For a new type, like Complex for example, it's simpler because you can have 
only one class
that works as a value type. But if you want the extra ball (as John 
said) you need a way to
flag which Integer is a value type or not, hence a tag bit in the object 
header.

Now, if an old program (jdk9) uses Integer.valueOf(), given the current 
rules, the developer has already given up on the notion of identity, thus 
for these Integers, you can automatically flag them as value types.

At runtime, in the interpreter, you can profile the Integers to know if 
only value types are used or not,
and if only value types are used, you can consider them as true value 
types when you JIT,
and then either deopt or generate another version of the code if a method 
is called with a boxed Integer.

 You end up with possibly two versions of each entrypoint that handle any 
 Plockable, true, but this seems like a necessary consequence of supporting 
 both legacy (boxed-only) and modern (unboxed) implementations of Plockable 
 types.  The entrypoints are different interfaces at the machine level; I 
 don't see how you can avoid having two.  But many of the entrypoints might be 
 mere stubs/wrappers.

 I've been trying to figure out (Bharadwaj Yadavilli stopped by, we talked 
 about this) whether the per-instance Plockable bit needs to exist or not.

 Here are some assumptions I'm working from.  If any of these are wrong, that 
 would be useful to know:

 - we want value types in the future.

 - we want value types passed and returned in unboxed form

 - we want value types stored in arrays in unboxed form

 - we can upcast an array of value-elements to an array of reference-elements

 - we will sometimes box value types -- Object o = someInteger

 - we must support legacy code

 - we can use different compilation strategies for code depending on its 
 bytecode version number.

No, you can't.



 So, a strawman implementation might be the following:

 Use of values that implement Plockable in modern bytecodes is guaranteed to 
 conform to the various value-friendly restrictions.
 There's no extra bit, no extra call at allocation.
 They compile as value types; an occurrence of new-dup-loadargs-init is 
 replaced with running the constructor on the args in local memory.
 The only exception is when they are upcast to a reference supertype.

 In legacy bytecodes, none of this happens, it's just like today.  Mentions of 
 Plockable types are compiled as if they were boxed.

 Compilation of any method that mentions a Plockable type in its signature 
 depends on legacy/modern.
 In modern, the default implementation is for unboxed, but a boxed stub is 
 provided (perhaps lazily) for references from legacy code.
 In legacy, the default implementation is for boxed, but an unboxed stub is 
 provided (perhaps lazily) for references from modern code.

 Arrays are nasty.
 In both modern and legacy code, arrays themselves are reference types, but 
 arrays of Plockable elements store the elements as value types.
 In both modern and legacy code, loads from arrays of a reference type (in 
 legacy code, Plockable is a reference type) with a Plockable subtype call a 
 static factory method of the Plockable type that can create a boxed object 
 given an array address and an index.  This can require an element-type 
 check before loads.
 Stores work in reverse, same assignment of responsibility to a method of the 
 Plockable type.

 Similarly, field loads/stores across the legacy/modern boundary box/unbox as 
 necessary to obtain expected behavior.

 Optimizations:
 In legacy code, use-def webs of Plockable that are free of identity-uses can 
 be unboxed.  Inlining of unboxing stubs from modern code might help here.
 In modern code, use-def webs that connect calls to legacy methods can be 
 identified and boxed, since the value representation will give no savings here.

 I assume I am missing something, because I think this is simpler than John's 
 proposal.  Am I 

Re: hg: mlvm/mlvm/hotspot: value-obj: first cut

2012-10-18 Thread David Chase
Note: I did go back through the archives for a few months to see if I had 
missed some earlier discussion of this.  Is there an even earlier discussion 
that I missed?

On 2012-10-18, at 3:21 AM, Remi Forax fo...@univ-mlv.fr wrote:
 You can't forget backward compatibility. In Java 9, a program that was 
 written for Java 1.0
 should still work. This means either you have 2 different classes of 
 Integer (one of value type and one of boxed type)
 or you are in mixed mode and you allow Integers to be flagged as value 
 type or not.

Can we work with an example program to make this more concrete?
Failure is losing track of object identity; the unbox/box game for 
legacy/modern compatibility only goes wrong when an important identity is 
dropped on the floor.  If there's no ==true relation, then reboxing just costs 
time.

If I imagine the version-specific-compilation game, refInteger only ceases to 
be refInteger within the internals of modern Library code (or in the case that 
an application is not recompiled, but some other library on which it depends 
is), and then only when it is not statically typed as Object or Number 
(reference supertypes).  So old code retains its semantics, exactly, and 
library retains its semantics in those cases where value types are referred to 
as Objects.  I'm assuming that we get to efficient, value-handling generics 
when we get to reified generics.

Offhand, I don't know of any library code that manipulates Integers as Integers 
and makes any sort of promises about their reference identity, except for the 
nonsense about small-value interns (which we probably replicate, because it is 
easy to do so).  ArrayLists and HashMaps store them in Object-typed containers, 
so they'll retain their reference identity there. (But there's a lot of library 
code, and I don't know it now as well as I used to).

The more-likely screw cases I think would involve String, in particular those 
cases where a library method promises to return an interned String, or other 
library code written by 3rd parties that does who-knows-what?

 But if you want the extra ball (as John 
 said) you need a way to
 flag which Integer is a value type or not, hence a tag bit in the object 
 header.

And I think this ends up being a runtime-static property, because value types 
have a completely different representation -- wider or narrower than a pointer, 
and no object header.  You must at least have two entrypoints; the old code is 
expecting (for behavioral reasons) pointer semantics, the new code is expecting 
(for performance reasons) value semantics.  (Remarks about cache inefficiency 
seem distracting until we figure out if we like the semantics, unless our 
choices send performance completely into the toilet.)  The interpreter I assume 
acts like it is legacy code.

If we use an instance-flag instead, don't you end up in the same boat with any 
Integer resulting from a call to an Integer-allocation method in the (modern 
implementation) library?  Integer.valueOf returns a value-tagged Integer, 
because the new occurs in code that will be recompiled into the modern world 
-- what if the result is used in a lock, in old code?
Or what if that is re-passed in to modern-compiled code?
(I'm trying to come up with an example, I think I have to use String instead.)

Here's an example -- String.toString(), the specification (at least, Java 6, 
which is handy in my browser) promises that this is implemented with return 
self;.  This is a screw case for either strategy, naively implemented:

// This is old code, not recompiled, calling new code from the library.
  String cat0 = "cat"; // a legacy string.
  String cat1 = cat0.substring(0); // returns a modern-allocated new string.
  String cat2 = cat0.toString(); // same string
  String cat3 = cat1.toString(); // same string
  assert (cat0 == cat2); // works, tagged instances; fails, tagged code
  assert (cat1 == cat3); // fails, tagged code or tagged instance.

or, instead of ==, two threads could be dispatched, using cat1 and cat3 for 
their respective locks to coordinate execution (yes, I know the author of such 
code should be shot).  
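
The lock variant can be checked on a current JVM without any threads. This is a sketch with made-up class and method names, relying only on toString() returning the receiver: holding the monitor of cat1 means holding the monitor of cat3.

```java
// Hypothetical demo of the lock variant; names are illustrative only.
class SameMonitorDemo {
    // Returns true if locking cat1 also means holding the monitor of cat3.
    static boolean toStringSharesMonitor() {
        String cat0 = "cat";
        String cat1 = cat0.substring(0);
        String cat3 = cat1.toString(); // documented to return the same object
        synchronized (cat1) {
            // Thread.holdsLock reports whether the current thread owns
            // the monitor of the given object.
            return Thread.holdsLock(cat3);
        }
    }
}
```

Under old semantics this returns true; a scheme that reboxes between the two reads would hand the two threads different monitors.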

To avoid the second fail, either we special-case the implementation of 
toString, or we both tag the code and tag the instances created within it.

Another way to put this is that legacy code will expect to see pointer identity 
observed, even if the original source of the pointer was in modern code.  The 
modern code won't care, but if the legacy code ever observes the pointer, it 
will expect it to behave like a pointer.  That's why I'm skeptical about an 
instance tag that depends on allocation site, and why I think that code 
identity is what matters more.

Ugh, a somewhat more annoying question, inspired by trying to find an example.
What happens if we have a volatile Complex (using either strategy)?
Assume, for the sake of amusement, that Complex is implemented with a pair of 
doubles, hence is 128 bits at minimum in its value 

Re: hg: mlvm/mlvm/hotspot: value-obj: first cut

2012-10-18 Thread Remi Forax
On 10/18/2012 03:20 PM, David Chase wrote:
 Note: I did go back through the archives for a few months to see if I had 
 missed some earlier discussion of this.  Is there an even earlier discussion 
 that I missed?

John has written a long blog post about value types this year,
   https://blogs.oracle.com/jrose/entry/value_types_in_the_vm
and the Array 2.0 presentation at the summit
http://www.oracle.com/technetwork/java/javase/community/jvmls2012-1840099.html

otherwise, some ideas have been floating around for a long time :)


 On 2012-10-18, at 3:21 AM, Remi Forax fo...@univ-mlv.fr wrote:
 You can't forget backward compatibility. In Java 9, a program that was
 written for Java 1.0
 should still work. This means either you have 2 different classes of
 Integer (one of value type and one of boxed type)
 or you are in mixed mode and you allow Integers to be flagged as value
 type or not.
 Can we work with an example program to make this more concrete?
 Failure is losing track of object identity; the unbox/box game for 
 legacy/modern compatibility only goes wrong when an important identity is 
 dropped on the floor.  If there's no ==true relation, then reboxing just 
 costs time.

 If I imagine the version-specific-compilation game, refInteger only ceases to 
 be refInteger within the internals of modern Library code (or in the case 
 that an application is not recompiled, but some other library on which it 
 depends is), and then only when it is not statically typed as Object or 
 Number (reference supertypes).  So old code retains its semantics, exactly, 
 and library retains its semantics in those cases where value types are 
 referred to as Objects.  I'm assuming that we get to efficient, 
 value-handling generics when we get to reified generics.

 Offhand, I don't know of any library code that manipulates Integers as 
 Integers and makes any sort of promises about their reference identity, 
 except for the nonsense about small-value interns (which we probably 
 replicate, because it is easy to do so).  ArrayLists and HashMaps store them 
 in Object-typed containers, so they'll retain their reference identity there. 
 (But there's a lot of library code, and I don't know it now as well as I used 
 to).

IdentityHashMap<Integer, X>,
https://duckduckgo.com/?q=IdentityHashMap%3CInteger


 The more-likely screw cases I think would involve String, in particular those 
 cases where a library method promises to return an interned String, or other 
 library code written by 3rd parties that does who-knows-what?

I am not sure String is a good candidate to be seen as a value type. 
A String's array can be big; what you want for String is just colocation of 
the String object and the array of chars object.
This can be done with a Maxine-like hybrid object.


 But if you want the extra ball (as John
 said) you need a way to
 flag which Integer is a value type or not, hence a tag bit in the object
 header.
 And I think this ends up being a runtime-static property, because value types 
 have a completely different representation -- wider or narrower than a 
 pointer, and no object header.  You must at least have two entrypoints; the 
 old code is expecting (for behavioral reasons) pointer semantics, the new 
 code is expecting (for performance reasons) value semantics.  (Remarks about 
 cache inefficiency seem distracting until we figure out if we like the 
 semantics, unless our choices send performance completely into the toilet.)  
 The interpreter I assume acts like it is legacy code.

I'm not sure you need two entry points in all cases; you can also deopt 
if you have to be compatible with the boxing semantics.
A value object is just a way to say I don't care about the identity, so 
the JIT may optimize.


 If we use an instance-flag instead, don't you end up in the same boat with 
 any Integer resulting from a call to an Integer-allocation method in the 
 (modern implementation) library?  Integer.valueOf returns a value-tagged 
 Integer, because the new occurs in code that will be recompiled into the 
 modern world -- what if the result is used in a lock, in old code?

If a user uses valueOf, he actually has no way to control the identity 
of the resulting object: the JLS says that values between -128 and 127 
must be boxed to cached objects
but doesn't say that values greater than 127 cannot be cached (the 
OpenJDK implementation allows you to change the upper value on the 
command line, BTW).
So you can safely rebox the object if it is used in a lock; the 
semantics will be as broken as the original semantics.
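
A small sketch of what valueOf does and does not promise (the class and method names are made up for the example). Only the small-value case has a guaranteed answer. The large-value identity check is included to show the question, not an expected result. On HotSpot I believe the command-line option in question is -XX:AutoBoxCacheMax, but treat that name as an assumption here.

```java
// Hypothetical demo class; names are illustrative only.
class ValueOfIdentityDemo {
    // Values in [-128, 127] are documented to be cached by Integer.valueOf.
    static boolean smallValuesShareIdentity() {
        return Integer.valueOf(127) == Integer.valueOf(127);
    }

    // Outside that range identity is unspecified: the two results may or
    // may not be the same object, depending on the JVM and its settings.
    static boolean largeValuesShareIdentity() {
        return Integer.valueOf(1000) == Integer.valueOf(1000);
    }

    // equals() is the only comparison that is reliable for every value.
    static boolean largeValuesEqual() {
        return Integer.valueOf(1000).equals(Integer.valueOf(1000));
    }
}
```

Code that locks on, or compares with ==, the result of largeValuesShareIdentity-style calls is already depending on unspecified behavior, which is the sense in which reboxing is "as broken as" today.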

 Or what if that is re-passed in to modern-compiled code?
 (I'm trying to come up with an example, I think I have to use String instead.)

it will be unboxed at the frontier.


 Here's an example -- String.toString(), the specification (at least, Java 6, 
 which is handy in my browser) promises that this is implemented with return 
 self;.  This is a screw case for either strategy, naively implemented:

 // This is old code, not recompiled, 

Re: hg: mlvm/mlvm/hotspot: value-obj: first cut

2012-10-17 Thread Remi Forax
On 10/17/2012 05:23 PM, David Chase wrote:
 On 2012-10-16, at 5:14 AM, Remi Forax fo...@univ-mlv.fr wrote:

 Frozen/locked is a runtime property, not a type property, so it's harder
 than that.
 You have to do a frozen check at the beginning of the method and pray
 that people
 will only use it with frozen objects and not with non-frozen ones, because in
 that case, you have to de-optimize.
 Maybe you can have two versions of the same method, one with the frozen
 semantics and
 one with the boxed one (this is what I have done in JDart).
 I'm still coming up to speed on this, but I thought that the entire point of 
 having value objects
 is so that we would have a non-standard interface for all methods dealing 
 with value objects.
 Complex, boxed, is received as a single pointer to an object with headers 
 and fields.
 Complex, unboxed, is received as a pair of doubles.  The frozen check is 
 punted to the caller,
 who in turn may have punted it to his caller, etc, potentially removing the 
 need for all tests.

 Or did I read this wrong?

 The only place I see a need for a frozen check is when we are interoperating 
 with legacy code
 that is not playing the frozen-object game, and that we want to run with 
 complete legacy compatibility.
 In that case, the slow-and-boxed path also includes a frozen check -- if 
 frozen, unbox the object,
 and head for the fast path, otherwise, stay slow.

 From the notes (value-obj.txt) I see:

 - the reference returned from the (unsafe) marking primitive must be used 
   for all future accesses
 - any previous references (including the one passed to the marking 
   primitive) must be unused
   - in practice, this means you must mark an object locked immediately 
     after constructing it

 So, allocation of a value-object becomes something along the lines of

new java/lang/Integer
dup
iload ...
invokespecial java/lang/Integer.<init>(I)V
markingPrimitive

 But we can't rely on this, hence it is not a true type property.  But we 
 could make it be as-if.
 I think I have to assume some sort of a marker class (implements 
 PermanentlyLockable).

A bit in the class header (equivalent to implementing 
PermanentlyLockable) means
you now have two classes, the one with the old semantics and the one 
with the new semantics.
If you can have them both at runtime, you make your inlining cache less 
efficient,
it's a problem I've had with PHP.reboot.
Marking the instance seems a better idea.

 Then in bytecode version N+1, the verifier enforces this for all types 
 implementing PL, and
 all methods trucking in PL-implementing objects will by default generate 
 unboxed entrypoints.

 Except when dealing with legacy code, it's as good as a type.

100% of the produced code until now is what you call 'legacy' :)


 For legacy code, I think we have options.  Simplest is just to box at the 
 boundaries, with lazy
 compilation of boxed versions of PL-handling methods in modern bytecodes.  
 I'm trying to decide
 if we can do better with flow analysis; I think it has to be non-publishing 
 in the PL types, in addition
 to the other properties.

You have to box and unbox at boundaries, and because Java allows 
overriding, an interface can have
two methods, one which is implemented with boxing semantics and another 
which uses the frozen semantics.
So you need stub code in front of methods, similar to verified/unverified 
entry points.


 David

Rémi

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: hg: mlvm/mlvm/hotspot: value-obj: first cut

2012-10-17 Thread David Chase

On 2012-10-17, at 2:12 PM, Remi Forax fo...@univ-mlv.fr wrote:
 But we can't rely on this, hence it is not a true type property.  But we 
 could make it be as-if.
 I think I have to assume some sort of a marker class (implements 
 PermanentlyLockable).
 
 A bit in the class header (equivalent to implementing 
 PermanentlyLockable) means
 you now have two classes, the one with the old semantics and the one 
 with the new semantics.
 If you can have them both at runtime, you make your inlining cache less 
 efficient,
 it's a problem I've had with PHP.reboot.
 Marking the instance seems a better idea.

I'm not sure I follow this -- if j/l/Integer implements PermanentlyLockable, 
that's just one class.
You end up with possibly two versions of each entrypoint that handle any 
Plockable, true, but this seems like a necessary consequence of supporting both 
legacy (boxed-only) and modern (unboxed) implementations of Plockable types.  
The entrypoints are different interfaces at the machine level; I don't see how 
you can avoid having two.  But many of the entrypoints might be mere 
stubs/wrappers.

I've been trying to figure out (Bharadwaj Yadavilli stopped by, we talked about 
this) whether the per-instance Plockable bit needs to exist or not.

Here are some assumptions I'm working from.  If any of these are wrong, that 
would be useful to know:

- we want value types in the future.

- we want value types passed and returned in unboxed form

- we want value types stored in arrays in unboxed form

- we can upcast an array of value-elements to an array of reference-elements

- we will sometimes box value types -- Object o = someInteger

- we must support legacy code

- we can use different compilation strategies for code depending on its 
bytecode version number.


So, a strawman implementation might be the following:

Use of values that implement Plockable in modern bytecodes is guaranteed to 
conform to the various value-friendly restrictions.
There's no extra bit, no extra call at allocation.
They compile as value types; an occurrence of new-dup-loadargs-init is replaced 
with running the constructor on the args in local memory.
The only exception is when they are upcast to a reference supertype.

In legacy bytecodes, none of this happens, it's just like today.  Mentions of 
Plockable types are compiled as if they were boxed.

Compilation of any method that mentions a Plockable type in its signature 
depends on legacy/modern.
In modern, the default implementation is for unboxed, but a boxed stub is 
provided (perhaps lazily) for references from legacy code.
In legacy, the default implementation is for boxed, but an unboxed stub is 
provided (perhaps lazily) for references from modern code.
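
To make the two-entrypoint shape concrete, here is a source-level sketch. The real thing would live in compiled code and calling conventions, not in Java source, and BoxedComplex and both method names are hypothetical. It shows the "modern" case: the default implementation takes the fields directly, and the boxed entry is a thin stub.

```java
// Hypothetical stand-in for a Plockable type.
final class BoxedComplex {
    final double re;
    final double im;
    BoxedComplex(double re, double im) { this.re = re; this.im = im; }
}

class EntryPointDemo {
    // "Modern" default implementation: receives the value as a pair of doubles.
    static double normSquaredUnboxed(double re, double im) {
        return re * re + im * im;
    }

    // Boxed stub for callers from "legacy" code: unbox, then take the fast path.
    static double normSquared(BoxedComplex c) {
        return normSquaredUnboxed(c.re, c.im);
    }
}
```

The legacy case is the mirror image: the boxed method holds the body and the unboxed stub allocates a box before calling it.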

Arrays are nasty.
In both modern and legacy code, arrays themselves are reference types, but 
arrays of Plockable elements store the elements as value types.
In both modern and legacy code, loads from arrays of a reference type (in 
legacy code, Plockable is a reference type) with a Plockable subtype call a 
static factory method of the Plockable type that can create a boxed object 
given an array address and an index.  This can require an element-type check 
before loads.
Stores work in reverse, same assignment of responsibility to a method of the 
Plockable type.
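
A sketch of the array scheme, again at source level and under assumptions: ComplexElem, the flat double[] layout, and the boxAt/storeAt names are all invented for illustration. Elements live unboxed in the backing storage, and the element type supplies the static factory that boxes on load and the reverse operation for stores.

```java
// Hypothetical Plockable-like element type with its own array accessors.
final class ComplexElem {
    final double re;
    final double im;
    ComplexElem(double re, double im) { this.re = re; this.im = im; }

    // Allocates flat backing storage for n elements (two doubles each).
    static double[] newStorage(int n) {
        return new double[2 * n];
    }

    // Load through a reference-typed view: box the element at the given index.
    static ComplexElem boxAt(double[] storage, int index) {
        return new ComplexElem(storage[2 * index], storage[2 * index + 1]);
    }

    // Store in reverse: unbox the object into the flat storage.
    static void storeAt(double[] storage, int index, ComplexElem value) {
        storage[2 * index] = value.re;
        storage[2 * index + 1] = value.im;
    }
}
```

Note that two loads of the same index produce two different boxes, so legacy code that stores an object and later compares the loaded reference with == would see a change in behavior.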

Similarly, field loads/stores across the legacy/modern boundary box/unbox as 
necessary to obtain expected behavior.

Optimizations:
In legacy code, use-def webs of Plockable that are free of identity-uses can be 
unboxed.  Inlining of unboxing stubs from modern code might help here.
In modern code, use-def webs that connect calls to legacy methods can be 
identified and boxed, since the value representation will give no savings here.

I assume I am missing something, because I think this is simpler than John's 
proposal.  Am I skipping ahead straight to value types too quickly?

David


hg: mlvm/mlvm/hotspot: value-obj: first cut

2012-10-12 Thread john . r . rose
Changeset: aa8f59e8372f
Author: jrose
Date:  2012-10-12 12:35 -0700
URL:   http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/aa8f59e8372f

value-obj: first cut

! series
! value-obj.patch  final-obj.patch
! value-obj.txt
