Re: hg: mlvm/mlvm/hotspot: value-obj: first cut
On 2012-10-18, at 6:36 PM, Remi Forax fo...@univ-mlv.fr wrote:

Offhand, I don't know of any library code that manipulates Integers as Integers and makes any sort of promises about their reference identity, except for the nonsense about small-value interns (which we probably replicate, because it is easy to do so). ArrayLists and HashMaps store them in Object-typed containers, so they'll retain their reference identity there. (But there's a lot of library code, and I don't know it now as well as I used to.)

IdentityHashMap<Integer, X>, https://duckduckgo.com/?q=IdentityHashMap%3CInteger

But it was my understanding that as yet, and for quite some time, generics are NOT reified, and that even when they are available, we're going to be careful at first about how we use them. So at the bytecode level, that's all Objects, right? I'm looking at this all from the point of view of bytecodes. For the case of IdentityHashMap<Integer, X>, from old or new code, the underlying IdentityHashMap is really handling Objects, so even if it is recompiled into new code, because the static type is Object at the bytecode level, the data will flow as refs, not as Integers, and the identity will not be lost. Am I relying on an unshared assumption? I assume that value-typed values will be boxed whenever they are passed into a place where their static type is a reference type. This means that many of the hoped-for benefits of value types won't arrive until generics are reified.

(I gather I need to look at Jim Laskey's tagged arrays. From a superficial look, I'm not sure this is a huge problem; in any case, these are not yet a massive chunk of installed base.)

Maybe we need to think about both transitions (value types and reified generics) at the same time; I am not sure.

It looks to me like value safety has to come first and become widespread, but maybe I have that wrong.

I disagree, at least for wrappers: if the code uses valueOf(), it means you don't care about the identity.
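The IdentityHashMap<Integer, X> case can be made concrete with a minimal sketch in today's Java (IntegerIdentityDemo and its methods are hypothetical names, not from the thread). Because generics are erased, the map only ever sees Object references, so two boxes holding the same int stay distinct keys; an identity-free, value-typed Integer could not reproduce the second result.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch only: shows that an IdentityHashMap<Integer, X> observes the reference
// identity of boxed Integers, something a value-typed Integer could not preserve.
final class IntegerIdentityDemo {

    // Two boxes with equal values but distinct identities. The Integer(int)
    // constructor is deprecated, but unlike valueOf it always allocates.
    @SuppressWarnings({"deprecation", "removal"})
    static Integer[] twoDistinctBoxes(int value) {
        return new Integer[] { new Integer(value), new Integer(value) };
    }

    // Inserts both boxes as keys and reports how many entries the map ends up with.
    static int sizeAfterInserting(Map<Integer, String> map, Integer[] boxes) {
        map.put(boxes[0], "first");
        map.put(boxes[1], "second");
        return map.size();
    }

    public static void main(String[] args) {
        Integer[] boxes = twoDistinctBoxes(1000);
        // equals()-based map: both boxes are the same key.
        System.out.println(sizeAfterInserting(new HashMap<>(), boxes));         // 1
        // ==-based map: the boxes are two different keys.
        System.out.println(sizeAfterInserting(new IdentityHashMap<>(), boxes)); // 2
    }
}
```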
Right, but I took care not to use valueOf in my example. I took care to use methods that were declared to either return the same thing (toString) or a new thing (substring). That code I wrote up there is designed to pass both assertions under the current (old) semantics.

Given that, because of overriding, you can have a mix of old code and new code in the very same method (with inlining), I don't think that the version of the code is something useful here. And as I said earlier, it will not be backward compatible, i.e. old code compiled with the new version will behave differently.

I think, if we define value types properly at the language level, that old code that would be sensitive to value-type identity will fail to recompile with a new javac, so it will behave differently, but it won't be a silent change, nor will it be one that the VM has to worry about. Maybe this is too big a leap for one revision; maybe we include some sort of a compatibility flag that allows the class to be tagged for old behavior. If we're ever going to get to well-behaved value types, we're going to need to ban some currently-legal idioms, so one day there will be some old source code that fails to recompile. I was under the impression that inlining already kept track of the source/flavor of code; for example, strictfp is handled properly in the face of inlining, so this is not new hair.

Ugh, a somewhat more annoying question, inspired by trying to find an example. What happens if we have a volatile Complex (using either strategy)? Assume, for the sake of amusement, that Complex is implemented with a pair of doubles, hence is 128 bits at minimum in its value representation. (Possible implementation -- as a value if there's a native CAS of the right size, otherwise as a reference.)

Good question. You can hope that the current CPU understands an instruction like CMPXCHG16B; otherwise the JIT will not be able to unbox the Complex value, I suppose.
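The volatile Complex worry can be sketched in today's Java, assuming the too-wide Complex is kept boxed in an AtomicReference (Complex and ReboxCasDemo are hypothetical names). AtomicReference.compareAndSet compares references with ==, so a value that is unboxed and reboxed between the load and the CAS fails gratuitously even though its fields are unchanged; keeping the loaded reference next to its fields, the ref+value pair idea raised in the reply, avoids that.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical 128-bit value type (two doubles), kept boxed because it is
// assumed to be too wide for the machine's native CAS.
final class Complex {
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
}

final class ReboxCasDemo {
    // LOAD ... CAS where the loaded value was unboxed and later reboxed:
    // the fields are unchanged, but the expected reference is a new object,
    // and compareAndSet compares references with ==, so it fails.
    static boolean casAfterRebox(AtomicReference<Complex> cell) {
        Complex loaded = cell.get();
        double re = loaded.re, im = loaded.im;   // "unboxed" copy
        Complex expected = new Complex(re, im);  // rebox: new identity
        return cell.compareAndSet(expected, new Complex(2 * re, 2 * im));
    }

    // Same sequence, keeping the loaded reference next to its fields
    // (a ref+value pair), so the CAS sees the identity it actually loaded.
    static boolean casKeepingRef(AtomicReference<Complex> cell) {
        Complex loaded = cell.get();             // ref half of the pair
        double re = loaded.re, im = loaded.im;   // value half of the pair
        return cell.compareAndSet(loaded, new Complex(2 * re, 2 * im));
    }

    public static void main(String[] args) {
        AtomicReference<Complex> cell = new AtomicReference<>(new Complex(1, 2));
        System.out.println(casAfterRebox(cell)); // false: gratuitous failure
        System.out.println(casKeepingRef(cell)); // true
    }
}
```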
I think it's worse than this (I've been thinking about this on and off yesterday/today). Suppose you are CASing a type that happens to be a value type, and happens to unbox to more bits than are supported by your native atomic operations. So it has to be boxed (and we'll do whatever it takes to make sure it gets boxed). But, in the intervening code, which is ordinary Java that just happens to be bracketed by a LOAD-CAS, that value might get unboxed and reboxed, and that-would-be-bad (the CAS would fail, gratuitously, because the pointer would change). Maybe we take the approach that in local variables, if a ref is the source for a value-typed value, then it is represented as a ref+value pair (John pointed this out), and if one or the other happens to go dead, that's okay. Doing this in general would change the wrapper strategy somewhat; rather than blindly calling the preferred-unboxed version in the new code, the ref would also
Re: hg: mlvm/mlvm/hotspot: value-obj: first cut
On 10/17/2012 10:41 PM, David Chase wrote: On 2012-10-17, at 2:12 PM, Remi Forax fo...@univ-mlv.fr wrote: But we can't rely on this, hence it is not a true type property. But we could make it be as-if. I think I have to assume some sort of a marker class (implements PermanentlyLockable).

A bit in the class header (equivalent to implementing PermanentlyLockable) means you now have two classes, one with the old semantics and one with the new semantics. If you can have them both at runtime, you make your inline caches less efficient; it's a problem I've had with PHP.reboot. Marking the instance seems a better idea.

I'm not sure I follow this -- if j/l/Integer implements PermanentlyLockable, that's just one class.

You can't forget backward compatibility. In Java 9, a program that was written for Java 1.0 should still work. This means either you have two different classes of Integer (one value type and one boxed type) or you are in mixed mode and you allow Integers to be flagged as value type or not. For new types, like Complex for example, it's simpler because you can have only one class that works as a value type. But if you want the extra ball (as John said) you need a way to flag which Integer is a value type or not, hence a tag bit in the object header. Now, if an old program (running on JDK 9) uses Integer.valueOf(), given the current rules, the developer has already given up on the notion of identity, so these Integers can automatically be flagged as value types. At runtime, in the interpreter, you can profile the Integers to know whether only value types are used; if so, you can treat them as true value types when you JIT, and either deopt or generate another version of the code if a method is called with a boxed Integer.

You end up with possibly two versions of each entrypoint that handle any Plockable, true, but this seems like a necessary consequence of supporting both legacy (boxed-only) and modern (unboxed) implementations of Plockable types.
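A minimal sketch of what "two versions of each entrypoint" might look like, assuming a Complex can be passed either as one reference or as two doubles. Java source can only mimic this with overloads, and a JVM would generate such stubs internally rather than as source methods; Complex, Entrypoints, normSquared and abs are hypothetical names. In each pair, the real work lives in one version and the other is a thin stub.

```java
// Hypothetical boxed form of a Plockable value type.
final class Complex {
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
}

// Illustrative only: overloads stand in for the two machine-level entrypoints.
final class Entrypoints {
    // Modern method: the default implementation takes the value unboxed.
    static double normSquared(double re, double im) {
        return re * re + im * im;
    }

    // Boxed stub for legacy callers: unpack the fields and delegate.
    static double normSquared(Complex c) {
        return normSquared(c.re, c.im);
    }

    // Legacy method: the default implementation takes the value boxed.
    static double abs(Complex c) {
        return Math.sqrt(c.re * c.re + c.im * c.im);
    }

    // Unboxed stub for modern callers: box (allocate) and delegate.
    static double abs(double re, double im) {
        return abs(new Complex(re, im));
    }
}
```

The stubs are cheap to generate lazily, which matches the "perhaps lazily" in the strawman: only callers that cross the legacy/modern boundary ever need them.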
The entrypoints are different interfaces at the machine level; I don't see how you can avoid having two. But many of the entrypoints might be mere stubs/wrappers. I've been trying to figure out (Bharadwaj Yadavilli stopped by; we talked about this) whether the per-instance Plockable bit needs to exist or not.

Here are some assumptions I'm working from. If any of these are wrong, that would be useful to know:
- we want value types in the future
- we want value types passed and returned in unboxed form
- we want value types stored in arrays in unboxed form
- we can upcast an array of value-elements to an array of reference-elements
- we will sometimes box value types -- Object o = someInteger
- we must support legacy code
- we can use different compilation strategies for code depending on its bytecode version number

No, you can't.

So, a strawman implementation might be the following: Use of values that implement Plockable in modern bytecodes is guaranteed to conform to the various value-friendly restrictions. There's no extra bit, no extra call at allocation. They compile as value types; an occurrence of new-dup-loadargs-init is replaced with running the constructor on the args in local memory. The only exception is when they are upcast to a reference supertype. In legacy bytecodes, none of this happens; it's just like today. Mentions of Plockable types are compiled as if they were boxed. Compilation of any method that mentions a Plockable type in its signature depends on legacy/modern. In modern, the default implementation is for unboxed, but a boxed stub is provided (perhaps lazily) for references from legacy code. In legacy, the default implementation is for boxed, but an unboxed stub is provided (perhaps lazily) for references from modern code.

Arrays are nasty. In both modern and legacy code, arrays themselves are reference types, but arrays of Plockable elements store the elements as value types.
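The array part of the strawman might look roughly like this in plain Java, assuming a flattened double[] layout with two slots per element. ComplexArray, loadFrom and storeInto are hypothetical names; loadFrom plays the role of the static factory on the Plockable type that builds a box from an array and an index, and storeInto does the reverse. A JVM would not expose the flattened storage like this.

```java
// Hypothetical boxed form of a Plockable value type.
final class Complex {
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }

    // Static factory on the Plockable type: build a box from one element
    // of a flattened array, given the array and an index.
    static Complex loadFrom(ComplexArray a, int index) {
        return new Complex(a.data[2 * index], a.data[2 * index + 1]);
    }

    // The reverse responsibility: unbox this value into the flattened array.
    void storeInto(ComplexArray a, int index) {
        a.data[2 * index] = re;
        a.data[2 * index + 1] = im;
    }
}

// Flattened storage: element i occupies data[2*i] (re) and data[2*i+1] (im),
// with no per-element object headers or pointers.
final class ComplexArray {
    final double[] data;
    ComplexArray(int length) { data = new double[2 * length]; }
    int length() { return data.length / 2; }
}
```

Note that every boxed load allocates a fresh object, so two loads of the same element are equal but not ==; that is exactly where legacy code expecting stable identity would notice.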
In both modern and legacy code, loads from arrays of a reference type (in legacy code, Plockable is a reference type) with a Plockable subtype call a static factory method of the Plockable type that can create a boxed object given an array address and an index. This can require an element-type check before loads. Stores work in reverse, with the same assignment of responsibility to a method of the Plockable type. Similarly, field loads/stores across the legacy/modern boundary box/unbox as necessary to obtain the expected behavior.

Optimizations: In legacy code, use-def webs of Plockable that are free of identity-uses can be unboxed. Inlining of unboxing stubs from modern code might help here. In modern code, use-def webs that connect calls to legacy methods can be identified and left boxed, since the value representation gives no savings there. I assume I am missing something, because I think this is simpler than John's proposal. Am I
Re: hg: mlvm/mlvm/hotspot: value-obj: first cut
Note: I did go back through the archives for a few months to see if I had missed some earlier discussion of this. Is there an even earlier discussion that I missed?

On 2012-10-18, at 3:21 AM, Remi Forax fo...@univ-mlv.fr wrote: You can't forget backward compatibility. In Java 9, a program that was written for Java 1.0 should still work. This means either you have two different classes of Integer (one value type and one boxed type) or you are in mixed mode and you allow Integers to be flagged as value type or not.

Can we work with an example program to make this more concrete? Failure is losing track of object identity; the unbox/box game for legacy/modern compatibility only goes wrong when an important identity is dropped on the floor. If there's no ==true relation, then reboxing just costs time. If I imagine the version-specific-compilation game, refInteger only ceases to be refInteger within the internals of modern library code (or in the case that an application is not recompiled, but some other library on which it depends is), and then only when it is not statically typed as Object or Number (reference supertypes). So old code retains its semantics exactly, and library code retains its semantics in those cases where value types are referred to as Objects. I'm assuming that we get to efficient, value-handling generics when we get to reified generics. Offhand, I don't know of any library code that manipulates Integers as Integers and makes any sort of promises about their reference identity, except for the nonsense about small-value interns (which we probably replicate, because it is easy to do so). ArrayLists and HashMaps store them in Object-typed containers, so they'll retain their reference identity there. (But there's a lot of library code, and I don't know it now as well as I used to.)
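The "small-value interns" are Integer's boxing cache. A small sketch of what is and is not guaranteed (BoxCacheDemo and sameBox are hypothetical names): the JLS boxing rules (section 5.1.7) and the Integer.valueOf documentation promise a shared box for -128 to 127, while larger values are left to the implementation.

```java
// Sketch of the boxing-cache ("small-value interns") rules for Integer.
final class BoxCacheDemo {
    // True iff boxing v twice via valueOf yields the very same object.
    static boolean sameBox(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    public static void main(String[] args) {
        // Guaranteed true for -128..127 (JLS 5.1.7; Integer.valueOf javadoc).
        System.out.println(sameBox(127));
        // Unspecified: typically false with OpenJDK defaults, but the cache's
        // upper bound can be raised (e.g. -XX:AutoBoxCacheMax=<n>).
        System.out.println(sameBox(1000));
        // equals() is value-based, so this is always true.
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000)));
    }
}
```

Code that only relies on the guaranteed range, or only on equals(), cannot tell a reboxed value from the original, which is why replicating the small-value cache is the easy part.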
The more-likely screw cases, I think, would involve String, in particular those cases where a library method promises to return an interned String, or other library code written by 3rd parties that does who-knows-what?

But if you want the extra ball (as John said) you need a way to flag which Integer is a value type or not, hence a tag bit in the object header.

And I think this ends up being a runtime-static property, because value types have a completely different representation -- wider or narrower than a pointer, and no object header. You must at least have two entrypoints; the old code is expecting (for behavioral reasons) pointer semantics, the new code is expecting (for performance reasons) value semantics. (Remarks about cache inefficiency seem distracting until we figure out if we like the semantics, unless our choices send performance completely into the toilet.) The interpreter, I assume, acts like it is legacy code. If we use an instance flag instead, don't you end up in the same boat with any Integer resulting from a call to an Integer-allocation method in the (modern implementation) library? Integer.valueOf returns a value-tagged Integer, because the new occurs in code that will be recompiled into the modern world -- what if the result is used in a lock, in old code? Or what if that is re-passed in to modern-compiled code? (I'm trying to come up with an example; I think I have to use String instead.)

Here's an example -- String.toString(): the specification (at least, Java 6, which is handy in my browser) promises that the string itself is returned, i.e. return this;. This is a screw case for either strategy, naively implemented:

    // This is old code, not recompiled, calling new code from the library.
    String cat0 = "cat";              // a legacy string.
    String cat1 = cat0.substring(0);  // returns a modern-allocated new string.
    String cat2 = cat0.toString();    // same string
    String cat3 = cat1.toString();    // same string
    assert cat0 == cat2;  // works, tagged instances; fails, tagged code
    assert cat1 == cat3;  // fails, tagged code or tagged instance.

Or, instead of ==, two threads could be dispatched, using cat1 and cat3 for their respective locks to coordinate execution (yes, I know the author of such code should be shot). To avoid the second failure, either we special-case the implementation of toString, or we both tag the code and tag the instances created within it. Another way to put this is that legacy code will expect to see pointer identity observed, even if the original source of the pointer was in modern code. The modern code won't care, but if the legacy code ever observes the pointer, it will expect it to behave like a pointer. That's why I'm skeptical about an instance tag that depends on allocation site, and why I think that code identity is what matters more. Ugh, a somewhat more annoying question, inspired by trying to find an example. What happens if we have a volatile Complex (using either strategy)? Assume, for the sake of amusement, that Complex is implemented with a pair of doubles, hence is 128 bits at minimum in its value
Re: hg: mlvm/mlvm/hotspot: value-obj: first cut
On 10/18/2012 03:20 PM, David Chase wrote: Note: I did go back through the archives for a few months to see if I had missed some earlier discussion of this. Is there an even earlier discussion that I missed?

John has written a long blog post about value types this year, https://blogs.oracle.com/jrose/entry/value_types_in_the_vm and there is the Array 2.0 presentation at the JVM Language Summit, http://www.oracle.com/technetwork/java/javase/community/jvmls2012-1840099.html; otherwise, some ideas have been floating around for a long time :)

On 2012-10-18, at 3:21 AM, Remi Forax fo...@univ-mlv.fr wrote: You can't forget backward compatibility. In Java 9, a program that was written for Java 1.0 should still work. This means either you have two different classes of Integer (one value type and one boxed type) or you are in mixed mode and you allow Integers to be flagged as value type or not.

Can we work with an example program to make this more concrete? Failure is losing track of object identity; the unbox/box game for legacy/modern compatibility only goes wrong when an important identity is dropped on the floor. If there's no ==true relation, then reboxing just costs time. If I imagine the version-specific-compilation game, refInteger only ceases to be refInteger within the internals of modern library code (or in the case that an application is not recompiled, but some other library on which it depends is), and then only when it is not statically typed as Object or Number (reference supertypes). So old code retains its semantics exactly, and library code retains its semantics in those cases where value types are referred to as Objects. I'm assuming that we get to efficient, value-handling generics when we get to reified generics. Offhand, I don't know of any library code that manipulates Integers as Integers and makes any sort of promises about their reference identity, except for the nonsense about small-value interns (which we probably replicate, because it is easy to do so).
ArrayLists and HashMaps store them in Object-typed containers, so they'll retain their reference identity there. (But there's a lot of library code, and I don't know it now as well as I used to.)

IdentityHashMap<Integer, X>, https://duckduckgo.com/?q=IdentityHashMap%3CInteger

The more-likely screw cases, I think, would involve String, in particular those cases where a library method promises to return an interned String, or other library code written by 3rd parties that does who-knows-what?

I am not sure String is a good candidate to be seen as a value type. A String's char array can be big; what you want for String is just colocation of the String object and its array of chars. This can be done with a Maxine-like hybrid object.

But if you want the extra ball (as John said) you need a way to flag which Integer is a value type or not, hence a tag bit in the object header.

And I think this ends up being a runtime-static property, because value types have a completely different representation -- wider or narrower than a pointer, and no object header. You must at least have two entrypoints; the old code is expecting (for behavioral reasons) pointer semantics, the new code is expecting (for performance reasons) value semantics. (Remarks about cache inefficiency seem distracting until we figure out if we like the semantics, unless our choices send performance completely into the toilet.) The interpreter, I assume, acts like it is legacy code.

I'm not sure you need two entry points in all cases; you can also deopt if you have to be compatible with the boxing semantics. A value object is just a way to say "I don't care about the identity", so the JIT may optimize.

If we use an instance flag instead, don't you end up in the same boat with any Integer resulting from a call to an Integer-allocation method in the (modern implementation) library?
Integer.valueOf returns a value-tagged Integer, because the new occurs in code that will be recompiled into the modern world -- what if the result is used in a lock, in old code?

If a user uses valueOf, he actually has no way to control the identity of the resulting object: the JLS says that boxing a value between -128 and 127 must always yield the same object, but it doesn't say that values greater than 127 cannot be cached as well (the OpenJDK implementation lets you change the upper bound on the command line, BTW). So you can safely rebox the object if it is used in a lock; the semantics will be as broken as the original semantics.

Or what if that is re-passed in to modern-compiled code? (I'm trying to come up with an example; I think I have to use String instead.)

It will be unboxed at the boundary.

Here's an example -- String.toString(): the specification (at least, Java 6, which is handy in my browser) promises that the string itself is returned, i.e. return this;. This is a screw case for either strategy, naively implemented:

    // This is old code, not recompiled,
Re: hg: mlvm/mlvm/hotspot: value-obj: first cut
On 10/17/2012 05:23 PM, David Chase wrote: On 2012-10-16, at 5:14 AM, Remi Forax fo...@univ-mlv.fr wrote: Frozen/locked is a runtime property, not a type property, so it's harder than that. You have to do a frozen check at the beginning of the method and pray that people will only use it with a frozen object and not a non-frozen one, because in that case you have to deoptimize. Maybe you can have two versions of the same method, one with the frozen semantics and one with the boxed one (this is what I have done in JDart).

I'm still coming up to speed on this, but I thought that the entire point of having value objects is so that we would have a non-standard interface for all methods dealing with value objects. Complex, boxed, is received as a single pointer to an object with headers and fields. Complex, unboxed, is received as a pair of doubles. The frozen check is punted to the caller, who in turn may have punted it to his caller, etc., potentially removing the need for all tests. Or did I read this wrong? The only place I see a need for a frozen check is when we are interoperating with legacy code that is not playing the frozen-object game, and that we want to run with complete legacy compatibility. In that case, the slow-and-boxed path also includes a frozen check -- if frozen, unbox the object and head for the fast path; otherwise, stay slow.

From the notes (value-obj.txt) I see:
- the reference returned from the (unsafe) marking primitive must be used for all future accesses
- any previous references (including the one passed to the marking primitive) must be unused
- in practice, this means you must mark an object locked immediately after constructing it

So, allocation of a value-object becomes something along the lines of

    new java/lang/Integer
    dup
    iload ...
    invokespecial java/lang/Integer.<init>(I)V
    markingPrimitive

But we can't rely on this, hence it is not a true type property. But we could make it be as-if.
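The "frozen check on the slow-and-boxed path" can be sketched as follows, assuming a per-object frozen flag. markFrozen stands in for the marking primitive (it returns the reference to use from then on), and FrozenDispatch stands in for compiled entrypoints; all of these names are hypothetical and not taken from value-obj.patch. The counter exists only to make the dispatch observable.

```java
// Hypothetical boxed Complex with a "frozen" (permanently locked) flag.
final class Complex {
    private double re, im;
    private boolean frozen;

    Complex(double re, double im) { this.re = re; this.im = im; }

    // Stand-in for the marking primitive: call immediately after construction
    // and use only the returned reference afterwards.
    Complex markFrozen() { frozen = true; return this; }

    void setRe(double re) {
        if (frozen) throw new IllegalStateException("frozen");
        this.re = re;
    }

    boolean isFrozen() { return frozen; }
    double re() { return re; }
    double im() { return im; }
}

final class FrozenDispatch {
    static int fastPathTaken; // counts dispatches to the unboxed path, for illustration

    // Fast path: works on the unboxed fields only.
    static double normSquared(double re, double im) { return re * re + im * im; }

    // Slow, boxed entry used by legacy callers. A frozen object's fields can no
    // longer change, so it is safe to unbox it and take the fast path;
    // otherwise stay on the slow path and read through the reference.
    static double normSquared(Complex c) {
        if (c.isFrozen()) {
            fastPathTaken++;
            return normSquared(c.re(), c.im());
        }
        return c.re() * c.re() + c.im() * c.im();
    }
}
```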
I think I have to assume some sort of a marker class (implements PermanentlyLockable).

A bit in the class header (equivalent to implementing PermanentlyLockable) means you now have two classes, one with the old semantics and one with the new semantics. If you can have them both at runtime, you make your inline caches less efficient; it's a problem I've had with PHP.reboot. Marking the instance seems a better idea.

Then in bytecode version N+1, the verifier enforces this for all types implementing PL, and all methods trucking in PL-implementing objects will by default generate unboxed entrypoints. Except when dealing with legacy code, it's as good as a type.

100% of the produced code until now is what you call 'legacy' :)

For legacy code, I think we have options. Simplest is just to box at the boundaries, with lazy compilation of boxed versions of PL-handling methods in modern bytecodes. I'm trying to decide if we can do better with flow analysis; I think it has to be non-publishing in the PL types, in addition to the other properties.

You have to box and unbox at boundaries, and because Java allows overriding, an interface can have two implementations of a method, one with boxing semantics and another with the frozen semantics. So you need stub code in front of methods, similar to verified/unverified entry points.

David

Rémi

___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: hg: mlvm/mlvm/hotspot: value-obj: first cut
On 2012-10-17, at 2:12 PM, Remi Forax fo...@univ-mlv.fr wrote: But we can't rely on this, hence it is not a true type property. But we could make it be as-if. I think I have to assume some sort of a marker class (implements PermanentlyLockable).

A bit in the class header (equivalent to implementing PermanentlyLockable) means you now have two classes, one with the old semantics and one with the new semantics. If you can have them both at runtime, you make your inline caches less efficient; it's a problem I've had with PHP.reboot. Marking the instance seems a better idea.

I'm not sure I follow this -- if j/l/Integer implements PermanentlyLockable, that's just one class. You end up with possibly two versions of each entrypoint that handle any Plockable, true, but this seems like a necessary consequence of supporting both legacy (boxed-only) and modern (unboxed) implementations of Plockable types. The entrypoints are different interfaces at the machine level; I don't see how you can avoid having two. But many of the entrypoints might be mere stubs/wrappers. I've been trying to figure out (Bharadwaj Yadavilli stopped by; we talked about this) whether the per-instance Plockable bit needs to exist or not.

Here are some assumptions I'm working from. If any of these are wrong, that would be useful to know:
- we want value types in the future
- we want value types passed and returned in unboxed form
- we want value types stored in arrays in unboxed form
- we can upcast an array of value-elements to an array of reference-elements
- we will sometimes box value types -- Object o = someInteger
- we must support legacy code
- we can use different compilation strategies for code depending on its bytecode version number

So, a strawman implementation might be the following: Use of values that implement Plockable in modern bytecodes is guaranteed to conform to the various value-friendly restrictions. There's no extra bit, no extra call at allocation.
They compile as value types; an occurrence of new-dup-loadargs-init is replaced with running the constructor on the args in local memory. The only exception is when they are upcast to a reference supertype. In legacy bytecodes, none of this happens; it's just like today. Mentions of Plockable types are compiled as if they were boxed. Compilation of any method that mentions a Plockable type in its signature depends on legacy/modern. In modern, the default implementation is for unboxed, but a boxed stub is provided (perhaps lazily) for references from legacy code. In legacy, the default implementation is for boxed, but an unboxed stub is provided (perhaps lazily) for references from modern code.

Arrays are nasty. In both modern and legacy code, arrays themselves are reference types, but arrays of Plockable elements store the elements as value types. In both modern and legacy code, loads from arrays of a reference type (in legacy code, Plockable is a reference type) with a Plockable subtype call a static factory method of the Plockable type that can create a boxed object given an array address and an index. This can require an element-type check before loads. Stores work in reverse, with the same assignment of responsibility to a method of the Plockable type. Similarly, field loads/stores across the legacy/modern boundary box/unbox as necessary to obtain the expected behavior.

Optimizations: In legacy code, use-def webs of Plockable that are free of identity-uses can be unboxed. Inlining of unboxing stubs from modern code might help here. In modern code, use-def webs that connect calls to legacy methods can be identified and left boxed, since the value representation gives no savings there. I assume I am missing something, because I think this is simpler than John's proposal. Am I skipping ahead straight to value types too quickly?

David
hg: mlvm/mlvm/hotspot: value-obj: first cut
Changeset: aa8f59e8372f
Author:    jrose
Date:      2012-10-12 12:35 -0700
URL:       http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/aa8f59e8372f

value-obj: first cut

! series
! value-obj.patch
final-obj.patch
! value-obj.txt