abandon all U-types, welcome to L-world (or, what I learned in Burlington)
We just had a 50-hour week of face-to-face meetings by the Valhalla VM team. We learned a lot and surprised ourselves by coming to a consensus that a promising design for value types uses mainly the same legacy L-type descriptors, makes relatively little use of Q-type descriptors, and does not appear to need a third descriptor "kind" or "mode", such as U for universal, or R for reference-only.

First a few highlights out of many. Fred Parain explained to us how he has prototyped a thread-local analog of Java heaps to store value structs in a form convenient to the interpreter. Tobias Hartmann and Roland Westrelin (of Red Hat) explained what the compiler prefers to see, which is (obviously) the scalarized components of each value. The three of them have worked out detailed rules for calling between interpreted and compiled code.

It seems to me that other implementations of the JVM (looking at you, IBM) will tend in similar directions, so although our results are strongly informed by our own prototyping, we think it is likely that they will apply to other, independent JVM implementations. (Or are there platforms where the interpreter will scalarize aggressively and the optimizer will prefer to keep everything in structs? Not.)

Karen Kinnear and the Oracle Valhalla lead, David Simms, were there to make sure we solved the important problems and asked the hard questions. As a special appearance, one of our spec. gurus, Dan Smith, was there to help us make rigorous sense out of our intuitions and hacks.

Since we were short on language experts, we just worked in the mode (my personal favorite) of pretending that the JVM is the most important thing, and the Java language designers will just have to figure out how to use it. Of course, that's an oversimplification; the JLS and JVMS inform each other very strongly, but it was freeing to temporarily take current thoughts about JLS extensions as a given and vary the JVM to find the sweet spot that would be simple to implement and supportive to what we think we know about the Valhalla Java of the future.

We had some long conversations about carrier types: L, Q, U, and more, and that's what I want to write about here. We also made significant progress in the design of crackable lambdas, template classes, and current and future versions of condy. We talked to Ron Pressler about kick-starting Loom fibers. But it is L-types I want to talk about here; the above is just a sketch of the past week's environment.

Logically speaking, we have two things we want to do, and that unfolds to a choice between three "worlds" of up to four distinct kinds: L/Q/U/R. L is always present because it is a legacy model for reference types. Q is always present because we know we need (at least sometimes) to make a syntactic distinction between flattened values and legacy objects.

(Why not just always look inside the classfile? Because the verifier cannot be expected to load a class for every type it sees, so it needs a descriptor kind character from time to time.)

The U kind came a year or two ago when we realized that any-generics (and/or templates) and interfaces both require a disjoint-union type that is neither Q nor L, but can keep track of Q payloads (value instances) and L payloads (nullable references to object instances), without mixing them up. In other words, neither Q-types nor legacy L-types are parallel class-based constructs, and neither conveniently "sits on top" of the other; they need a common supertype to carry them without confusion.

Before I describe the three logically possible "worlds", I'll add one more letter, R. An R-type is exactly a legacy L-type, a nullable reference. Why use a separate letter? Answer: For the same reason we introduced the other kind letters, to preserve all the necessary distinctions among different kinds of payloads and carrier types, and also to talk about the explicit encoding of descriptors.

There are three worlds we could design to hold both legacy R-types (today's L-types) and Q-types: U-world, L-world, and R-world. They might be notated respectively as U/QL, L/Q, and U/QR.

The "U-world" is what I have spent many months mentally preparing for. It is the design where L-types, marked as such in bytecode type descriptors, are always legacy object references or null, and Q-types, also marked as such in bytecodes, are always new value types. To carry runtime payloads which may dynamically vary between the two modes, we need a third mode, U-types, which carry the two kinds of payloads (I hesitate to say "values" because I want to include reference values also). A U-type is a disjoint union between corresponding, similarly named Q-types and L-types.

(Mathematically, a _disjoint union_ C = A |_| B is no more and no less than the sum of all elements or points comprised by the two constituent sets A and B. The disjoint union has nothing more: no points not in A or B. It has nothing less: every point of C is from eith
Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)
To summarize for myself: we already know that we need only one U, java.lang.__Value; let's try to make it java.lang.Object (with no boxing).

The claim is that Object is used more as the root of any type, as in collections, than as the root of all references, as in System.out.println().

OK, I need to think more about that.

regards,
Rémi

----- Original Message -----
> From: "John Rose"
> To: "valhalla-spec-experts"
> Sent: Sunday, November 19, 2017 22:40:33
> Subject: abandon all U-types, welcome to L-world (or, what I learned in Burlington)
Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)
On Nov 19, 2017, at 2:47 PM, Remi Forax wrote:
>
> The claim is that Object is used more as the root of any types like in
> collections than as the root of all references like in System.out.println().

Object and interfaces play the role of top types. One view is that we are making Object act more like an interface.

Also, we don’t add any new carrier types to the interpreter.
Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)
----- Original Message -----
> From: "John Rose"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Monday, November 20, 2017 00:06:20
> Subject: Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

> On Nov 19, 2017, at 2:47 PM, Remi Forax wrote:
>>
>> The claim is that Object is used more as the root of any types like in
>> collections than as the root of all references like in System.out.println().
>
> Object and interfaces play the role of top types. One view is that we are
> making object act more like an interface.

Ah, yes, it makes the whole model far simpler.

> Also we don’t add any new carrier types to the interpreter.

But you need a way to disambiguate a reference type from a value type at runtime in the interpreter.
You also need to teach the JITs to propagate the L vs. which-Q info on local variables for generics specialization (and it works even if the inlining fails, because the boxing/wrapping in the thread-local storage is done by the adapters, so the JITed code doesn't have to be conservative).

Rémi
Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)
On Nov 19, 2017, at 3:33 PM, fo...@univ-mlv.fr wrote:
>
> But you need a way to disambiguate a reference type from a value type at runtime
> in the interpreter.
> You also need to teach the JITs to propagate the L vs. which-Q info on local variables
> for generics specialization (and it works even if the inlining fails, because
> the boxing/wrapping in the thread-local storage is done by the adapters, so
> the JITed code doesn't have to be conservative).

Yes, that leads to the sort of tag bit schemes I alluded to. I didn’t want to go into detail because this is a spec. list, not a dev. list. On the dev. list we will say more about it. The problems seem solvable, especially after the MVT prototyping experience on HotSpot, and I hope the IBM experience corroborates that.
Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)
What's the L-world story for array subtyping? For any R-type, R[] <: Object[]. If everything is an L type and everything is <: Object, are arrays of Q-types/primitives also subtypes of Object[]?

We didn't have a story for this in QU-world either, but at least in QU-world it was believable that QFoo[] less tenable when there's no syntactic difference between L-uses and Q-uses. (And even less so when we might migrate code from L to Q.)
Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)
On Nov 22, 2017, at 5:48 AM, Brian Goetz wrote:
>
> What's the L-world story for array subtyping? For any R-type, R[] <:
> Object[]. If everything is an L type and everything is <: Object, are arrays
> of Q-types/primitives also subtypes of Object[]?
>
> We didn't have a story for this in QU-world either, but at least in QU-world
> it was believable that QFoo[] when there's no syntactic difference between L-uses and Q-uses. (And even
> less so when we might migrate code from L to Q.)

We were just talking about this in the concall with IBM.

Field and array-element flattening are the two places where the Q/R distinction provides a crucial hint that flattening *may* (not always *must*) occur. This hint is crucial because it is statically visible *before* all class files are loaded. The instance layout algorithm and the verifier both need to run before all class files are loaded (because of circularities, and also for performance). In U-world, the letter 'Q' in a static descriptor tells the instance layout generator to load the field class and extract its sub-layout. It also tells the verifier not to assume a covariantly compatible layout relative to the type Object[].

If we don't keep a few Q's around for old times' sake, we will need to signal these subtle differences some other way, with an ACC_FLATTENABLE bit on fields and a special "[@" syntax variant (pick your letter, maybe "Q") for array descriptors. I think of this as the "residual Q problem", of finding offices for the few remaining occurrences of Q that do real work in L-world.

(The user-visible distinction of flat vs. legacy arrays was one influence that led us towards user-visible box types. I'd like to resist that this time around. Perhaps "[@" is a syntax that is mutually exclusive with plain "[". And there is a showdown when such an array is created, so that the descriptor has a "@" if and only if the loaded element type is in fact a Q-type. It's a move like that of class loader constraints.)

I am not opposed to allowing the existing Q-syntax for descriptors, in a limited number of places, to solve those residual problems. That thought leads me to try on the idea (which Remi discourages) that perhaps Q-narrowings of some method descriptors are useful (requiring bridges just like today's generics). A Q-narrowing of a class means: No nulls here, identity is not observable, and value-based semantics are in force, including unmodifiability. Not a bad set of guarantees; maybe that's a job for Q's rather than an invisible type profile. BTW, an Ljava/util/List; would not pass through a Qjava/util/List; descriptor, unless List.copyOf were applied to it first. The VM and java.base can conspire to provide a curated set of Q-able types with enforced value or value-based behavior. — John
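
To make the "statically visible before class loading" point concrete, here is a minimal sketch of how a layout generator or verifier might read a hint off the leading characters of a descriptor alone. Everything in it is an assumption for illustration: the class DescriptorHints, the Hint names, and the treatment of "[@" and "[Q" as "element may be flattened" are hypothetical, and nothing here reflects actual HotSpot code or a settled descriptor syntax.

```java
// Hypothetical sketch, not HotSpot code: derive a layout hint from the
// leading characters of a descriptor, without loading the class it names.
final class DescriptorHints {
    enum Hint { PRIMITIVE, REFERENCE, MAYBE_FLAT, PRIMITIVE_ARRAY, REFERENCE_ARRAY, MAYBE_FLAT_ARRAY }

    private static final String PRIMITIVE_KINDS = "BCDFIJSZ";

    static Hint classify(String descriptor) {
        if (descriptor.isEmpty()) {
            throw new IllegalArgumentException("empty descriptor");
        }
        char kind = descriptor.charAt(0);
        if (kind == 'L') return Hint.REFERENCE;      // legacy nullable reference
        if (kind == 'Q') return Hint.MAYBE_FLAT;     // value type: flattening may occur
        if (PRIMITIVE_KINDS.indexOf(kind) >= 0) return Hint.PRIMITIVE;
        if (kind == '[' && descriptor.length() > 1) {
            char element = descriptor.charAt(1);
            // Assumption: "[@" (or "[Q") marks an array whose elements may be flat.
            if (element == '@' || element == 'Q') return Hint.MAYBE_FLAT_ARRAY;
            if (element == 'L' || element == '[') return Hint.REFERENCE_ARRAY;
            if (PRIMITIVE_KINDS.indexOf(element) >= 0) return Hint.PRIMITIVE_ARRAY;
        }
        throw new IllegalArgumentException("unrecognized descriptor: " + descriptor);
    }

    public static void main(String[] args) {
        String[] samples = {"I", "Ljava/lang/Object;", "QComplex;", "[Ljava/lang/String;", "[@LComplex;", "[I"};
        for (String d : samples) {
            System.out.println(d + " -> " + classify(d));
        }
    }
}
```

The point of the sketch is only that the decision needs no class loading: one or two characters are enough to say "may be flat" versus "plain reference", which is the job the residual Q's (or an ACC_FLATTENABLE bit) would have to do.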
Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)
> From: "John Rose"
> To: "Brian Goetz"
> Cc: "valhalla-spec-experts"
> Sent: Wednesday, November 22, 2017 19:48:18
> Subject: Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

> On Nov 22, 2017, at 5:48 AM, Brian Goetz <brian.go...@oracle.com> wrote:
>> What's the L-world story for array subtyping? For any R-type, R[] <:
>> Object[].
>> If everything is an L type and everything is <: Object, are arrays of
>> Q-types/primitives also subtypes of Object[]?
>> We didn't have a story for this in QU-world either, but at least in QU-world
>> it was believable that QFoo[]
>> there's no syntactic difference between L-uses and Q-uses. (And even less so
>> when we might migrate code from L to Q.)
> We were just talking about this in the concall with IBM.
> Field and array-element flattening are the two places where
> the Q/R distinction provides a crucial hint that flattening
> *may* (not always *must*) occur. This hint is crucial because
> it is statically visible *before* all class files are loaded.
> The instance layout algorithm and the verifier both need
> to run before all class files are loaded (because of
> circularities, also performance). In U-world, the
> letter 'Q' in a static descriptor tells the instance layout
> generator to load the field class and extract sub-layout.
> It also tells the verifier not to assume a covariantly
> compatible layout relative to the type Object[].
> If we don't keep a few Q's around for old times'
> sake, we will need to signal these subtle differences
> some other way, with an ACC_FLATTENABLE
> bit on fields and a special "[@" syntax variant
> (pick your letter, maybe "Q") for array descriptors.
> I think of this as the "residual Q problem", of
> finding offices for the few remaining occurrences
> of Q that do real work in L-world.
> (The user-visible distinction of flat vs. legacy
> arrays was one influence that led us towards
> user-visible box types. I'd like to resist that this
> time around. Perhaps "[@" is a syntax that is
> mutually exclusive with plain "[". And there is
> a showdown when such an array is created,
> so that the descriptor has a "@" if and only if
> the loaded element type is in fact a Q-type.
> It's a move like that of class loader constraints.)

In that case, I prefer '{' instead of '@': the square bracket is the classical bracket and the curly brace is the fancy bracket.
And, with my ASM hat on, introducing '{' in the ASM code is far easier than introducing 'Q'.

> I am not opposed to allowing the existing Q-syntax
> for descriptors, in a limited number of places, to
> solve those residual problems. That thought leads
> me to try on the idea (which Remi discourages)
> that perhaps Q-narrowings of some method
> descriptors are useful (requiring bridges just
> like today's generics). A Q-narrowing of a
> class means: No nulls here, identity is not
> observable, and value-based semantics are
> in force, including unmodifiability. Not a bad
> set of guarantees; maybe that's a job for Q's
> rather than an invisible type profile.

For the record, I'm against it because, unlike generics, where you have to introduce bridges only when you actually use generics, here you have to introduce bridges in already existing code.

> BTW, an Ljava/util/List; would not pass through
> a Qjava/util/List; descriptor, unless List.copyOf
> were applied to it first. The VM and java.base
> can conspire to provide a curated set of Q-able
> types with enforced value or value-based behavior.
> -- John

Rémi
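
For readers who have not looked at generics bridges recently, the following small example shows what "bridges just like today's generics" refers to. It uses only standard reflection (Method.isBridge); the class names BridgeDemo and Meters are made up for the illustration. It says nothing about how Q-narrowing bridges would actually be generated; it only shows that javac emits the bridge in the class that opts into the generic type, which is the contrast being drawn above with already existing code.

```java
import java.lang.reflect.Method;

// Illustration only: javac adds a synthetic bridge method compareTo(Object)
// to a class that implements Comparable<Meters>.
final class BridgeDemo {
    static final class Meters implements Comparable<Meters> {
        final int value;

        Meters(int value) {
            this.value = value;
        }

        @Override
        public int compareTo(Meters other) {
            return Integer.compare(value, other.value);
        }
    }

    // Counts the compiler-generated bridge methods declared directly by a class.
    static int countBridges(Class<?> type) {
        int bridges = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isBridge()) {
                bridges++;
            }
        }
        return bridges;
    }

    public static void main(String[] args) {
        for (Method m : Meters.class.getDeclaredMethods()) {
            System.out.println(m + "  bridge=" + m.isBridge());
        }
    }
}
```

Here the bridge lives in Meters, the class that chose to use generics; Comparable itself and its older callers are untouched.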
Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)
> On Nov 22, 2017, at 6:48 AM, Brian Goetz wrote:
>
> What's the L-world story for array subtyping? For any R-type, R[] <:
> Object[]. If everything is an L type and everything is <: Object, are arrays
> of Q-types/primitives also subtypes of Object[]?
>
> We didn't have a story for this in QU-world either, but at least in QU-world
> it was believable that QFoo[] when there's no syntactic difference between L-uses and Q-uses. (And even
> less so when we might migrate code from L to Q.)

My two cents: we didn't discuss this in depth, but John raised it in this thread, and in the "design notes" document, I followed up with some details.

- Initially, QFoo[] is not a subtype of LFoo[]. You want covariant subtyping, you need to stick with L types.

- As an enhancement, we can introduce covariant Q-L subtyping, adjust the behavior of aaload/aastore, and explore the performance impact.

What's hard about treating QFoo[] as an LObject[] is that the layout is a dynamic property, requiring dynamic checks. But we may also be interested in pursuing non-uniform layout for QFoo[] (a specific idea: flattening generally but not for "volatile" instances), so there may be some satisfactory coping techniques coming.

On Java syntax: who says there's no syntactic difference between L-uses and Q-uses? You might spell the L-use "Complex?". But, anyway, that's a question to raise in a year, after we better understand the JVM.

-- Dan
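
As background for the array question, this is how the two existing cases behave in today's Java, with no value types involved. Reference arrays are covariant (R[] <: Object[]) and pay for that with a dynamic check on every store, while primitive arrays are simply not subtypes of Object[]. The class and method names are invented for the example; how QFoo[] should fit between these two behaviors is exactly the open question in this thread, and the example does not try to answer it.

```java
// Today's behavior only: covariant reference arrays need a dynamic store
// check, and primitive arrays are not subtypes of Object[].
final class ArrayCovarianceDemo {
    // Returns true if storing the element is rejected by the runtime check.
    static boolean storeFails(Object[] array, Object element) {
        try {
            array[0] = element;   // aastore checks the array's runtime element type
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    static boolean isObjectArray(Object candidate) {
        return candidate instanceof Object[];
    }

    public static void main(String[] args) {
        Object[] strings = new String[1];   // allowed: String[] <: Object[]
        System.out.println("store Integer into String[] fails: " + storeFails(strings, Integer.valueOf(1)));
        System.out.println("String[] is an Object[]: " + isObjectArray(new String[1]));
        System.out.println("int[] is an Object[]: " + isObjectArray(new int[1]));
    }
}
```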