abandon all U-types, welcome to L-world (or, what I learned in Burlington)

2017-11-19 Thread John Rose
We just had a 50-hour week of face-to-face meetings by the
Valhalla VM team.  We learned a lot and surprised ourselves
by coming to a consensus that a promising design for value
types uses mainly the same legacy L-type descriptors, makes
relatively little use of Q-type descriptors, and does not appear
to need a third descriptor "kind" or "mode", such as U for
universal, or R for reference-only.

First a few highlights out of many.  Fred Parain explained to us how
he has prototyped a thread-local analog of Java heaps to store value
structs in a form convenient to the interpreter.  Tobias Hartmann
and Roland Westrelin (of Red Hat) explained what the compiler
prefers to see, which is (obviously) the scalarized components
of each value.  The three of them have worked out detailed
rules for calling between interpreted and compiled code.

It seems to me that other implementations of the JVM (looking
at you, IBM) will tend in similar directions, so although our
results are strongly informed by our own prototyping, we think
it is likely that they will apply to other, independent JVM
implementations.  (Or are there platforms where the interpreter
will scalarize aggressively and the optimizer will prefer to
keep everything in structs?  Not.)

Karen Kinnear and the Oracle Valhalla lead, David Simms, were
there to make sure we solved the important problems and asked
the hard questions.  As a special appearance, one of our spec.
gurus, Dan Smith, was there to help us make rigorous sense out
of our intuitions and hacks.

Since we were short on language experts, we just worked in
the mode (my personal favorite) of pretending that the JVM
is the most important thing, and the Java language designers
will just have to figure out how to use it.  Of course, that's an
oversimplification; the JLS and JVMS inform each other very
strongly, but it was freeing to temporarily take current thoughts
about JLS extensions as a given and vary the JVM to find
the sweet spot that would be simple to implement and supportive
to what we think we know about the Valhalla Java of the future.

We had some long conversations about carrier types: L, Q, U,
and more, and that's what I want to write about here.  We also
made significant progress in the design of crackable lambdas,
template classes, and current and future versions of condy.
We talked to Ron Pressler about kick starting Loom fibers.
But it is L-types I want to talk about here; the above is just a
sketch of the past week's environment.

Logically speaking, we have two things we want to do, and
that unfolds to a choice between three "worlds" of up to four
distinct kinds: L/Q/U/R.  L is always present because it is
a legacy model for reference types.  Q is always present
because we know we need (at least sometimes) to make
a syntactic distinction between flattened values and legacy
objects.

(Why not just always look inside the classfile? Because
the verifier cannot be expected to load a class for every
type it sees, so needs a descriptor kind character from
time to time.)
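
To illustrate the point about descriptor kind characters, here is a minimal sketch, assuming a prototype-style 'Q' kind alongside the standard JVMS descriptor characters. The class name DescriptorKinds and the example class com/example/Complex are hypothetical, and 'Q' is not part of any released class file format. The sketch only shows that a kind can be read from the descriptor string without loading the named class.

```java
// Hypothetical sketch: classify a field descriptor by its leading "kind"
// character, without loading the class that the descriptor names.
final class DescriptorKinds {

    enum Kind { PRIMITIVE, REFERENCE, VALUE, ARRAY }

    static Kind kindOf(String descriptor) {
        if (descriptor == null || descriptor.isEmpty()) {
            throw new IllegalArgumentException("empty descriptor");
        }
        switch (descriptor.charAt(0)) {
            case 'L':
                return Kind.REFERENCE;   // legacy nullable reference
            case 'Q':
                return Kind.VALUE;       // assumption: prototype-only value kind
            case '[':
                return Kind.ARRAY;
            case 'B': case 'C': case 'D': case 'F':
            case 'I': case 'J': case 'S': case 'Z':
                return Kind.PRIMITIVE;   // standard JVMS base types
            default:
                throw new IllegalArgumentException("unknown kind: " + descriptor);
        }
    }

    public static void main(String[] args) {
        System.out.println(kindOf("Ljava/lang/String;"));
        System.out.println(kindOf("Qcom/example/Complex;"));
    }
}
```

Nothing here resolves or loads com/example/Complex; the decision is purely syntactic, which is the property the verifier relies on.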

The U kind came a year or two ago when we realized
that any-generics (and/or templates) and interfaces both
require a disjoint-union type that is neither Q nor L, but
can keep track of Q payloads (value instances) and L
payloads (nullable references to object instances),
without mixing them up.  In other words, neither Q-types
nor legacy L-types are parallel class-based constructs,
and neither conveniently "sits on top" of the other; they
need a common supertype to carry them without confusion.

Before I describe the three logically possible "worlds",
I'll add one more letter, R.  An R-type is exactly a legacy
L-type, a nullable reference.  Why use a separate letter?
Answer:  For the same reason we introduced the other
kind letters, to preserve all the necessary distinctions
among different kinds of payloads and carrier types,
and also to talk about the explicit encoding of descriptors.

There are three worlds we could design to hold both legacy
R-types (today's L-types) and Q-types:  U-world, L-world,
and R-world.  They might be notated respectively as U/QL,
L/Q, and U/QR.

The "U-world" is what I have been mentally preparing for
for many months.  It is the design where L-types, marked
as such in bytecode type descriptors, are always legacy
object references or null, and Q-types, also marked as
such in bytecodes, are always new value types.  To
carry runtime payloads which may dynamically vary
between the two modes, we need a third mode, U-types,
which carry the two kinds of payloads (I hesitate to say
"values" because I want to include reference values also).

A U-type is a disjoint union between corresponding,
similarly named Q-types and L-types.
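
As a rough illustration of that disjoint union, the sketch below models a U carrier in plain Java as a closed hierarchy with exactly two cases. All names (UCarrier, QCase, LCase) are hypothetical, and this is only a source-level analogy for the idea, not how a JVM would represent such payloads.

```java
// Hypothetical model of a U-type as a tagged (disjoint) union of a
// Q payload (value instance) and an L payload (nullable reference).
abstract class UCarrier {

    // Private constructor: only the two nested cases can extend UCarrier,
    // so every UCarrier is exactly one of QCase or LCase.
    private UCarrier() {}

    abstract boolean isValue();

    // Q side: a value instance, modeled here as an immutable pair of doubles.
    static final class QCase extends UCarrier {
        final double re;
        final double im;

        QCase(double re, double im) {
            this.re = re;
            this.im = im;
        }

        @Override
        boolean isValue() {
            return true;
        }
    }

    // L side: a reference to an object instance, which may be null.
    static final class LCase extends UCarrier {
        final Object ref;

        LCase(Object ref) {
            this.ref = ref;
        }

        @Override
        boolean isValue() {
            return false;
        }
    }

    public static void main(String[] args) {
        UCarrier[] payloads = { new QCase(1.0, 2.0), new LCase("hello"), new LCase(null) };
        for (UCarrier u : payloads) {
            System.out.println(u.isValue() ? "Q payload" : "L payload");
        }
    }
}
```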

(Mathematically, a _disjoint union_ of C = A |_| B is no more
and no less than the sum of all elements or points comprised
by the two constituent sets A and B.  The disjoint union has
nothing more: no points not in A or B.  It has nothing less:
every point of C is from eith

Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

2017-11-19 Thread Remi Forax
To summarize for myself:
we already know that we need only one U, java.lang.__Value; let's try to 
make it java.lang.Object (with no boxing).

The claim is that Object is used more as the root of all types, as in 
collections, than as the root of all references, as in System.out.println().

OK, I need to think more about that.

regards,
Rémi

- Original message -
> From: "John Rose" 
> To: "valhalla-spec-experts" 
> Sent: Sunday, 19 November 2017 22:40:33
> Subject: abandon all U-types, welcome to L-world (or, what I learned in 
> Burlington)


Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

2017-11-19 Thread John Rose
On Nov 19, 2017, at 2:47 PM, Remi Forax  wrote:
> 
> The claim is that Object is used more as the root of all types, as in 
> collections, than as the root of all references, as in System.out.println().


Object and interfaces play the role of top types. One view is that we are 
making Object act more like an interface. 

Also we don’t add any new carrier types to the interpreter. 


Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

2017-11-19 Thread forax
- Original message -
> From: "John Rose" 
> To: "Remi Forax" 
> Cc: "valhalla-spec-experts" 
> Sent: Monday, 20 November 2017 00:06:20
> Subject: Re: abandon all U-types, welcome to L-world (or, what I learned in 
> Burlington)

> On Nov 19, 2017, at 2:47 PM, Remi Forax  wrote:
>> 
>> The claim is that Object is used more as the root of all types, as in
>> collections, than as the root of all references, as in System.out.println().
> 
> 
> Object and interfaces play the role of top types. One view is that we are 
> making Object act more like an interface.

Ah, yes,
it makes the whole model far simpler.

> 
> Also we don’t add any new carrier types to the interpreter.

But you need a way to disambiguate a reference type from a value type at 
runtime in the interpreter.
You also need to teach JITs to propagate the L-vs-which-Q info on local 
variables for generics specialization (and it works even if the inlining 
fails, because the boxing/wrapping in the thread-local storage is done by the 
adapters, so the JITed code doesn't have to be conservative).

Rémi


Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

2017-11-19 Thread John Rose
On Nov 19, 2017, at 3:33 PM, fo...@univ-mlv.fr wrote:
> 
> But you need a way to disambiguate a reference type from a value type at 
> runtime in the interpreter.
> You also need to teach JITs to propagate the L-vs-which-Q info on local 
> variables for generics specialization (and it works even if the inlining 
> fails, because the boxing/wrapping in the thread-local storage is done by 
> the adapters, so the JITed code doesn't have to be conservative).

Yes, that leads to the sort of tag bit schemes I alluded to. I didn’t want to 
go into detail because it is a spec. list not a dev. list. On the dev. list we 
will say more about it. The problems seem solvable especially after the MVT 
prototyping experience on HotSpot, and I hope the IBM experience corroborates. 


Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

2017-11-22 Thread Brian Goetz
What's the L-world story for array subtyping?  For any R-type, R[] <: 
Object[].  If everything is an L type and everything is <: Object, are 
arrays of Q-types/primitives also subtypes of Object[]?


We didn't have a story for this in QU-world either, but at least in 
QU-world it was believable that QFoo[] [...] less tenable when there's no 
syntactic difference between L-uses and Q-uses.  (And even less so when we 
might migrate code from L to Q.)
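
For reference, the behavior of today's arrays that the question builds on can be checked directly. This small sketch uses only standard Java (the class name ArrayCovarianceDemo is just for illustration): reference arrays are covariant with a dynamic store check, while primitive arrays are not subtypes of Object[] at all.

```java
// Demonstrates today's array subtyping rules, using only standard Java.
final class ArrayCovarianceDemo {

    // Returns true if storing the element into slot 0 is rejected at runtime.
    static boolean storeFails(Object[] array, Object element) {
        try {
            array[0] = element;   // compiled to aastore, which checks the element type
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object[] objs = new String[1];                 // String[] <: Object[]
        System.out.println(storeFails(objs, "ok"));    // prints false
        System.out.println(storeFails(objs, 42));      // prints true

        Object ints = new int[1];
        System.out.println(ints instanceof Object[]);  // prints false
    }
}
```

The open question in L-world is which of these two behaviors an array of a value type should follow.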





Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

2017-11-22 Thread John Rose
On Nov 22, 2017, at 5:48 AM, Brian Goetz  wrote:
> 
> What's the L-world story for array subtyping?  For any R-type, R[] <: 
> Object[].  If everything is an L type and everything is <: Object, are arrays 
> of Q-types/primitives also subtypes of Object[]?
> 
> We didn't have a story for this in QU-world either, but at least in QU-world 
> it was believable that QFoo[] [...] less tenable when there's no syntactic 
> difference between L-uses and Q-uses.  (And even less so when we might 
> migrate code from L to Q.)

We were just talking about this in the concall with IBM.

Field and array-element flattening are the two places where
the Q/R distinction provides a crucial hint that flattening
*may* (not always *must*) occur.  This hint is crucial because
it is statically visible *before* all class files are loaded.

The instance layout algorithm and the verifier both need
to run before all class files are loaded (because of
circularities, also performance).  In U-world, the
letter 'Q' in a static descriptor tells the instance layout
generator to load the field class and extract sub-layout.
It also tells the verifier not to assume a covariantly
compatible layout relative to the type Object[].

If we don't keep a few Q's around for old times'
sake, we will need to signal these subtle differences
some other way, with an ACC_FLATTENABLE
bit on fields and a special "[@" syntax variant
(pick your letter, maybe "Q") for array descriptors.

I think of this as the "residual Q problem", of
finding offices for the few remaining occurrences
of Q that do real work in L-world.

(The user-visible distinction of flat vs. legacy
arrays was one influence that led us towards
user-visible box types.  I'd like to resist that this
time around.  Perhaps "[@" is a syntax that is
mutually exclusive with plain "[".  And there is
a showdown when such an array is created,
so that the descriptor has a "@" if and only if
the loaded element type is in fact a Q-type.
It's a move like that of class loader constraints.)
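
The "if and only if" check at array creation can be sketched as follows. Everything here is hypothetical: the "[@L...;" spelling, the class name FlatArrayDescriptorCheck, and the isValueClass predicate, which stands in for whatever the VM would learn by loading the element class. It is an illustration of the consistency rule only, not a proposed implementation.

```java
import java.util.function.Predicate;

// Hypothetical sketch: "[@" is the descriptor variant floated in this thread,
// not part of any released class file format.
final class FlatArrayDescriptorCheck {

    // True when the array descriptor carries the hypothetical "@" flat marker.
    static boolean isMarkedFlat(String descriptor) {
        return descriptor.startsWith("[@");
    }

    // Extracts the element class name from "[Lname;" or "[@Lname;".
    static String elementClassName(String descriptor) {
        int start = isMarkedFlat(descriptor) ? 2 : 1;
        if (!descriptor.startsWith("[")
                || descriptor.length() < start + 3
                || descriptor.charAt(start) != 'L'
                || !descriptor.endsWith(";")) {
            throw new IllegalArgumentException("unsupported descriptor: " + descriptor);
        }
        return descriptor.substring(start + 1, descriptor.length() - 1);
    }

    // The "showdown": the marker must be present exactly when the loaded
    // element class turns out to be a value class.
    static boolean consistentAtCreation(String descriptor, Predicate<String> isValueClass) {
        return isMarkedFlat(descriptor) == isValueClass.test(elementClassName(descriptor));
    }

    public static void main(String[] args) {
        // Assumption: pretend com/example/Complex is the only value class.
        Predicate<String> isValueClass = name -> name.equals("com/example/Complex");
        System.out.println(consistentAtCreation("[@Lcom/example/Complex;", isValueClass)); // true
        System.out.println(consistentAtCreation("[Lcom/example/Complex;", isValueClass));  // false
        System.out.println(consistentAtCreation("[Ljava/lang/String;", isValueClass));     // true
    }
}
```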

I am not opposed to allowing the existing Q-syntax
for descriptors, in a limited number of places, to
solve those residual problems.  That thought leads
me to try on the idea (which Remi discourages)
that perhaps Q-narrowings of some method
descriptors are useful (requiring bridges just
like today's generics).  A Q-narrowing of a
class means:  No nulls here, identity is not
observable, and value-based semantics are
in force, including unmodifiability.  Not a bad
set of guarantees; maybe that's a job for Q's
rather than an invisible type profile.
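
For comparison, this is what "bridges just like today's generics" looks like in current Java. The sketch uses only standard reflection; the class names BridgeDemo and Name are illustrative. It shows the existing generics mechanism, not a Q-narrowing, which remains a design idea in this thread.

```java
import java.lang.reflect.Method;

// Shows the bridge method javac already emits for a generic interface method.
final class BridgeDemo {

    static final class Name implements Comparable<Name> {
        final String text;

        Name(String text) {
            this.text = text;
        }

        // javac also emits a synthetic bridge compareTo(Object) that casts
        // its argument and forwards to this narrower method.
        @Override
        public int compareTo(Name other) {
            return text.compareTo(other.text);
        }
    }

    // Counts the bridge methods with the given name declared by a class.
    static int countBridges(Class<?> type, String methodName) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(methodName) && m.isBridge()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countBridges(Name.class, "compareTo")); // prints 1
    }
}
```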

BTW, an Ljava/util/List; would not pass through
a Qjava/util/List; descriptor, unless List.copyOf
were applied to it first.  The VM and java.base
can conspire to provide a curated set of Q-able
types with enforced value or value-based behavior.
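
The guarantees that List.copyOf (Java 10 and later) already provides at the library level can be observed directly; the sketch below uses only standard APIs, with CopyOfDemo as an illustrative class name. Whether such a list could ever pass through a Q descriptor is the speculative part and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Observes the library-level guarantees of List.copyOf: the result is
// unmodifiable, null-hostile, and independent of later changes to the source.
final class CopyOfDemo {

    static boolean rejectsMutation(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    static boolean rejectsNulls(List<String> source) {
        try {
            List.copyOf(source);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> copy = List.copyOf(mutable);
        mutable.set(0, "changed");

        System.out.println(copy.get(0));                             // prints a
        System.out.println(rejectsMutation(copy));                   // prints true
        System.out.println(rejectsNulls(Arrays.asList("a", null)));  // prints true
    }
}
```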

— John

Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

2017-11-22 Thread Remi Forax
> From: "John Rose" 
> To: "Brian Goetz" 
> Cc: "valhalla-spec-experts" 
> Sent: Wednesday, 22 November 2017 19:48:18
> Subject: Re: abandon all U-types, welcome to L-world (or, what I learned in
> Burlington)

> On Nov 22, 2017, at 5:48 AM, Brian Goetz <brian.go...@oracle.com> wrote:

>> What's the L-world story for array subtyping? For any R-type, R[] <: Object[].
>> If everything is an L type and everything is <: Object, are arrays of
>> Q-types/primitives also subtypes of Object[]?

>> We didn't have a story for this in QU-world either, but at least in QU-world it
>> was believable that QFoo[] [...] less tenable when there's no syntactic
>> difference between L-uses and Q-uses. (And even less so when we might migrate
>> code from L to Q.)

> We were just talking about this in the concall with IBM.

> Field and array-element flattening are the two places where
> the Q/R distinction provides a crucial hint that flattening
> *may* (not always *must*) occur. This hint is crucial because
> it is statically visible *before* all class files are loaded.

> The instance layout algorithm and the verifier both need
> to run before all class files are loaded (because of
> circularities, also performance). In U-world, the
> letter 'Q' in a static descriptor tells the instance layout
> generator to load the field class and extract sub-layout.
> It also tells the verifier not to assume a covariantly
> compatible layout relative to the type Object[].

> If we don't keep a few Q's around for old times'
> sake, we will need to signal these subtle differences
> some other way, with an ACC_FLATTENABLE
> bit on fields and a special "[@" syntax variant
> (pick your letter, maybe "Q") for array descriptors.

> I think of this as the "residual Q problem", of
> finding offices for the few remaining occurrences
> of Q that do real work in L-world.

> (The user-visible distinction of flat vs. legacy
> arrays was one influence that led us towards
> user-visible box types. I'd like to resist that this
> time around. Perhaps "[@" is a syntax that is
> mutually exclusive with plain "[". And there is
> a showdown when such an array is created,
> so that the descriptor has a "@" if and only if
> the loaded element type is in fact a Q-type.
> It's a move like that of class loader constraints.)

 

In that case, I prefer '{' instead of '@': the square bracket is the 
classic bracket and the curly brace is the fancy bracket. 

And with my ASM hat on, introducing '{' in the ASM code is far easier than 
introducing 'Q'. 

> I am not opposed to allowing the existing Q-syntax
> for descriptors, in a limited number of places, to
> solve those residual problems. That thought leads
> me to try on the idea (which Remi discourages)
> that perhaps Q-narrowings of some method
> descriptors are useful (requiring bridges just
> like today's generics). A Q-narrowing of a
> class means: No nulls here, identity is not
> observable, and value-based semantics are
> in force, including unmodifiability. Not a bad
> set of guarantees; maybe that's a job for Q's
> rather than an invisible type profile.

For the record, I'm against it because, unlike generics, where you have to 
introduce bridges only when you actually use generics, here you have to 
introduce bridges into already existing code. 

> BTW, an Ljava/util/List; would not pass through
> a Qjava/util/List; descriptor, unless List.copyOf
> were applied to it first. The VM and java.base
> can conspire to provide a curated set of Q-able
> types with enforced value or value-based behavior.

> — John

Rémi 


Re: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

2017-11-22 Thread Dan Smith
> On Nov 22, 2017, at 6:48 AM, Brian Goetz  wrote:
> 
> What's the L-world story for array subtyping?  For any R-type, R[] <: 
> Object[].  If everything is an L type and everything is <: Object, are arrays 
> of Q-types/primitives also subtypes of Object[]?
> 
> We didn't have a story for this in QU-world either, but at least in QU-world 
> it was believable that QFoo[] [...] less tenable when there's no syntactic 
> difference between L-uses and Q-uses.  (And even less so when we might 
> migrate code from L to Q.)

My two cents: we didn't discuss this in depth, but John raised it in this 
thread, and in the "design notes" document, I followed up with some details.
- Initially, QFoo[] is not a subtype of LFoo[]. You want covariant subtyping, 
you need to stick with L types.
- As an enhancement, we can introduce covariant Q-L subtyping, adjust the 
behavior of aaload/aastore, and explore the performance impact.

What's hard about treating QFoo[] as an LObject[] is that the layout is a 
dynamic property, requiring dynamic checks. But we may also be interested in 
pursuing non-uniform layout for QFoo[] (a specific idea: flattening generally 
but not for "volatile" instances), so there may be some satisfactory coping 
techniques coming.

On Java syntax: who says there's no syntactic difference between L-uses and 
Q-uses? You might spell the L-use "Complex?". But, anyway, that's a question to 
raise in a year, after we better understand the JVM.

—Dan