Re: Anonymous value classes

2022-06-06 Thread Dan Smith
> On Jun 4, 2022, at 3:33 AM, fo...@univ-mlv.fr wrote:
> 
> there are a lot of libraries that have APIs using interfaces that are 
> implemented by anonymous classes; the collection API is one of them, fluent 
> loggers (anything fluent in fact) another, and those would benefit from 
> better-than-escape-analysis performance.

This could use validation. My very high-level sense is that within inlined 
code, escape analysis will do just fine with identity classes, with no 
observable performance gain when switching to a value class. *Across calls*, we 
can do much better with value classes, but at that point current HotSpot 
optimizations need a name in the descriptor. (Huge caveat that my understanding 
of this situation is very high-level, and there may be important things I'm 
missing.)

Also note that if it's necessary to opt in anyway, it's not too much to ask 
these performance-sensitive users to declare a local value class rather than 
an anonymous one.
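
The local-class workaround could look like the sketch below, written in the 
draft Valhalla 'value' class modifier syntax (hypothetical, not compilable on 
any released JDK; 'Executor', 'Widget', and the names are invented). Because 
the class is named, it can appear in method descriptors, which the current 
HotSpot optimizations mentioned above rely on:

```java
// Draft Valhalla syntax: 'value' as a local class modifier (not compilable today).
void schedule(Executor exec, Widget x) {
    value class FooTask implements Runnable {   // named: usable in descriptors
        public void run() { x.foo(); }
    }
    exec.execute(new FooTask());
}
```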

Re: Anonymous value classes

2022-06-03 Thread Dan Smith
> On Jun 3, 2022, at 10:15 AM, Dan Smith wrote:
> 
> Our javac prototype has long included support for a 'value' keyword after 
> 'new' to indicate that an anonymous class is a value class

(I see Remi brought this up in the list in July 2018, which is probably what 
inspired the prototype implementation. There wasn't really any followup 
discussion.)



Anonymous value classes

2022-06-03 Thread Dan Smith
Our javac prototype has long included support for a 'value' keyword after 'new' 
to indicate that an anonymous class is a value class:

Runnable r = new value Runnable() {
    public void run() { x.foo(); }
};

Is this something we'd like to preserve as a language feature?

Arguments for:

- Allows the semantics of "I don't need identity" to be conveyed (often true 
for anonymous classes).

- Gives the JVM more information for optimization. If we don't need a heap 
object, evaluating the expression may be a no-op.

Arguments against:

- Opens a Pandora's box of syntax: what other keywords can go there? 
'identity'? 'primitive'? 'static'? 'record'?

- Because there's no named type, there are significantly fewer opportunities 
for optimization—you're probably going to end up with a heap object anyway.

- Value classes are primarily focused on simple data-carrying use cases, but 
any data being carried by an anonymous class is usually incidental. A new 
language feature would draw a lot of attention to this out-of-the-mainstream 
use case.

- In the simplest cases, you can use a lambda instead, and there the API 
implementation has freedom to implement lambdas with value classes if it turns 
out to be useful.

- The workaround—declare a local class instead—is reasonably straightforward 
for the scenarios where there's a real performance bottleneck that 'value' can 
help with.
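
The lambda point above is observable today: an anonymous class expression must 
allocate a fresh identity object on every evaluation, while a lambda's identity 
is left unspecified, so a future runtime could back it with a value class. A 
small illustrative sketch (class and method names are invented):

```java
public class LambdaVsAnonymous {
    // Anonymous class: 'new' guarantees a distinct identity per evaluation.
    public static Runnable anonymous() {
        return new Runnable() { public void run() { } };
    }

    // Lambda: the spec leaves identity unspecified, so the runtime may reuse
    // an instance -- or, someday, implement it with a value class.
    public static Runnable lambda() {
        return () -> { };
    }
}
```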

EG meeting *canceled*, 2022-06-01

2022-06-01 Thread Dan Smith
Not much recent traffic, let's cancel today.

Kevin had some comments about == and pre-migration warnings that are worth your 
attention if you haven't reviewed that thread...



Re: Spec change documents for Value Objects

2022-05-19 Thread Dan Smith
On Apr 27, 2022, at 5:01 PM, Dan Smith wrote:

Please see these two spec change documents for JLS and JVMS changes in support 
of the Value Objects feature.

Here's a revision, including some additional language checks that I missed in 
the first iteration.

http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220519/specs/value-objects-jls.html
http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220519/specs/value-objects-jvms.html

--

Diff of the changes:

diff --git a/closed/src/java.se/share/specs/value-objects-jls.md b/closed/src/java.se/share/specs/value-objects-jls.md
index 3e8e44aa2c..392242efb9 100644
--- a/closed/src/java.se/share/specs/value-objects-jls.md
+++ b/closed/src/java.se/share/specs/value-objects-jls.md
@@ -501,9 +501,9 @@ It is permitted for the class declaration to redundantly specify the `final`
 modifier.

 The `identity` and `value` modifiers limit the set of classes that can extend
-an `abstract` class ([8.1.4]).
+a non-`final` class ([8.1.4]).

-Special restrictions apply to the field declarations ([8.3.1.2]), method
+Special restrictions apply to the field declarations ([8.3.1]), method
 declarations ([8.4.3.6]), and constructors ([8.8.7]) of a class that is not an
 `identity` class.

@@ -524,6 +524,61 @@ Should there be?



+#### 8.1.3 Inner Classes and Enclosing Instances {#jls-8.1.3}
+
+...
+
+An inner class *C* is a *direct inner class of a class or interface O* if *O* is
+the immediately enclosing class or interface declaration of *C* and the
+declaration of *C* does not occur in a static context.
+
+> If an inner class is a local class or an anonymous class, it may be declared
+> in a static context, and in that case is not considered an inner class of any
+> enclosing class or interface.
+
+A class *C* is an *inner class of class or interface O* if it is either a
+direct inner class of *O* or an inner class of an inner class of *O*.
+
+> It is unusual, but possible, for the immediately enclosing class or interface
+> declaration of an inner class to be an interface.
+> This only occurs if the class is a local or anonymous class declared in a
+> `default` or `static` method body ([9.4]).
+
+A class or interface *O* is the *zeroth lexically enclosing class or interface
+declaration of itself*.
+
+A class *O* is the *n'th lexically enclosing class declaration of a class C* if
+it is the immediately enclosing class declaration of the *n-1*'th lexically
+enclosing class declaration of *C*.
+
+An instance *i* of a direct inner class *C* of a class or interface *O* is
+associated with an instance of *O*, known as the *immediately enclosing
+instance of i*.
+The immediately enclosing instance of an object, if any, is determined when the
+object is created ([15.9.2]).
+
+An object *o* is the *zeroth lexically enclosing instance of itself*.
+
+An object *o* is the *n'th lexically enclosing instance of an instance i* if it
+is the immediately enclosing instance of the *n-1*'th lexically enclosing
+instance of *i*.
+
+An instance of an inner local class or an anonymous class whose declaration
+occurs in a static context has no immediately enclosing instance.
+Also, an instance of a `static` nested class ([8.1.1.4]) has no immediately
+enclosing instance.
+
+**It is a compile-time error if an inner class has an immediately enclosing
+instance but is declared an `abstract` `value` class ([8.1.1.1], [8.1.1.5]).**
+
+> **If an abstract class is declared with neither the `value` nor the `identity`
+> modifier, but it is an inner class and has an immediately enclosing instance,
+> it is implicitly an `identity` class, per [8.1.1.5].**
+
+...
+
+
+
 #### 8.1.4 Superclasses and Subclasses {#jls-8.1.4}

 The optional `extends` clause in a normal class declaration specifies the
@@ -761,8 +816,110 @@ instance method.**



+### 8.6 Instance Initializers {#jls-8.6}
+
+An *instance initializer* declared in a class is executed when an instance of
+the class is created ([12.5], [15.9], [8.8.7.1]).
+
+*InstanceInitializer:*
+: *Block*
+
+**It is a compile-time error for an `abstract` `value` class to declare an
+instance initializer.**
+
+> **If an abstract class is declared with neither the `value` nor the `identity`
+> modifier, but it declares an instance initializer, it is implicitly an
+> `identity` class, per [8.1.1.5].**
+
+It is a compile-time error if an instance initializer cannot complete normally
+([14.22]).
+
+It is a compile-time error if a `return` statement ([14.17]) appears anywhere
+within an instance initializer.
+
+An instance initializer is permitted to refer to the current object using the
+keyword `this` ([15.8.3]) or the 

Re: EG meeting, 2022-05-18

2022-05-18 Thread Dan Smith

> On May 18, 2022, at 8:24 AM, Dan Smith wrote:
> 
> EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT).
> 
> Recent threads to discuss:
> 
> - "User model stacking: current status": Brian talked about factoring 
> atomicity out of the B2/B3 choice, as an extra choice applying to B3 (and 
> perhaps B2, too)
> 
> - "Nullity (was: User model stacking: current status)": Brian explored the 
> possibility of using '?' and '!' as alternatives to '.ref' and '.val' for B3 
> classes, anticipating more general support in the language for null-free types
> 
> - "User model: terminology": Brian summarized the different features that 
> need labels (non-identity classes, non-identity classes with a valid zero, 
> tearable classes, types with and without null)

Summary of this discussion:

Reviewed how we ended up with concerns about the status quo approach to 
primitive classes (documented in JEP 401), how we wanted a better story for 
tearing, and different strategies that have been considered there. Nothing new 
here, just summarizing.

Dug into some details of the nullable+tearable combination:

- A tearable B2 class is probably a mismatch—if you can tear, you can create a 
zero value, but the B2 has declared itself zero-hostile. No objections, then, 
to the idea that atomic/non-atomic is a property of B3 only (or equivalently, 
by giving up atomicity you've entered a new category, B4).

- Tearable+nullable B3 types (e.g., 'LPoint;' could be considered tearable) 
remain a possible area to explore. There's some concern about user 
model—tearing a null leads to surprising outcomes after a null check and 
possible hard-to-observe memory leaks—and implementation. It would help to 
ground this conversation in some more concrete examples, though.



EG meeting, 2022-05-18

2022-05-18 Thread Dan Smith
EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT).

Recent threads to discuss:

- "User model stacking: current status": Brian talked about factoring atomicity 
out of the B2/B3 choice, as an extra choice applying to B3 (and perhaps B2, too)

- "Nullity (was: User model stacking: current status)": Brian explored the 
possibility of using '?' and '!' as alternatives to '.ref' and '.val' for B3 
classes, anticipating more general support in the language for null-free types

- "User model: terminology": Brian summarized the different features that need 
labels (non-identity classes, non-identity classes with a valid zero, tearable 
classes, types with and without null)



Re: Nullity (was: User model stacking: current status)

2022-05-12 Thread Dan Smith
> On May 11, 2022, at 7:45 PM, Kevin Bourrillion wrote:
> 
> * `String!` indicates "an actual string" (I don't like to say "a non-null 
> string" because *null is not a string!*)

The thread talks around this later, but... what do I get initially if I declare 
a field/array component of type 'String!'?

I think in most approaches this would end up being a warning, with the 
field/array erased to LString and storing a null. (Alternatively, we build 
'String!' into the JVM, and I think that has to come with "uninitialized" 
detection on reads. We talked through that strategy quite a bit in the context 
of B2 before settling on "just use 'null'".)

So this is potentially a fundamental difference between String! and Point!: 
'new String![5]' and 'new Point![5]' give you very different arrays.
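
The asymmetry mirrors today's wrapper vs. primitive arrays, where only the 
primitive type carries a built-in all-zeros default (an illustrative sketch; 
the class name is invented):

```java
public class DefaultElements {
    // Reference-typed array: elements default to null, as 'new String![5]'
    // would have to if 'String!' erases to LString.
    public static Integer[] freshWrapperArray() {
        return new Integer[5];
    }

    // Primitive array: elements default to zero -- the analogue of
    // 'new Point![5]' producing all-zeros Points.
    public static int[] freshPrimitiveArray() {
        return new int[5];
    }
}
```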

> * Exclamation fatigue would be very real, so assume there is some way to make 
> `!` the default for some scope

+1

Yes, I think it's a dead end to expect users to sprinkle '!' everywhere they 
don't want nulls—this is usually the informal default in common programming 
practice, so we need some way to enable flipping the default.

Lesson for B3: if B3! is primarily meant to be interpreted as a null-free type, 
people will naturally want to use that null-free type everywhere, and will want 
it to be default. (Reference default makes more sense where you generally want 
to use the nullable type, and only occasionally will opt in to the value type, 
probably for reasons other than whether 'null' is semantically meaningful.)

Also, a danger for B3 is that a rather casual flipping of defaults doesn't just 
affect compiler behavior—it changes the initial value and possibly atomicity of 
a field/array. So a little more scary for a random switch somewhere to change 
all your 'Point' usages from ref-default to val-default.

Re: User model stacking: current status

2022-05-09 Thread Dan Smith
> On May 9, 2022, at 10:10 AM, Kevin Bourrillion wrote:
> 
>>> But seriously, we won't get away with pretending there are just 3 buckets 
>>> if we do this. Let's be honest and call it B4.
>> "Bucket" is a term that makes sense in language design, but need not flow 
>> into the user model.  But yes, there really are three things that the user 
>> needs control over: identity, zero-friendliness, atomicity.  If you want to 
>> call that four buckets, I won't argue.
>> 
> I *am* of course only caring about the user model, and that's where I'm 
> saying we would not get away with pretending this isn't a 4th kind of 
> concrete class.

Here's a presentation that doesn't feel to me like it's describing a menu with 
four choices:

In Java, there are object references and there are primitives. For which kinds 
of values are you trying to declare a class?

If object references: okay, do your objects need identity or not?

If primitives: okay, do your primitives need atomicity or not?



Re: User model stacking: current status

2022-05-06 Thread Dan Smith
> On May 6, 2022, at 8:04 AM, Brian Goetz wrote:
> 
> Thinking more about Dan's concerns here ...
> 
> On 5/5/2022 6:00 PM, Dan Smith wrote:
>> This is significant because the primary reason to declare a B2 rather than a 
>> B3 is to guarantee that the all-zeros value cannot be created. 
> 
> This is a little bit of a circular argument; it takes a property that an 
> atomic B2 has, but a non-atomic B2 lacks, and declares that to be "the whole 
> point" of B2.  It may be that exposure of the zero is so bad we may 
> eventually want to back away from the idea, but let's come up with a fair 
> picture of what a non-atomic B2 means, and ask if that's sufficiently useful.

Fair. My interpretation is that we decided to create B2 because we weren't 
satisfied with the lack of guarantees offered to no-good-default classes that 
were reference-default B3s. So in that historical sense, B2s exist to offer 
guarantees.

>> This leads me to conclude that if you're declaring a non-atomic B2, you 
>> might as well just declare a non-atomic B3. 
> 
> Fair point, but let's pull on this string for a moment.  Suppose I want a 
> null-default, flattenable value, and I'm willing to take the tearing to get 
> there.  So you're saying "then declare a B3 and use B3.ref".  But B3.ref was 
> supposed to have the same semantics as an equivalent B2!  (I realize I'm 
> doing the same thing I just accused you of above -- taking an old invariant 
> and positiioning it as "the point".  Stay tuned.)  Which means either that we 
> lose flattening, again, or we create yet another asymmetry between B3.ref and 
> B2. Maybe you're saying that the combination of nullable and full-flat is 
> just too much to ask, but I am not sure it is; in any case, let's convince 
> ourselves of this before we rule it out.

Yeah, I think my mindset has been here—non-atomic flat nulls are just more 
trouble than they're worth—but I'm open to discovering a compelling use case.



Re: User model stacking: current status

2022-05-05 Thread Dan Smith
> On May 5, 2022, at 1:21 PM, Brian Goetz wrote:
> 
> Let's write this out more explicitly.  Suppose that T1 writes a non-null 
> value (d, t, true), and T2 writes null as (0, 0, false).  Then it would be 
> possible to observe (0, 0, true), which means that we would be conceivably 
> exposing the zero value to the user, even though a B2 class might want to 
> hide its zero.  
> 
> So, suppose instead that we implemented writing a null as simply storing 
> false to the synthetic boolean field.  Then, in the event of a race between 
> reader and writer, we could only see values for date and time that were 
> previously put there by some thread.  This satisfies the OOTA (out of thin 
> air) safety requirements of the JMM.

(0, 0, false) is the initial value of a field/array, even if the VM implements 
a "narrow write" strategy. That is, if a read from a fresh field races with my 
write of (1, 1, true), the reader could easily observe (0, 0, true).
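
The race can be made concrete by modeling the three flattened fields directly 
and replaying one legal interleaving by hand (a deterministic, single-threaded 
sketch; all names are invented):

```java
public class TearingSketch {
    // Model of a heap-flattened nullable value: two payload fields plus a
    // synthetic non-null flag, all zero-initialized like a fresh field.
    static int date, time;
    static boolean nonNull;

    // Writer thread T1 stores (d, t, true) field by field; this method replays
    // the interleaving where a reader runs after T1's flag write but before
    // its payload writes.
    public static int[] readDuringWrite(int d, int t) {
        nonNull = true;                                // T1: flag write lands first
        int[] seen = { date, time, nonNull ? 1 : 0 };  // reader: observes (0, 0, true)
        date = d;                                      // T1: payload writes land later
        time = t;
        return seen;
    }
}
```

The observed (0, 0, true) is exactly the exposed zero that a B2 class wants to 
hide.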

This is significant because the primary reason to declare a B2 rather than a B3 
is to guarantee that the all-zeros value cannot be created. (A secondary 
reason, valid but one I'm less sympathetic to, is that the all-zeros value is 
okay but inconvenient, and it would be nice to reduce how much it pops up. A 
third reason is reference-defaultness, important for migration if we don't 
offer it in B3.)

This leads me to conclude that if you're declaring a non-atomic B2, you might 
as well just declare a non-atomic B3.

Said differently: a B2 author usually wants to associate a cross-field 
invariant with the null flag (zero-value fields iff null). But in declaring the 
class non-atomic, they've sworn off cross-field invariants.

This was a useful discovery for me yesterday: that, in fact, nullability and 
atomicity are closely related. There's a strong theoretical defense for the 
idea that opting out of identity and supporting a non-null type (i.e., B3) are 
prerequisites to non-atomic flattening.



EG meeting, 2022-05-04

2022-05-04 Thread Dan Smith
EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT).

We've had a flurry of activity in the last couple of weeks. I think we can 
summarize as follows:

- "Spec change documents for Value Objects": revised JVMS to align with 
previous discussions about Value Objects, and a new JLS changes document to 
match

- "We need help to migrate from bucket 1 to 2; and, the == problem": Kevin 
asked about JEP 390 applying to non-JDK classes, and about whether javac should 
discourage use of '=='

- "Foo / Foo.ref is a backward default": Kevin and Brian argued that we should 
prefer treating B3 classes as reference-default, with something like '.val' to 
opt in to a primitive value type

- "User model stacking": Brian discussed treating atomicity as an orthogonal 
property, no longer tied to B3

Re: User model stacking

2022-04-29 Thread Dan Smith
> On Apr 27, 2022, at 7:36 PM, Kevin Bourrillion wrote:
> 
> This is kinda reading as...
> 
> * First we have 3 buckets
> * But people ask if there could be just 2 buckets
> * No, so let's have 5 buckets.
> 
> I don't understand why this is happening, but I take it back! I take back 
> what I said about 2 buckets!

Just so we don't lose this history, a reminder that back when we settled on the 
3 buckets, we viewed it as a useful simplification from a more general approach 
with lots of "knobs". Instead of asking developers to think about 3-4 
mostly-orthogonal properties and set them all appropriately, we preferred a 
model in which *objects* and *primitive values* were distinct entities with 
distinct properties. Atomicity, nullability, etc., weren't extra things to 
reason about independently; they were natural consequences of whether or not a 
variable stores objects.

That was a while ago, and we may have learned some things since then, but I think 
there's still something to the idea that we can expect everybody to understand 
the difference between objects and primitives, even if they don't totally 
understand all the implications. (When they eventually discover some corner of 
the implications, we hope they'll say, "oh, sure, that makes sense because this 
is/isn't an object.")

> On Apr 28, 2022, at 8:13 AM, Brian Goetz wrote:
> 
> My conclusion is that problem here is that we’re piggybacking atomicity on 
> other things, in non-obvious ways.  The author of the class knows when 
> atomicity is needed to protect invariants (specifically, cross-field 
> invariants), and when it is not, so let that simply be selected at the 
> declaration site.  Opting out of atomicity is safer and less surprising, so 
> that argues for tagging classes that don’t need atomicity as `non-atomic`.  
> (For some classes, such as single-field classes, it makes no difference, 
> because preserving atomicity has no cost, so the VM will just do it.)  
> 
> In addition to the explicitness benefits, now atomicity works uniformly 
> across B2 and B3, ref and val.  Not only does this eliminate the asymmetries, 
> but it means that classes that are B2 because they don’t have a good default, 
> can *routinely get better flattening* than they would have under the status 
> quo straw man; previously there was a big flattening gap, even with heroics 
> like stuffing four ints into 128 bit atomic loads.  When the user says “this 
> B2 is non-atomic”, we can immediately go full-flat, maybe with some extra 
> footprint for null. 

As a specific example, yes, there are some advantages to non-atomic B2s. But at 
the cost of disrupting the notion that B2 instances are always objects, and 
objects are, naturally, safely encapsulated. Would we say that objects are not 
necessarily atomic anymore? Or that these B2 instances aren't objects? My 
inclination would probably be to abandon the object/value dichotomy, revert to 
"everything is an object", perhaps revisit our ideas about 
conversions/subtyping between ref and val types, and develop a model that 
allows tearing of some objects. Probably all doable, but I'm not sure it's a 
better model.

If the main goal here is to have an intuitive story that minimizes surprises, 
I'm currently pretty happy with (all terms here subject to further 
bikeshedding):
- Primitive classes (or just "primitives") have primitive value instances
- Like the primitives you know, these tend to be stored directly in memory
- Like the primitives you know, because of their storage sometimes there's a 
risk of tearing
- If you're declaring a multi-field primitive, you need to understand this risk 
and choose whether to allow tearing (via 'atomic' or 'non-atomic')

A critique here is that now we have more ad hoc "buckets", but the 'atomic' 
modifiers feel to me more like a minor piece of B3, not an entirely new bucket 
(and, bonus, a property that already exists within the space of primitives!). 
E.g., I can totally see javadoc having a tab for "Value Classes" and a separate 
tab for "Primitives", but I wouldn't expect tabs for "Atomic Primitives" and 
"Non-atomic Primitives". (Instead, maybe there's some boilerplate on the class 
page along the lines of "Note that this primitive is not atomic".)

Spec change documents for Value Objects

2022-04-27 Thread Dan Smith
Please see these two spec change documents for JLS and JVMS changes in support 
of the Value Objects feature.

http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220427/specs/value-objects-jls.html
http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220427/specs/value-objects-jvms.html

These are synced up with the latest iteration of the draft JEP, found here:
https://openjdk.java.net/jeps/8277163

I've applied the changes we discussed recently on this list:
- Replacing the 'IdentityObject' and 'ValueObject' interfaces with class 
modifiers
- Updating the treatment of class 'Object' so that it can continue to be 
instantiated
- Solidifying the details of special '' instance creation methods

The JVMS document is layered on top of some JVMS cleanups that I need to circle 
back on (mostly as part of separate JEP https://openjdk.java.net/jeps/8267650).

I think we've reached a point where it's worth getting these production-ready. 
Please let me know if you notice missing pieces, think something needs better 
treatment, or just catch a typo. Thanks!

Re: [External] Foo / Foo.ref is a backward default; should be Foo.val / Foo

2022-04-27 Thread Dan Smith
On Apr 26, 2022, at 5:22 PM, Kevin Bourrillion wrote:

(Thought experiment: if we had an annotation meaning "using the .val type is 
not a great idea for this class and you should get a compile-time warning if 
you do"  would we really and I mean *really* even need bucket 2 at all?)

Yes, because some (many?) class authors want strong guarantees that the initial 
(all-zeros) instance is never available in the wild. This is the most prominent 
encapsulation-breaking compromise that an author makes when moving from B2 to 
B3.
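
A sketch of what that guarantee protects, written in the draft B2/B3 modifier 
syntax (hypothetical, not compilable today; the class is invented):

```java
// Draft Valhalla syntax. As a B2 (value class), the constructor check below
// is airtight: no all-zeros Fraction can ever be observed.
value class Fraction {
    int num;
    int den;
    Fraction(int num, int den) {
        if (den == 0) throw new ArithmeticException("zero denominator");
        this.num = num;
        this.den = den;
    }
}
// As a B3 (primitive class), 'new Fraction.val[1]' would expose the
// all-zeros instance -- den == 0 -- without ever running the constructor.
```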


Re: Foo / Foo.ref is a backward default; should be Foo.val / Foo

2022-04-26 Thread Dan Smith
On Apr 26, 2022, at 12:37 PM, Dan Heidinga wrote:

The question again is what's the primary reason(s) for exposing a B3
(.val) vs B2 instance in APIs?  What guidance would we give API
designers around the use of B3 .val instances?

So one piece of guidance we could give is: "always use value types unless you 
have a good reason not to." If those semantics are acceptable, we're giving the 
JVM the best information we have to maximize possible performance gains (both 
today and in future JVMs). Exactly what JVMs do with the information can be a 
black box.

Alternatively, we can recommend "always use reference types unless you're sure 
there's a performance need for .val" (which you've nicely expanded into a more 
detailed set of rules). The nature of those rules depends on the answers to my 
list of performance questions:

- Are we confident that flattened L types on the stack have negligible costs 
compared to Q types? (E.g., is there no significant cost to using extra 
registers to track and check nulls?)

- Are we confident that we can achieve atomic, flattened L types on the heap 
for common cases?

- Are we confident that the performance cliff required to guarantee atomicity 
for heap-flattened L types is acceptable in general programming settings?

- Are we also confident that the extra null-tracking overhead of flattened L 
types on the heap is acceptable in most cases, and only needs to be compressed 
out by performance-tuning experts?

The goal of these questions is to ensure that "there's a performance need for 
.val" is a corner case.

In going down this path, we've opened the box and tied the guidance to 
properties of current/near-term implementations. So in addition to needing to 
validate these expectations, we'd want to be confident that the guidance won't 
look silly in 10 years as implementations change.

A risk in either case is that people disagree about how to interpret the 
guidance, and then you have mismatches between component boundaries, leading to 
unnecessary problems like expensive heap allocations, noisy null warnings, or 
incompatible data structures.

(Syntactically, I've been assuming that the "good name" would align with this 
guidance, going to the type we'd recommend using in most cases, but it's 
definitely possible to discourage general use of the good name, or not provide 
a "good name" at all.)


Re: We need help to migrate from bucket 1 to 2; and, the == problem

2022-04-26 Thread Dan Smith
On Apr 26, 2022, at 8:22 AM, Kevin Bourrillion wrote:

It's a great start, but the key difference is that we need to be able to apply 
this process to *our own* types, not just the JDK types. Really, we should see 
whatever we need to do for JDK types as a clue to what other library owners 
will need as well.

Yes, a public annotation was the original proposal. At some point we scaled 
that back to just JDK-internal. The discussions were a long time ago, but if I 
remember right the main concern was that a formalized, Java SE notion of 
"value-based class" would lead to some unwanted complexity when we eventually 
get to *real* value classes (e.g., a misguided CS 101 course question: "what's 
the difference between a value-based class and a value class? which one should 
you use?"). It seemed like producing some special warnings for JDK classes 
would address the bulk of the problem without needing to fall into this trap.

Would an acceptable compromise be for a third-party tool to support its own 
annotations, while also recognizing @jdk.internal.ValueBased as an alternative 
spelling of the same thing?

(Secondarily... why are we warning only on synchronization, and not on `==` or 
(marginal) `identityHC`?)

I think this was simply not a battle that we wanted to fight—discouraging all 
uses of '==' on type Integer, for example.

We spent some time trying to figure out what to say about '==', and came up 
with this:

"the class does not provide any instance creation mechanism that promises a 
unique identity on each method call—in particular, any factory method's 
contract must allow for the possibility that if two independently-produced 
instances are equal according to equals(), they may also be equal according to 
==;"

and

"When two instances of a value-based class are equal (according to `equals`), a 
program should not attempt to distinguish between their identities, whether 
directly via reference equality or indirectly via an appeal to synchronization, 
identity hashing, serialization, or any other identity-sensitive mechanism."

(See 
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/doc-files/ValueBased.html)

Within these constraints, there are reasonable things that can be done with 
'==', like optimizing for a situation where 'equals' is likely to be true. (I'm 
sympathetic to "don't do that anyway!", but it's more of a convention thing 
that javac would tend not to get involved with.)
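
Within that contract, the "optimize for likely-equals" pattern looks like this 
(an illustrative sketch; the class and method names are invented):

```java
public class FastPathEquals {
    // For a value-based class like Integer, '==' may be true for equal
    // instances but must never be used to distinguish them; using it only as
    // a cheap fast path before equals() stays within the contract.
    public static boolean fastEquals(Integer a, Integer b) {
        return a == b || (a != null && a.equals(b));
    }
}
```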



Re: B3 ref model

2022-04-26 Thread Dan Smith
On Apr 26, 2022, at 9:19 AM, fo...@univ-mlv.fr wrote:

For me, an L-type means: if you do not already know, you will discover later 
whether it's a B1/B2/B3, when the class is loaded.
The preload attribute means: if you do not already know, you should load the 
class now (at least when you want to take a decision based on the class being a 
B1/B2/B3 or not).

Yes, this is right.

Where you're going wrong (or at least in a different direction than the plan of 
record) is in the expectation that it should matter whether the class is a B2 
or a B3. If you look at JEP 401, you'll see that 'ACC_PRIMITIVE' just means 
"I'm a value class that also supports Q types."

An L-type does not mean it's a pointer and will always be a pointer, because if 
a user has chosen to make a class a B3, the VM should do whatever is possible to 
flatten it, even if the declared type is an L-type.

L types for both B2 and B3 classes may be flattened; in both cases, there's a 
requirement that atomicity be preserved. In the plan of record model, this is 
not by fiat, but a consequence of the fact that an L type is a reference type, 
and reference types come with traditional expectations about integrity.



Re: Foo / Foo.ref is a backward default; should be Foo.val / Foo

2022-04-26 Thread Dan Smith
On Apr 26, 2022, at 8:45 AM, Kevin Bourrillion wrote:

I think I would insist that `.val` be spelled with only one additional 
character... or even that the value type be generated as the snake_case form of 
the name!

Okay, this is a meaningful refinement that I find less objectionable.

If it's '#Integer128' or 'Integer128!' instead of 'Integer128.val', we've 
trimmed away a chunk of the typing/reading overhead (though it's still there, 
and I think some of the overhead is directly in service of what you find 
attractive—the idea that the value type is something unnatural/a departure from 
the norm).

If it's 'integer128' and 'Integer128', well now there is no "default" type, and 
I think we're talking about something categorically different. There are some 
new (surmountable?) problems, but my earlier objections don't apply.



Re: Foo / Foo.ref is a backward default; should be Foo.val / Foo

2022-04-26 Thread Dan Smith
On Apr 25, 2022, at 8:20 PM, Kevin Bourrillion <kev...@google.com> wrote:

On Mon, Apr 25, 2022 at 7:52 PM Dan Smith <daniel.sm...@oracle.com> wrote:

Yeah, I think this has to be the starting place, before we get into whatever 
other model simplifications, compatible migrations, etc., might be gained.

The expectation for any two-types-for-one-name approach should be, I think, 
that almost all types referencing the class should use the simple name. The 
non-default form is for special cases only.

Whose expectation is that -- do you mean it will be what users expect? Because 
they might, but that's not the same as good design.

It's how I interpret our requirements, I guess?

The vision of B3 is "user-defined primitives": that someone can define in a 
library a type that can be used interchangeably with the existing built-in 
primitive types. (We can debate whether "primitive" is the right word here, but 
the concept persists under whatever naming scheme.)

If the expectation is that a typical programmer is going to look over their 
menu of types and choose between 'int', 'long', or 'Integer128.val', I think 
we've heavily biased them against the third one. The syntactic overhead is just 
too much.

Whereas if we're saying "just use plain reference type 'Integer128', it'll 
usually be fine", that's probably something we can sell (if we can deliver on 
"usually fine"), even though the menu will be more like 'Integer', 'Long', and 
'Integer128'.

So if we're considering an approach in which the reference type is used almost 
all the time, we need to establish that doing so will not be considered a "bad 
practice" for performance reasons. Specifically:

I don't see why this is. If there's bad performance, the users have the freedom 
to help themselves to the better performance any time they want to, for the 
minor cost of a little "sprinkling". That sounds like Valhalla success to me. 
Isn't it?

I think our success will come from widespread high-performance use of these 
classes. Like how 'int' works. If the L types are not "high-performance" (a 
subjective measure, I know), and the Q types are pain to use, I worry that 
won't be perceived as successful. (Either "Valhalla is a pain to use" or 
"Valhalla rarely delivers the promised performance".)

A good test for me is this: if we asked everybody to stop saying 'int' all the 
time, and prefer 'Integer' instead except in performance-critical code, could 
we effectively convince them to set aside their misgivings about performance 
and trust the JVM to be reasonably efficient?

Well, the thing forcing our hand in our case is the need to work within the 
limitations of a language with 28 years of expectations already rooted in 
brains.

I'm thinking about this test more from a clean slate perspective, I think: 
rephrased, in a new language (something like Kotlin, say), could we leave out 
'int', and convince people to do everything with 'Integer', or in 
performance-sensitive cases say 'Integer.val'? Would that language be perceived 
as worse (on either performance or syntactic grounds) than Java?
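Today's boxed types already hint at why the "just use Integer" pitch is delicate: identity semantics leak through ==, independently of performance. A small runnable illustration of standard autoboxing behavior:

```java
public class BoxingDemo {
    // Autoboxing goes through Integer.valueOf, which the spec guarantees to
    // cache for -128..127. So == (identity) agrees with equals (value
    // comparison) only inside the cache.
    static boolean boxedIdentityEquals(int v) {
        Integer x = v, y = v;
        return x == y;
    }

    public static void main(String[] args) {
        System.out.println(boxedIdentityEquals(127)); // true: cached, same object
        System.out.println(boxedIdentityEquals(128)); // false on default JVMs: two objects
    }
}
```

With true value classes, == would be a value comparison and this hazard would disappear.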



Re: B3 ref model

2022-04-25 Thread Dan Smith
> On Apr 25, 2022, at 3:08 PM, Remi Forax wrote:
> 
> Ok, maybe I've not understood correctly how the B3 model works;
> for me, being a B3 is a runtime property, not a type property.
> 
> For example, if there is an Object but the VM knows the only possible type is 
> a B3 and the value is not null, then the VM is free to emit several stores, 
> because it's a B3, so tearing can occur.
> 
> Said differently, B3 allows tearing, so B3.val and B3.ref allow tearing.
> 
> If I do not want tearing, then B3 has to be stored in a volatile field or I 
> have to declare the class as a B2.
> 
> Did I get it right?

The model we've designed is that B3 instances can be represented as *objects* 
or *primitive values*. Objects enforce atomicity as part of their encapsulation 
behavior; primitive values do not. Whether something is an object or not is a 
property of types—ref and val at the language level, L and Q at the JVM level.
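A sketch of that split, using the (non-final) ref/val syntax:

```
primitive class Range { long lo, hi; /* ... */ }

Range.ref asObject; // an object: encapsulated, atomic loads and stores
Range.val asValue;  // a primitive value: flattenable; under a data race a
                    // reader may see lo and hi from different writes (tearing)
```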

Re: Foo / Foo.ref is a backward default; should be Foo.val / Foo

2022-04-25 Thread Dan Smith
On Apr 25, 2022, at 8:05 AM, Brian Goetz <brian.go...@oracle.com> wrote:

Let’s state the opposing argument up front, because it was our starting point: 
having to say “Complex.val” for 99% of the utterances of Complex would likely 
be perceived as “boy those Java guys love their boilerplate” (call this the 
“lol java” argument for short.)  But, since then, our understanding of how this 
will all actually work has evolved, so it is appropriate to question whether 
this argument still holds the weight we thought it did at the outset.

Yeah, I think this has to be the starting place, before we get into whatever 
other model simplifications, compatible migrations, etc., might be gained.

The expectation for any two-types-for-one-name approach should be, I think, 
that almost all types referencing the class should use the simple name. The 
non-default form is for special cases only.

So if we're considering an approach in which the reference type is used almost 
all the time, we need to establish that doing so will not be considered a "bad 
practice" for performance reasons. Specifically:

- Are we confident that flattened L types on the stack have negligible costs 
compared to Q types? (E.g., is there no significant cost to using extra 
registers to track and check nulls?)

- Are we confident that we can achieve atomic, flattened L types on the heap 
for common cases?

- Are we confident that the performance cliff required to guarantee atomicity 
for heap-flattened L types is acceptable in general programming settings?

- Are we also confident that the extra null-tracking overhead of flattened L 
types on the heap is acceptable in most cases, and only needs to be compressed 
out by performance-tuning experts?

If the answer to all of those is "yes", *then* I think there's an argument that 
the model simplifications, etc., could be worth asking performance-crucial code 
to sprinkle in some '.val' types. But I'm sure we're not ready to say "yes" to 
all those yet...

A good test for me is this: if we asked everybody to stop saying 'int' all the 
time, and prefer 'Integer' instead except in performance-critical code, could 
we effectively convince them to set aside their misgivings about performance 
and trust the JVM to be reasonably efficient?


EG meeting *canceled*, 2022-04-20

2022-04-19 Thread Dan Smith
No new email threads, we'll cancel this time.


EG meeting, 2022-04-06

2022-04-06 Thread Dan Smith
Sorry, missed putting this mail together earlier. Planning to meet today, I'll 
be a little late. Let's plan on starting about 15 minutes after.

Thanks,
Dan

Re: Alternative to IdentityObject & ValueObject interfaces

2022-04-01 Thread Dan Smith
On Mar 22, 2022, at 10:52 PM, Dan Smith <daniel.sm...@oracle.com> wrote:

On Mar 22, 2022, at 7:21 PM, Dan Heidinga <heidi...@redhat.com> wrote:

A couple of comments on the encoding and questions related to descriptors.


JVM proposal:

- Same conceptual framework.

- Classes can be ACC_VALUE, ACC_IDENTITY, or neither.

- Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are 
not. Optionally, modern-version concrete classes are also implicitly 
ACC_IDENTITY.

Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER
bits, then any class without one of the bits set (including all the
legacy classes) are identity classes.


(Trying out this alternative approach to abstract classes: there's no more 
ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically 
ACC_IDENTITY, and modern-version abstract classes permit value subclasses 
unless they opt out with ACC_IDENTITY. It's the bytecode generator's 
responsibility to set these flags appropriately. Conceptually cleaner, maybe 
too risky...)

With the "clever" encoding, every class is implicitly identity unless
it sets ACC_VALUE or ACC_NEITHER and bytecode generators have to
explicitly flag modern abstract classes.  This is kind of growing on
me.

A problem is that interfaces are ACC_NEITHER by default, not ACC_IDENTITY. 
Abstract classes and interfaces have to get two different behaviors based on 
the same 0 bits.

Here's another more stable encoding, though, that feels less fiddly to me than 
what I originally wrote:

ACC_VALUE means "allows value object instances"

ACC_IDENTITY means "allows identity object instances"

If you set *both*, you're a "neither" class/interface. (That is, you allow both 
kinds of instances.)

If you set *none*, you get the default/legacy behavior implicitly: classes are 
ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.

Update on encoding: after some internal discussion, I've found this to be the 
most natural fit:

- ACC_VALUE (0x0040) corresponds to the 'value' keyword in source files
- ACC_IDENTITY (0x0020) corresponds to the (often implicit) 'identity' keyword 
in source files
- If neither is set, the class/interface supports both kinds of subclasses (and 
must be abstract)
- If both are set, or any supers' flags conflict, it's an error
- In older-version classes (not interfaces), ACC_IDENTITY is assumed to be set

What about newer-version classes that use old encodings? (E.g., a tool bumps 
its output version number but isn't aware of these flags.) There's a sneaky 
trick here that minimizes the risk: ACC_IDENTITY is re-using the old ACC_SUPER, 
which no longer has any effect and that we've encouraged to be set since Java 
1.0.2. So if you're already setting ACC_SUPER in your classes, you've 
automatically opted in to ACC_IDENTITY; doing something different requires 
making changes to the generated code.

So the remaining incompatibility risk is that someone generates a class (not an 
interface) with a newer version number and with neither flag set (violating the 
"always set ACC_SUPER" advice), and then either the class won't load (it's 
concrete, it declares an instance field, etc.), or it's abstract and 
accidentally supports value subclasses, and so can be instantiated without 
running <init> logic. The number of unlikely events in this scenario seems like 
enough for us not to be concerned.
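The encoding above can be sketched as a small decoder (the method name and the "either" label are mine; the flag values are the proposed ones):

```java
public class AccFlags {
    static final int ACC_IDENTITY = 0x0020; // reuses the obsolete ACC_SUPER bit
    static final int ACC_VALUE    = 0x0040;

    // oldVersion: the class file predates these flags, so ACC_IDENTITY is
    // assumed for classes (but not for interfaces).
    static String kind(int flags, boolean oldVersion, boolean isInterface) {
        if (oldVersion && !isInterface) flags |= ACC_IDENTITY;
        boolean id  = (flags & ACC_IDENTITY) != 0;
        boolean val = (flags & ACC_VALUE) != 0;
        if (id && val) throw new IllegalArgumentException("conflicting flags");
        if (id)  return "identity";
        if (val) return "value";
        return "either"; // must be abstract; both kinds of subclasses allowed
    }
}
```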



Object as a concrete class

2022-03-31 Thread Dan Smith
One of our requirements has been that 'new Object()' must be re-interpreted 
(both at compile time and run time) to instantiate some other class—Object is 
effectively abstract. The motivation here is that every class instance must be 
identified as an identity object or a value object, and the mechanism for that 
is the corresponding class declaration. But if 'Object' were an identity class, 
then no value class could extend it.

That is, this code needs to work:

assert new Object() instanceof IdentityObject;
assert new Point(1,2) instanceof ValueObject;

*However*, as Remi was eager to pursue awhile ago, in a world in which class 
modifiers, not superinterfaces, convey the identity/value distinction, we're no 
longer so closely tied to class declarations, and it becomes easier to make 
Object a special case. This code still needs to work:

assert new Object().hasIdentity();
assert !new Point().hasIdentity();

But the 'hasIdentity' method can contain arbitrary logic, and doesn't 
necessarily need to correlate with 'getClass().isIdentityClass()'.

So we could have a world in which some objects are instances of a concrete 
class that is neither an identity class nor a value class, but where those 
objects are still identity objects.

I don't see a useful way to generalize this to other "both kinds" classes (for 
example, any class with an instance field must be an identity class or a value 
class). But since we have to make special allowances for Object one way or 
another, it does seem plausible that we let 'new Object()' continue to create 
direct instances of class Object, and then specify the following special rules:

- All concrete, non-value classes are implicitly identity classes *except for 
Object*

- The 'new' bytecode is allowed to be used with concrete identity classes *and 
class Object*

- Identity objects include instances of identity classes, arrays, *and 
instances of Object*; 'hasIdentity' reflects this

- [anything else?]

There's some extra complexity here, but balanced against the cost of making 
every Java programmer adjust their model of what 'new Object()' means, and 
corresponding coding style refactorings, it seems like a win.
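The special rules could be sketched as follows; the boolean parameter stands in for a hypothetical reflection query (java.lang.Class has no such method today):

```java
public class IdentityRules {
    // Sketch of the proposed 'hasIdentity' logic with Object special-cased.
    static boolean hasIdentity(Class<?> c, boolean isIdentityClass) {
        if (c == Object.class) return true; // direct Object instances have identity
        if (c.isArray())       return true; // arrays are identity objects
        return isIdentityClass;             // otherwise decided by the class itself
    }
}
```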

Thoughts?

Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-23 Thread Dan Smith
On Mar 22, 2022, at 5:56 PM, Dan Smith <daniel.sm...@oracle.com> wrote:

- Variable types: I don't see a good way to get the equivalent of an 
'IdentityObject' type. It would involve tracking the 'identity' property 
through the whole type system, which seems like a huge burden for the 
occasional "I'm not sure you can lock on that" error message. So we'd probably 
need to be okay letting that go. Fortunately, I'm not sure it's a great 
loss—lots of code today seems happy using 'Object' when it means, informally, 
"object that I've created for the sole purpose of locking".

- Type variable bounds: this one seems more achievable, by using the 'value' 
and 'identity' keywords to indicate a new kind of bounds check (''). Again, it's added complexity, but it's more localized. We 
should think more about the use cases, and decide if it passes the cost/benefit 
analysis. If not, nothing else depends on this, so it could be dropped. (Or 
left to a future, more general feature?)

Per today's discussion, this part seems to be the central question: how much 
value can we expect to get out of compile-time checking?

Stepping back from the type system details (that is, the below discussion 
applies whether we're using interfaces, modifiers on types, or something else), 
it's worth asking what errors we hope these features will help detect. We 
identified a couple of them today (and earlier in this thread):

- 'synchronized' on a value object
- storing a value object in a weak reference (in a world in which weak 
references don't support value objects)

Two questions:

1) What are the requirements for the analysis? How effective can it be?

The type system is going to have three kinds of types:
- types that guarantee identity objects
- types that guarantee value objects
- types that include both kinds of objects

That third kind are a problem: we can specify checks with false positives 
(programmer knows the operation is safe, compiler complains anyway) or false 
negatives (operation isn't safe, but the compiler lets it go).

For example, for the 'synchronized' operation, it's straightforward for the 
compiler to complain on a value class type. But what do we do with 
'synchronized' on some interface type? Say we go the false positive route; the 
check probably looks like a warning ("you might be synchronizing on a value 
object"). In this case:

- We've just created a bunch of warnings in existing code that people will 
probably just @SuppressWarnings rather than try to address through the types, 
because changing the types (throughout the flow of data) is a lot of work and 
comes with compatibility risks.

- Even in totally new code, if I'm not working with a specific identity class, 
I'm not sure I would bother fiddling with the types to get better checking. It 
seems really tedious. (For example, changing an interface-typed parameter 'Foo' 
to intersection type 'Foo & IdentityObject'.)

If we prefer to allow false negatives, then it's straightforward: value class 
types get errors, other types do not. There's no need for extra type system 
features. (E.g., 'IdentityObject' and 'Object' get treated exactly the same by 
'synchronized'.)
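A sketch of the trade-off (hypothetical value-class syntax): the checker can be sound or quiet at "both kinds" types, but not both:

```
value class Rational implements Comparable<Rational> { /* ... */ }

void f(Comparable<?> c) {
    synchronized (c) { /* ... */ }  // "both kinds" type: warn here (false
}                                   // positives) or stay silent (false negatives)

void g(Rational r) {
    synchronized (r) { /* ... */ }  // value class type: always an error — easy case
}
```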

For weak references, it definitely doesn't make sense to reject types like 
WeakReference—that would be a compatibility mess. We could warn, but 
again, lots of false positive risk; and warnings don't generalize to 
general-purpose use of generics. I think again the best we could hope to do is 
to reject value class types. But something like 'T extends IdentityObject' 
doesn't accomplish this, because it excludes the "both kinds" types. Instead, 
we'd need something like 'T !extends ValueObject'.

2) Are these the best use cases we have? and are they really all that important?

These are the ones we've focused on, but maybe we can think of other 
applications. Other use cases would similarly have to involve the differences 
in runtime semantics.

Our two use cases share the property that they detect a runtime error (either 
an expression that we know will always throw, or with more aggressive checking 
an expression that *could* throw). That's helpful, but I do wonder how common 
such errors will be. We could do a bunch of type system work to detect division 
by zero, but nobody's asking for that because programmers just tend to avoid 
making that mistake already.

Synchronization: best practice is already to "own" the object being locked on, 
and that kind of knowledge isn't tracked by the type system. Doesn't seem that 
different for programmers to also be aware of whether their locking objects are 
identity objects.

EG meeting, 2022-03-23

2022-03-23 Thread Dan Smith
EG Zoom meeting today at *4pm* UTC (9am PDT, 12pm EDT).

Thanks for the feedback in the "Alternative to IdentityObject & ValueObject 
interfaces" thread. We can continue that discussion.



Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-22 Thread Dan Smith
On Mar 22, 2022, at 7:44 PM, Kevin Bourrillion <kev...@google.com> wrote:

On Tue, Mar 22, 2022 at 4:56 PM Dan Smith <daniel.sm...@oracle.com> wrote:

In response to some encouragement from Remi, John, and others, I've decided to 
take a closer look at how we might approach the categorization of value and 
identity classes without relying on the IdentityObject and ValueObject 
interfaces.

(For background, see the thread "The interfaces IdentityObject and ValueObject 
must die" in January.)

Could anyone summarize the strongest version of the argument against them? The 
thread is not too easy to follow.

I'm sure there's more, but here's my sense of the notable problems with the 
status quo approach:

- We're adding a marker interface to every concrete class in the Java universe. 
Generally, an extra marker interface shouldn't affect anything, but the Java 
universe is big, and we're bound to break some things (specifically by changing 
reflection behavior and by producing more compile-time intersection types). We 
can ask people to fix their code and make fewer assumptions, but it adds 
upgrade friction, and the budget for breaking stuff is not unlimited.

- Injecting superinterfaces is something entirely new that I think JVMs would 
really rather not be involved with. But it's necessary for compatibly evolving 
class files. We've spent a surprising amount of time working out exactly when 
the interfaces should be injected; separate compilation leads to tricky corner 
cases.

- There's a tension between our use of modifiers and our use of interfaces. 
We've made some ad hoc choices about which are used in which places (e.g., you 
can't declare a concrete value class by saying 'class Foo implements 
ValueObject'). In the JVM, we need modifiers for format checking and interfaces 
for types. This is all okay, but the arbitrariness and redundancy of it is 
unsatisfying and suggests there might be some accidental complexity.

- Subclass restriction: 'implements IdentityObject' has been replaced with the 
'identity' modifier. Complexity cost of special modifiers seems on par with the 
complexity of special rules for inferring and checking the superinterfaces.

The rules for the modifiers are okay. But here's my observation. The simplest 
way to explain those rules would be if the `value` keyword is literally 
shorthand for `extends/implements ValueObject`. I think the rules fall out from 
that, plus:

  *   IO and VO are disjoint. (As interfaces can already be, like `interface 
Foo { int x(); }` and `interface Bar { boolean x(); }`, and if it really came 
down to it, you could literally put an incompatible method into each type and 
blame their noncohabitation on that :-))
  *   A class that breaks the value class rules has committed to being an 
identity class.
  *   We wouldn't know how to make an instance that is "neither", so 
instantiating a "neither" class has to have default behavior, and that has to 
be to give you what it always has.

In each case I've explained why the rule seems very easy to understand to me. 
So from my POV, this still pulls me back to the types anyway. I would say that 
your rules for the modifiers are largely simulating those types.

Yes, it is nice how we get inheritance for free from interfaces. But when you 
compare that with the "plus" list (which I'd summarize as: disjointedness, 
declaration restrictions, and inference), it's not like getting inheritance 
"for free" is such a huge win. It's maybe 20% less complexity or something to 
explain the feature.

Of course the big win is that interfaces are types, so we already know how to 
use them in the static type system. As your later comments suggest, I think our 
expectations for static typing are probably the most important factor in 
deciding which strategy best meets our needs.


Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-22 Thread Dan Smith
On Mar 22, 2022, at 7:21 PM, Dan Heidinga <heidi...@redhat.com> wrote:

A couple of comments on the encoding and questions related to descriptors.


JVM proposal:

- Same conceptual framework.

- Classes can be ACC_VALUE, ACC_IDENTITY, or neither.

- Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are 
not. Optionally, modern-version concrete classes are also implicitly 
ACC_IDENTITY.

Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER
bits, then any class without one of the bits set (including all the
legacy classes) are identity classes.


(Trying out this alternative approach to abstract classes: there's no more 
ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically 
ACC_IDENTITY, and modern-version abstract classes permit value subclasses 
unless they opt out with ACC_IDENTITY. It's the bytecode generator's 
responsibility to set these flags appropriately. Conceptually cleaner, maybe 
too risky...)

With the "clever" encoding, every class is implicitly identity unless
it sets ACC_VALUE or ACC_NEITHER and bytecode generators have to
explicitly flag modern abstract classes.  This is kind of growing on
me.

A problem is that interfaces are ACC_NEITHER by default, not ACC_IDENTITY. 
Abstract classes and interfaces have to get two different behaviors based on 
the same 0 bits.

Here's another more stable encoding, though, that feels less fiddly to me than 
what I originally wrote:

ACC_VALUE means "allows value object instances"

ACC_IDENTITY means "allows identity object instances"

If you set *both*, you're a "neither" class/interface. (That is, you allow both 
kinds of instances.)

If you set *none*, you get the default/legacy behavior implicitly: classes are 
ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.



Alternative to IdentityObject & ValueObject interfaces

2022-03-22 Thread Dan Smith
In response to some encouragement from Remi, John, and others, I've decided to 
take a closer look at how we might approach the categorization of value and 
identity classes without relying on the IdentityObject and ValueObject 
interfaces.

(For background, see the thread "The interfaces IdentityObject and ValueObject 
must die" in January.)

These interfaces have found a number of different uses (enumerated below), 
while mostly leaning on the existing functionality of interfaces, so there's a 
pretty good complexity vs. benefit trade-off. But their use has some rough 
edges, and inserting them everywhere has a nontrivial compatibility impact. Can 
we do better?

Language proposal:

- A "value class" is any class whose instances are all value objects. An 
"identity class" is any class whose instances are all identity objects. 
Abstract classes can be value classes or identity classes, or neither. 
Interfaces can be "value interfaces" or "identity interfaces", or neither.

- A class/interface can be designated a value class with the 'value' modifier.

value class Foo {}
abstract value class Bar {}
value interface Baz {}
value record Rec(int x) {}

A class/interface can be designated an identity class with the 'identity' 
modifier.

identity class Foo {}
abstract identity class Bar {}
identity interface Baz {}
identity record Rec(int x) {}

- Concrete classes with neither modifier are implicitly 'identity'; abstract 
classes with neither modifier, but with certain identity-dependent features 
(instance fields, initializers, synchronized methods, ...) are implicitly 
'identity' (possibly with a warning). Other abstract classes and interfaces are 
fine being neither (thus supporting both kinds of subclasses).

- The properties are inherited: if you extend a value class/interface, you are 
a value class/interface. (Same for identity classes/interfaces.) It's an error 
to be both.
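For example, under these inheritance rules (hypothetical syntax):

```
value interface Shape {}
identity class Canvas implements Shape {}    // error: declares 'identity' but inherits 'value'
interface Node {}                            // neither: allows both kinds of implementors
value class Circle implements Node, Shape {} // ok: 'value' declared and inherited
```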

- The usual restrictions apply to value classes, both concrete and abstract; 
and also to "neither" abstract classes, if they haven't been implicitly made 
'identity'.

- An API ('Object.isValueObject()'?) allows for dynamically distinguishing 
between value objects and identity objects. The reflection API (in 
java.lang.Class) allows for detection of value classes/interfaces, identity 
classes/interfaces, and "neither" classes/interfaces.

- TBD whether/how we track these properties statically so that the type system 
catch mismatches between non-identity class types and uses that assume identity.

JVM proposal:

- Same conceptual framework.

- Classes can be ACC_VALUE, ACC_IDENTITY, or neither.

- Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are 
not. Optionally, modern-version concrete classes are also implicitly 
ACC_IDENTITY.

(Trying out this alternative approach to abstract classes: there's no more 
ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically 
ACC_IDENTITY, and modern-version abstract classes permit value subclasses 
unless they opt out with ACC_IDENTITY. It's the bytecode generator's 
responsibility to set these flags appropriately. Conceptually cleaner, maybe 
too risky...)

- At class load time, we inherit value/identity-ness and check for conflicts. 
It's okay to have neither flag set but inherit the property from one of your 
supers. We also enforce constraints on value classes and "neither" abstract 
classes.

---

So how does this score as a replacement for the list of features enabled by the 
interfaces?

- Dynamic detection: 'obj instanceof ValueObject' is quite straightforward; if 
we can replace that with 'obj.isValueObject()', that feels about equally 
useful. (I'd be more pessimistic about something like 
'Objects.isValueObject(obj)'.)

- Subclass restriction: 'implements IdentityObject' has been replaced with the 
'identity' modifier. Complexity cost of special modifiers seems on par with the 
complexity of special rules for inferring and checking the superinterfaces. I 
think it's a win that we use the 'value' modifier and "value" terminology for 
all kinds of classes/interfaces, not just concrete classes.

- Variable types: I don't see a good way to get the equivalent of an 
'IdentityObject' type. It would involve tracking the 'identity' property 
through the whole type system, which seems like a huge burden for the 
occasional "I'm not sure you can lock on that" error message. So we'd probably 
need to be okay letting that go. Fortunately, I'm not sure it's a great 
loss—lots of code today seems happy using 'Object' when it means, informally, 
"object that I've created for the sole purpose of locking".

- Type variable bounds: this one seems more achievable, by using the 'value' 
and 'identity' keywords to indicate a new kind of bounds check (''). Again, it's added complexity, but it's more localized. We 
should think more about the use cases, and decide if it passes the cost/benefit 
analysis. If not, nothing else depends on this, so it could be dropped. (Or 
left to a future, more general feature?)

EG meeting *canceled*, 2022-03-09

2022-03-08 Thread Dan Smith
Only list traffic since last meeting is a couple of followups to that 
discussion, so I think we can skip this time. Next meeting March 23.



Re: Evolving instance creation

2022-03-01 Thread Dan Smith
On Mar 1, 2022, at 6:56 AM, Kevin Bourrillion <kev...@google.com> wrote:

The main thing I think CICEs/`new` accomplish is simply to "cross the bridge". 
Constructors are void and non-static; yet somehow we need to call them as if 
they're static and non-void! `new` gets us across that gap. This seems to me 
like a special-snowflake problem that `new` is custom built to address, and I 
would hope we keep it.

Okay. So support for 'new Point()' (1) over just 'Point()' (3) on the basis 
that constructor declarations need special magic to enter the context where the 
constructor body lives. So as long as we're declaring value class constructors 
in the same way as identity class constructors, it makes sense for both to have 
the same invocation syntax, and for that syntax to be somehow different from a 
method invocation.

I suppose (3) envisions this magic happening invisibly, as part of the 
instantiation API provided by the class—there's some magic under the covers 
where a bridge/factory-like entity gets invoked and sets up the context for the 
constructor body. But I agree that it's probably better not to have to appeal 
to something invisible when people are already used to the magic being explicit.

A couple more minor points about the factories idea:

A related, possibly-overlapping new Java feature idea (not concretely proposed, 
but something the language might want in the future) is the declaration of 
canonical factory methods in a class, which intentionally *don't* promise 
unique instances (for example, they might implement interning). These factories 
would be like constructors in that they wouldn't have a unique method name, but 
otherwise would behave like ad hoc static factory methods—take some arguments, 
use them to create/locate an appropriate instance, return it.

Can you clarify what these offer that static methods don't already provide? The 
two weaknesses I'm aware of with static factory methods are (1) subclasses 
still need a constructor to call and (2) often you don't really want the burden 
of naming them, you just want them to look like the obvious standard creation 
path. It sounds like this addresses (2) but not (1), and I assume also 
addresses some (3).

A couple of things:

- If it's canonical, everybody knows where to find it. APIs like reflection and 
tools like serialization can create instances through a universally-recognized 
mechanism (but one that is more flexible than constructors).

- In a similar vein, if JVMS can count on instantiation being supported by a 
canonical method name, then this approach can subsume existing uses of 
'new/dup/<init>', which are a major source of complexity. This is a very long 
game, but the idea is that eventually the old mechanism (specifically, use of 
the 'new' bytecode outside of the class being instantiated) could be deprecated.
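Today's nearest analogue is a named static factory that interns instances; the proposal would, roughly, let such a method be canonical and unnamed (the class and method names here are mine):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TempUnit {
    private static final Map<String, TempUnit> CACHE = new ConcurrentHashMap<>();
    final String symbol;

    private TempUnit(String symbol) { this.symbol = symbol; }

    // Unlike a constructor, this may return an already-existing instance.
    static TempUnit of(String symbol) {
        return CACHE.computeIfAbsent(symbol, TempUnit::new);
    }
}
```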

(2) 'new Foo()' as a general-purpose creation tool

In this approach, 'new Foo()' is the use-site syntax for *both* factory and 
constructor invocation. Factories and constructors live in the same overload 
resolution "namespace", and all will be considered by the use site.

It sounds to me like these factories would be static, so `new` would not be 
required by the "cross the bridge" interpretation given above.

Right. This approach gives up the use-site/declaration-site alignment, instead 
interpreting 'new' as "make me one of these, using whatever mechanism the class 
provides".


Re: Evolving instance creation

2022-02-24 Thread Dan Smith
> On Feb 24, 2022, at 8:47 AM, Remi Forax  wrote:
> 
> - Original Message -
>> From: "Dan Heidinga" 
>> To: "daniel smith" 
>> Cc: "valhalla-spec-experts" 
>> Sent: Thursday, February 24, 2022 4:39:52 PM
>> Subject: Re: Evolving instance creation
> 
>> Repeating what I said in the EG meeting:
>> 
>> * "new" carries the mental model of allocating space.  For identity
>> objects, that's on the heap.  For values, that may just be stack space
>> / registers.  But it indicates that some kind of allocation / demand
>> for new storage has occurred.
>> 
>> * It's important that "new" returns a unique instance.  That invariant
>> has existed since Java's inception and we should be careful about
>> breaking it.  In the case of values, two identical values can't be
>> differentiated so I think we're safe to say they are unique but
>> indistinguishable as no user program can differentiate them.
> 
> Yes, it's more about == being different than "new" being different.
> 
> "new" always creates a new instance but in case of value types, == does not 
> allow us see if the instance are different or not.

I'm not sure this is a good way to think about value creation, though. It 
suggests that there still *is* an identity there (i.e., the new value has been 
newly allocated), you just can't see it.

I'd rather have programmers think in these terms: when you instantiate a value 
class, you might get an object that already exists. Whether there are copies of 
that object at different memory locations or not is irrelevant—it's still *the 
same object*.
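Today's records make the contrast concrete: a record is still an identity class, so 'new' always produces a distinct instance even when two instances are indistinguishable by state. A small sketch (under the value-class proposal, '==' on a value class would behave like the state comparison below):

```java
public class RecordIdentity {
    public record Point(int x, int y) {}

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        // Today: two allocations, two identities.
        System.out.println(a == b);      // prints false
        // Same state, so equal; a value class would make == agree with this.
        System.out.println(a.equals(b)); // prints true
    }
}
```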

Re: Abstract class with fields implementing ValueObject

2022-02-24 Thread Dan Smith
TLDR: I'm convinced, let's revise our approach so that the JVM never infers 
interfaces for abstract classes.

On Feb 24, 2022, at 8:57 AM, Dan Heidinga <heidi...@redhat.com> wrote:

Whether
they can be instantiated is a decision better left to other parts of
the spec (in this case, I believe verification will succeed and
resolution of the `super()`  call will fail).

Right, my mistake. Verifier doesn't care what methods are declared in the 
superclass, but resolution of the invokespecial will fail.

(3) no ACC_PERMITS_VALUE, <init> declaration

The JVM infers that this class implements IdentityObject, if it doesn't 
already. If it also implements ValueObject, an error occurs at class load time.

I think this should be driven purely by the presence of the
ACC_PERMITS_VALUE flag and the VM shouldn't be looking at the <init>
methods.

Sounds like the consensus, agreed.

 The JVM shouldn't infer either IdentityObject or ValueObject
for this abstract class - any inference decision should be delayed to
the subclasses that extend this abstract class.

My initial reaction was that, no, we really do want IdentityObject here, 
because it's useful to be able to assign an abstract class type to 
IdentityObject.

But: for new classes, the compiler will have an opportunity to be explicit. 
It's mostly a question of how we handle legacy classes. And there, it would 
actually be bad to infer IdentityObject, when in most cases the class will get 
permits_value when it is recompiled. Probably best to avoid a scenario like:

- Compile against legacy API, assign library.AbstractBanana to IdentityObject 
in your code
- Upgrade to newer version of the API, assignment from library.AbstractBanana 
to IdentityObject is an error

So, okay, let's say we limit JVM inference to concrete classes. And javac will 
infer/generate 'implements IdentityObject' if it decides an abstract class 
can't be permits_value.

What about separate compilation? javac's behavior might be something like: 1) 
look for fields, 'synchronized', etc. in the class declaration, and if any are 
present, add 'implements IdentityObject' (if it's not already there); 2) if the 
superclass is permits_value and this class doesn't extend IdentityObject 
(directly or indirectly), set permits_value. (1) is a local decision, while (2) 
depends on multiple classes, so can be disrupted by separate compilation. But, 
thinking through the scenarios here... I'm pretty comfortable saying that an 
abstract class that is neither permits_value nor a subclass of IdentityObject 
is in an unstable state, and, like the legacy case, it's probably better if 
programmers *don't* write code assuming they can assign to IdentityObject.



Re: Abstract class with fields implementing ValueObject

2022-02-23 Thread Dan Smith
Fred suggested that we enumerate the whole space here. So, some cases to 
consider:

{ ACC_PERMITS_VALUE, not }
{ has an <init> declaration, not }
{ implements IdentityObject, not }
{ implements ValueObject, not }

"implements" here refers to both direct and indirect superinterfaces.

I'll focus on the first two, which affect the inference of superinterfaces.

(1) ACC_PERMITS_VALUE, <init> declaration

This is a class that is able to support both identity and value subclasses. It 
implements no extra interfaces, but can restrict its subclasses via 'implements 
IdentityObject' or 'implements ValueObject'.

(2) ACC_PERMITS_VALUE, no <init> declaration

The JVM infers that this class implements ValueObject, if it doesn't already. 
If it also implements IdentityObject, an error occurs at class load time.

(Design alternative: we could ignore the <init> declarations and treat this 
like case (1). In that case, the class could implement IdentityObject or be 
extended by identity classes without error (as long as it doesn't also 
implement ValueObject). But those identity subclasses couldn't declare 
verification-compatible <init> methods, just like subclasses of abstract 
classes that have no <init> methods today.)

(3) no ACC_PERMITS_VALUE, <init> declaration

The JVM infers that this class implements IdentityObject, if it doesn't 
already. If it also implements ValueObject, an error occurs at class load time.

(4) no ACC_PERMITS_VALUE, no <init> declaration

This is a class that effectively supports *no* subclasses. We don't infer any 
superinterfaces, but it can choose to implement IdentityObject or ValueObject. 
A value class that extends this class will fail to load. If the class doesn't 
implement ValueObject, an identity class that extends this class could load, 
but couldn't declare verification-compatible <init> methods, just like 
subclasses of abstract classes that have no <init> methods today.

(Design alternative: we could ignore the <init> declarations and treat this 
like case (3). In that case, it would be an error for the class to implement 
ValueObject, because it also implicitly implements IdentityObject.)

---

Spelling this out makes me feel like treating the presence of <init> methods as 
an inference signal may be overreaching and overcomplicating things. Today, 
declaring an <init> method, or not, has no direct impact on anything, other 
than the side-effect that you can't write verification-compatible <init> 
methods in your subclasses. I like the parallel between "permits identity (via 
<init>)" and "permits value (via flag)", but flags and <init> methods aren't 
really parallel constructs; in cases (2) and (4), we still "permit" identity 
subclasses, even if they're pretty useless.

(And it doesn't help that javac doesn't give you any way to create these 
<init>-free classes, so in practice they certainly don't have parallel 
prevalence.)

Pursuing the "design alternative" strategies would essentially collapse this 
down to two cases: (1) ACC_PERMITS_VALUE, no superinterfaces inferred, but 
various checks performed (e.g., no instance fields); and (3) no 
ACC_PERMITS_VALUE, IdentityObject is inferred, error if there's also an 
explicit 'implements ValueObject'.
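A sketch of the collapsed rule as decision logic in Java (the enum, method, and exception choices here are invented stand-ins, not JVM code):

```java
import java.util.Set;

public class LoadRule {
    public enum Marker { IDENTITY_OBJECT, VALUE_OBJECT }

    /**
     * Given the ACC_PERMITS_VALUE flag and the markers the class already
     * (transitively) implements, return the marker to infer (or null),
     * or throw on the IdentityObject/ValueObject conflict.
     */
    public static Marker infer(boolean permitsValue, Set<Marker> declared) {
        if (permitsValue) {
            return null; // case (1): nothing inferred; other checks apply
        }
        // case (3): IdentityObject is inferred...
        if (declared.contains(Marker.VALUE_OBJECT)) {
            // ...so an explicit ValueObject is a load-time error
            throw new IllegalStateException("IdentityObject/ValueObject conflict");
        }
        return declared.contains(Marker.IDENTITY_OBJECT)
                ? null // already declared, nothing to infer
                : Marker.IDENTITY_OBJECT;
    }

    public static void main(String[] args) {
        System.out.println(infer(false, Set.of())); // prints IDENTITY_OBJECT
        System.out.println(infer(true, Set.of()));  // prints null
    }
}
```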

How do we feel about that?



EG meeting, 2022-02-23

2022-02-23 Thread Dan Smith
EG Zoom meeting today at 5pm UTC (9am PDT, 12pm EDT).

Potential topics:

"Evolving instance creation": I discussed considerations for how we evolve 
class instance creation expressions in the language

"Abstract class with fields implementing ValueObject": discussion about a 
bytecode validation scenario

Left over from last time:

- IdentityObject and ValueObject interfaces

- Terminology



Evolving instance creation

2022-02-22 Thread Dan Smith
One of the longstanding properties of class instance creation expressions ('new 
Foo()') is that the instance being produced is unique—that is, not '==' to any 
previously-created instance.

Value classes will disrupt this invariant, because it's possible to "create" an 
instance of a value class that already exists:

new Point(1, 2) == new Point(1, 2) // always true

A related, possibly-overlapping new Java feature idea (not concretely proposed, 
but something the language might want in the future) is the declaration of 
canonical factory methods in a class, which intentionally *don't* promise 
unique instances (for example, they might implement interning). These factories 
would be like constructors in that they wouldn't have a unique method name, but 
otherwise would behave like ad hoc static factory methods—take some arguments, 
use them to create/locate an appropriate instance, return it.
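For illustration, here's what such a factory might look like if written by hand today, using an invented 'of' method that interns instances, so equal arguments can return an existing instance, which a constructor can never do:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InterningPoint {
    private static final Map<Long, InterningPoint> CACHE = new ConcurrentHashMap<>();
    public final int x, y;

    private InterningPoint(int x, int y) { this.x = x; this.y = y; }

    // The factory interns: equal coordinates yield the same instance.
    public static InterningPoint of(int x, int y) {
        long key = ((long) x << 32) | (y & 0xFFFFFFFFL); // pack both ints
        return CACHE.computeIfAbsent(key, k -> new InterningPoint(x, y));
    }

    public static void main(String[] args) {
        System.out.println(of(1, 2) == of(1, 2)); // prints true
    }
}
```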

I want to focus here on the usage of class instance creation expressions, and 
how to approach changes to their semantics. This involves balancing the needs 
of programmers who depend on the unique instance invariant with those who don't 
care and would prefer fewer knobs/less complexity.

Here are three approaches that I could imagine pursuing:

(1) Value classes are a special case for 'new Foo()'

This is the plan of record: the unique instance invariant continues to hold for 
'new Foo()' where Foo is an identity class, but if Foo is a value class, you 
might get an existing instance.

In bytecode, the translation of 'new Foo()' depends on the kind of class (as 
determined at compile time). Identity class creation continues to be 
implemented via 'new Foo; dup; invokespecial Foo.<init>()V'. Value class 
creation occurs via 'invokestatic Foo.<new>()LFoo;' (method name 
bikeshedding tk). There is no compatibility between the two (e.g., if an 
identity class becomes a value class).

In a way, it shouldn't be surprising that a value class doesn't guarantee 
unique instances, because uniqueness is closely tied to identity. So 
special-casing 'new Foo()' isn't that different from special-casing 
Object.equals'—in the absence of identity, we'll do something reasonable, but 
not quite the same.

Factories don't enter into this story at all. If we end up having unnamed 
factories in the future, they will be declared and invoked with a separate 
syntax, and will be declarable both by identity classes and value classes. 
(Value class factories don't seem particularly compelling, but they could, say, 
be used to smooth migration, like 'Integer.valueOf'.)
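'Integer.valueOf' already demonstrates non-unique-instance creation in today's libraries: the JLS guarantees that boxing caches at least the values -128 to 127, while larger values are typically (but not necessarily) fresh instances.

```java
public class ValueOfCaching {
    public static void main(String[] args) {
        // Guaranteed by the JLS: small values come from a cache.
        System.out.println(Integer.valueOf(100) == Integer.valueOf(100)); // prints true
        // Outside the guaranteed cache range: typically distinct
        // instances by default, but this is not guaranteed either way.
        System.out.println(Integer.valueOf(1000) == Integer.valueOf(1000));
    }
}
```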

Biggest concerns: for now, it can be surprising that 'new' doesn't always give 
you a unique instance. In a future with factories, navigating between the 'new' 
syntax and the factory invocation syntax may be burdensome, with style wars 
about which approach is better.

(2) 'new Foo()' as a general-purpose creation tool

In this approach, 'new Foo()' is the use-site syntax for *both* factory and 
constructor invocation. Factories and constructors live in the same overload 
resolution "namespace", and all will be considered by the use site.

In bytecode, the preferred translation of 'new Foo()' is 'invokestatic 
Foo.<new>()LFoo;'. Note that this is the case for both value classes *and 
identity classes*. For compatibility, 'new/dup/<init>' also needs to be 
supported for now; eventually, it might be deprecated. Refactoring between 
constructors and factories is generally compatible.

Because this re-interpretation of 'new Foo()' supports factories, there is no 
unique instance invariant. At best, particular classes can document that they 
produce unique instances, and clients who need this behavior should ensure 
they're working with classes that promise it. (It's not as simple as looking 
for a *current* factory, because constructors can be refactored to factories.)

For developers who don't care about unique instances, this is the simplest 
approach: whenever you want an instance of Foo, you say 'new Foo()'.

Biggest concerns: we've demoted an ironclad semantic guarantee to an optional 
property of some classes. For those developers/use cases who care about the 
unique instance invariant, that may be difficult, especially because we're 
undoing a longstanding property rather than designing it this way from the 
beginning.

(3) 'new Foo()' for unique instances and just 'Foo()' otherwise

Here, the 'new' keyword is reserved for cases in which a unique instance is 
guaranteed. For value class creation, factory invocation, and constructor 
invocation when unique instances don't matter, a bare 'Foo()' call is used 
instead. 'new Point()' would be an error—this syntax doesn't work with value 
classes.

In bytecode, 'new Foo()' always compiles to 'new/dup/<init>', while plain 
'Foo()' typically compiles to 'invokestatic Foo.<new>()LFoo;' (method name 
bikeshedding tk). For compatibility, plain 'Foo()' would support 
'new/dup/<init>' invocations as well, if that's all the class provides. 
Refactoring between constructor

Re: Abstract class with fields implementing ValueObject

2022-02-14 Thread Dan Smith
> On Feb 14, 2022, at 7:23 AM, Frederic Parain  
> wrote:
> 
> 
> On 2/13/22 1:05 PM, Dan Smith wrote:
>>> On Feb 12, 2022, at 10:16 PM, Srikanth Adayapalam 
>>>  wrote:
>>> 
>>> I understand Frederic is asking about whether the spec​ inadvertently 
>>> allows something it should not - Here anyway is javac behavior:
>>> 
>>> Given:
>>> 
>>> abstract class A implements ValueObject {
>>> int x;
>>> }
>>> 
>>> on compile:
>>> X.java:1: error: The type A attempts to implement the mutually incompatible 
>>> interfaces ValueObject and IdentityObject
>>> abstract class A implements ValueObject {
>>>  ^
>>> 1 error
>> Yep, this is expected and consistent: javac sees the field and infers the 
>> superinterface IdentityObject (per the language rules), then detects the 
>> conflict between interfaces.
>> A slightly more interesting variation: declare a simple interface Foo; 
>> change to 'A implements Foo'. This compiles fine, inferring A implements 
>> IdentityObject. Then separately compile Foo so that it extends ValueObject. 
>> No compilation error, but the JVM should detect the 
>> IdentityObject/ValueObject conflict when A is loaded.
>> To generate the kind of class files Fred asked about, you'd need to use 
>> something other than javac.
> 
> 
> That's not really the point. The JVM cannot rely on what javac generates
> or not, because it has to deal with other class files generators.
> We have to agree on the behavior of the VM based on what is possible in
> the class file, because, at the end, this is what must be implemented.
> 
> The fact that such a class file is useless to the user is almost
> secondary. We just need to know if the VM should accept such class file.
> As long as it doesn't break VM invariants, we are fine accepting it.

Yes, I agree of course. This is an area of the proposed spec where there's an 
acknowledged uncertainty about what the rules should say, and we'll need to 
make a final choice and make sure implementations are conforming to that choice.

To resolve the uncertainty, though, we've got to have a design conversation 
about user expectations, potential impact, etc., and in that context it's 
relevant how often the class files in question are likely to be produced, how 
useful the class files in question might be, and so on.

To review where we're at on JVMS:

1) A class file with ACC_PERMITS_VALUE set may implement, or not, ValueObject 
(directly or indirectly)

2) A class file without ACC_PERMITS_VALUE set, but that declares an <init> 
method, implicitly implements IdentityObject; if it also implements ValueObject 
(directly or indirectly), that's an error

3) For a class file without ACC_PERMITS_VALUE set, and without an <init> method:

3a) Current spec draft says it doesn't implement anything implicitly, and 
so can implement whatever it wants explicitly

3b) An alternative spec approach would say it implicitly implements 
IdentityObject, and so it's an error to also implement ValueObject

My points about "javac doesn't generate it" and "there's precedent for 
uninstantiable class files" are to argue that (3a) is not a bug. So we can 
leave it to other factors to decide whether (3a) or (3b) is the right approach.



Re: Abstract class with fields implementing ValueObject

2022-02-13 Thread Dan Smith
> On Feb 12, 2022, at 10:16 PM, Srikanth Adayapalam 
>  wrote:
> 
> I understand Frederic is asking about whether the spec​ inadvertently allows 
> something it should not - Here anyway is javac behavior:
> 
> Given:
> 
> abstract class A implements ValueObject {
> int x;
> }
> 
> on compile:
> X.java:1: error: The type A attempts to implement the mutually incompatible 
> interfaces ValueObject and IdentityObject
> abstract class A implements ValueObject {
>  ^
> 1 error

Yep, this is expected and consistent: javac sees the field and infers the 
superinterface IdentityObject (per the language rules), then detects the 
conflict between interfaces.

A slightly more interesting variation: declare a simple interface Foo; change 
to 'A implements Foo'. This compiles fine, inferring A implements 
IdentityObject. Then separately compile Foo so that it extends ValueObject. No 
compilation error, but the JVM should detect the IdentityObject/ValueObject 
conflict when A is loaded.

To generate the kind of class files Fred asked about, you'd need to use 
something other than javac.



Re: Abstract class with fields implementing ValueObject

2022-02-11 Thread Dan Smith
> On Feb 9, 2022, at 2:50 PM, Frederic Parain  
> wrote:
> 
> There's a weird case that seems to be allowed by the Value Objects JVMS draft:
> 
> An abstract class can declare non-static fields, which means it won't
> have the ACC_PERMITS_VALUE flag set, but also declare that it implements
> the ValueObject interface.
> 
> The combination looks just wrong, because no class can subclass such class:
>  - identity classes are not allowed because of the presence  of
>the ValueObject interface
>  - value classes are not allowed because of the absence of
>ACC_PERMITS_VALUE
> 
> I've looked for a rule that would prohibit such combination in the
> JVMS draft but couldn't find one.
> 
> Did I miss something?

If it doesn't have ACC_PERMITS_VALUE set and it declares an <init> method, the 
class implicitly implements IdentityObject (see 5.3.5). And then there's an 
immediate error, because of the IdentityObject/ValueObject clash.

If it doesn't have ACC_PERMITS_VALUE set and it *doesn't* declare an <init> 
method, it's impossible to instantiate. Then there's a technical question of 
whether an error occurs, but it's not really an interesting use case for 
programmers (and javac would never generate this).

(Relevant discussion on this corner case from the spec changes draft:

An abstract class implements IdentityObject if it declares an instance 
initialization method and does not have its ACC_PERMITS_VALUE flag set; and 
implements ValueObject if the opposite is true (ACC_PERMITS_VALUE, no instance 
initialization method). Instance initialization methods and ACC_PERMITS_VALUE 
represent two channels for subclass instance creation, and this analysis 
determines whether only one channel is "open".

Alternatively, we could ignore instance initialization methods and rely 
entirely on ACC_PERMITS_VALUE. In practice, abstract classes written in the 
Java programming language always have instance initialization methods, so the 
difference in behavior is only relevant to classes produced via other languages 
or tools.)

Re: EG meeting, 2022-02-09 [SoV-3: constructor questions]

2022-02-11 Thread Dan Smith
I need to do more work and have something concrete to propose before engaging 
too deeply in this discussion, but:

> On Feb 9, 2022, at 11:32 AM, John Rose  wrote:
> 
> Regarding reflection, I think it would be OK to surface all of the <new> 
> methods (of whatever signature) on the getConstructors list, even if they 
> return “something odd”.

The thing about Constructors (and class instance creation expressions, for that 
matter, along with a lot of other <init>-based tooling that this new feature 
might want to transparently slide into) is that they have a type determined by 
the class to which they belong. There's no 'getReturnType' in the Constructor 
API.

EG meeting, 2022-02-09

2022-02-08 Thread Dan Smith
EG Zoom meeting tomorrow at 5pm UTC (9am PDT, 12pm EDT).

Some mail discussions we can follow up on:

"VM model and aconst_init": we had a side conversation about potentially 
requiring explicit conversions from Q to L

"Interface pollution and JVM invariants": Dan pointed out that the interfaces 
IdentityObject and ValueObject are not strongly enforced at runtime, because 
the verifier doesn't check interface types

"SoV-3: constructor questions": Dan asked about validation for <init> and <new> 
methods. Answer: JVM doesn't care about <init> methods in abstract classes, the 
rules about <new> methods still uncertain.

"SoV-2: weak references": Dan raised concerns about the weak reference strategy 
described in SoV.

"The interfaces IdentityObject and ValueObject must die !": Self-explanatory 
:-). We retraced how we got here and talked about alternatives.

"JEP update: Classes for the Basic Primitives": I revised JEP 402 to catch up 
to changes to the model elsewhere

"Terminology bikeshedding summary": I listed areas where we've had some concern 
about the terminology currently being used

---

To provide some structure, I think we can put the topics in three categories:

Known unfinished issues:
- Factory method interpretation/restrictions
- WeakReference handling

Revisiting/questioning the current design:
- Explicit Q->L conversions
- IdentityObject and ValueObject interfaces
- Terminology

Probably resolved, but reviewing:
- Abstract supers and construction signals/mechanisms
- Declarations for basic primitives

Re: Interface pollution and JVM invariants

2022-01-28 Thread Dan Smith
On Jan 28, 2022, at 2:28 PM, Dan Heidinga <heidi...@redhat.com> wrote:


public WeakReference(T o) {
 if (o.getClass().isValue()) throw IAE;
 referent = o;
}

That kind of check is easy to miss (or assume isn't required) based on
a straightforward reading of the source code.

I like the IO/VO interfaces as they let us put the constraints "must
be identity (or not)" in the type system but having them as interfaces
means the guarantees aren't strictly enforced by the runtime.

A useful observation, thanks! We should be careful not to fall into the trap of 
thinking 'IdentityObject' as a type guarantees that you're not operating on a 
value object, where such a guarantee would be important.

Even if you recognize the problem, it can be hard to address it in source. This 
won't catch it (replace "Runnable" with whatever interface you care about):

void test(Runnable r) {
this.r = (Runnable) r; // javac ignores "redundant" cast
}

This will:

void test(Runnable r) {
if (r instanceof Runnable) {
this.r = r;
} else {
throw new ClassCastException();
}
}



Re: VM model and aconst_init

2022-01-28 Thread Dan Smith
I'll chime in that I am coming around to disjoint Q/L, and here are a couple of 
thoughts on how to reconcile that with VM generics.

On Jan 27, 2022, at 2:00 PM, John Rose <john.r.r...@oracle.com> wrote:

Furthermore, i believe that subtyping is a key to avoid multiple bytecode 
verification of the generics code.

I recommend a far simpler technique: Just don’t. Don’t multiple-verify, and 
don’t do anything that would need such extra verification (with different 
types) to give the same answer (as the first verification). Also, don’t think 
of a JVM-loaded set of bytecodes as ever having more than one operational 
meaning.

(I don’t understand this obsession, not just with you, of imagining generic 
bytecodes as statically “recopied” or “recapitulated” or “stamped out” for each 
generic instance.

I think it's an important aspect of parametric polymorphism that properties 
that are true in the generic code continue to be true for specific 
instantiations. (It's complicated, I'm not making a formal rule here, but 
there's an intuition there that I think is right.)

So the key for me is: if the generic code says you're going to aload a 
Ljava/lang/Object and then store it with putfield, it should be true that, 
*modulo optimizations*, all instantiations will load an Object and store it in 
an Object field.

That doesn't mean we're literally verifying it for each instantiation, but it's 
how I expect code authors to understand what they are saying.


By example, with the TypeRestriction attribute [1], the restriction has to be 
subtype of the declared type/descriptor.

No, just no. (Not your fault; I suppose the TR docs are in a confusing state.) 
There’s no reason to make that restriction (linking TRs to the verifier type 
system), other than the futile obsession I called out above. And if you do you 
make yourself sorry when you remember about the verifier’s I/J/F/D types. And 
for many other reasons, such as instance types which are inaccessible to the 
generic code. And the simple fact that bytecodes are not really 
generic-capable, directly, without serious (and futile) redesign.

Making bytecodes directly generic (in the sense of contextually re-type-able) 
is never going to be as simple as using a TR mechanism which occurs apart from 
the verifier.

A “type system” for TRs is pretty meaningless, from the JVM POV. In the JVM, a 
TR (despite its name) is most fundamentally an operational specification of how 
to project (G->I) or embed (I->G) instance types into generic types. It’s a 
function pair. It can connect I=int to G=Object, which the JVM knows are 
disjoint types.

I do think it's important that type restrictions are polymorphic: if there's a 
type restriction on my above Object, it should *both* be true that the value is 
an Object, and the value has whatever property the type restriction claims. A 
type restriction can't make the value no longer an Object.

But I think we can hold on to this property and still support disjoint Q/L. 
How? By not allowing type restrictions to literally claim that values have Q 
types. Instead, they claim that a value with some L type is *freely convertible 
to* a particular Q type. (This may be the same thing as John saying the type 
restriction involves projections and embeddings, although I'm not sure I would 
make it the type restriction's responsibility to encapsulate those conversions.)

So, for example: a type restriction that we might spell as 'QPoint' (and maybe 
that notation is a mistake) is an assertion that a particular L-typed variable 
always stores non-null objects for which 'instanceof Point' is true. *But 
they're still objects*, as far as the abstract JVM is concerned. Then the JVM 
implementation is free to recognize that it can use the same encoding it uses 
for the actual type 'QPoint' to store things in the variable.
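One way to picture this reading (all names below are invented for the sketch): a restricted variable is an ordinary L-typed slot plus a predicate, so the stored value is still an object as far as the abstract machine is concerned, just one known to satisfy the restriction.

```java
import java.util.function.Predicate;

public class RestrictedSlot<T> {
    private final Predicate<Object> restriction;
    private T value;

    public RestrictedSlot(Predicate<Object> restriction) {
        this.restriction = restriction;
    }

    public void store(T v) {
        // A 'QPoint'-like restriction: e.g. non-null and instanceof Point.
        // The check runs on store; loads stay plain object loads.
        if (!restriction.test(v)) {
            throw new IllegalArgumentException("restriction violated");
        }
        value = v;
    }

    public T load() { return value; }

    public static void main(String[] args) {
        // instanceof is false for null, so this also rejects null.
        RestrictedSlot<Object> slot =
                new RestrictedSlot<>(v -> v instanceof Integer);
        slot.store(42);
        System.out.println(slot.load()); // prints 42
    }
}
```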

There are a couple places where reality intrudes on this simple model:
- The initial value of a field/array with a type restriction is determined by 
that type restriction (because, e.g., 'null' can't satisfy the 'QPoint' 
restriction)
- Type restrictions may introduce tearing risks, which would have to be 
explained by specifying the possibility that a JVM implementation may use type 
restrictions to optimize storage of value object instances of primitive 
classes, encoding them as primitive values

I'm left feeling somewhat uneasy that we end up with a world in which 
directly-typed code has Q types, while specialized generic code has L types 
plus type restrictions as its "types"—two different ways to explain 
the exact same thing, in some sense duplicating efforts—but I think we can live 
with that. On the other hand, it's a nice win that the language runtime model 
is more closely aligned with the JVM's runtime model (where value objects and 
primitive values are two distinct things).



Re: The interfaces IdentityObject and ValueObject must die !

2022-01-27 Thread Dan Smith
On Jan 26, 2022, at 6:14 PM, John Rose <john.r.r...@oracle.com> wrote:


On 26 Jan 2022, at 16:55, Dan Smith wrote:

On Jan 26, 2022, at 4:55 PM, John Rose <john.r.r...@oracle.com> wrote:

Independently of that, for the specific case of Object, having a query function 
Class.instanceKind, which returns “NONE” for abstracts else “VALUE” or 
“IDENTITY”, would encode the same information we are looking at with those 
marker interfaces.

Right, so you're envisioning a move in which, rather than 'obj instanceof 
ValueObject', the dynamic test is 'obj.getClass().instanceKind() == VALUE'.

...

'Object.class.instanceKind()' must return NONE, just as Object.class must not 
implement either IdentityObject or ValueObject.

That last “must” is necessarily true, but the second-to-last “must” is not 
necessarily true. That’s my point here.

Okay, I understand—it's possible for library code to do whatever arbitrary 
things it wants, while 'instanceof' has specific, fixed behavior.

But... I don't really see how clients of this method would be comfortable with 
'Object.class.instanceKind()' and 'Runnable.class.instanceKind()' returning 
different things. They've both got to be 'NONE', it seems to me. What does 
'NONE' mean, if not "instances of this Class (and its subclasses) can be both 
value classes and identity classes"?

(I guess we could have two methods, one of which is called 
'directInstanceKind'. But how likely would users be to use the right one, 
depending on which question they were trying to ask? And wouldn't those users 
smart enough to ask the right question be okay just testing for Object.class as 
a special case?)



Re: SoV-3: constructor questions

2022-01-27 Thread Dan Smith
> On Jan 27, 2022, at 8:09 AM, Dan Heidinga  wrote:
> 
>>> 2) What is the rationale behind the return type restrictions on <new> 
>>> methods?
> 
>> Treatment of <new> methods is still unresolved, so this (and the JEP) is 
>> just describing one possible approach. I tried to reach a conclusion on this 
>> a few months ago on this list, but we ended in an unresolved place. I'll try 
>> again...
>> 
>> Anyway, in this incarnation: the rule is that the return type must be a type 
>> that includes instances of the current class. So, in class Point, QPoint is 
>> okay, LObject is okay, but LString is not.
> 
> I don't understand the point of this restriction.  Since
> Ljava/lang/Object; is acceptable (and has to be), I can use a `<new>`
> method to return *any* class but the caller will need to downcast to
> use it.

I think the reason we might have some sort of restriction is if we intend for a 
language or reflection API to be able to rely on these methods having some 
consistent properties (imagine them being surfaced with 
java.lang.reflect.Constructor, for example). So think of the restriction as a 
placeholder ("we may have some sort of restriction on the return type, TBD"). 
We still need to do some work to figure out the precise requirements, if any.



Re: The interfaces IdentityObject and ValueObject must die !

2022-01-26 Thread Dan Smith
On Jan 26, 2022, at 4:55 PM, John Rose <john.r.r...@oracle.com> wrote:

Independently of that, for the specific case of Object, having a query function 
Class.instanceKind, which returns “NONE” for abstracts else “VALUE” or 
“IDENTITY”, would encode the same information we are looking at with those 
marker interfaces.

Right, so you're envisioning a move in which, rather than 'obj instanceof 
ValueObject', the dynamic test is 'obj.getClass().instanceKind() == VALUE'.

For dynamic testing (my #3), sure, these are equivalent.


But the contract for a method is more flexible than the contract of a marker 
interface.

In particular, instanceKind is not required to report the same thing for T and 
U when T<:U but marker interfaces are forced to be consistent across T<:U. I 
think this is an advantage, precisely because it has more flexible structure, 
for the method rather than the marker interface.

I would expect that 'cls.instanceKind() == IDENTITY' has the exact same 
semantics as 'IdentityObject.class.isAssignableFrom(cls)': if a class claims to 
be an identity class, then all its instances (direct and via subclassing) are 
identity objects. I'm not seeing a sensible alternative.
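Since neither `Class.instanceKind()` nor the marker interfaces exist yet, the equivalence being described can only be sketched with local stand-ins. A rough illustration, where `instanceKind` is implemented directly on top of `isAssignableFrom`:

```java
// Hedged sketch: local stand-ins for the proposed IdentityObject/ValueObject
// marker interfaces and the proposed Class.instanceKind() query.
interface IdentityObject {}
interface ValueObject {}

enum InstanceKind { IDENTITY, VALUE, NONE }

class Kinds {
    // instanceKind(cls) == IDENTITY exactly when
    // IdentityObject.class.isAssignableFrom(cls), and likewise for VALUE.
    static InstanceKind instanceKind(Class<?> cls) {
        if (IdentityObject.class.isAssignableFrom(cls)) return InstanceKind.IDENTITY;
        if (ValueObject.class.isAssignableFrom(cls)) return InstanceKind.VALUE;
        return InstanceKind.NONE;  // e.g., Object itself, which implements neither
    }
}

// Illustrative classes: one identity class, one value class.
class Person implements IdentityObject {}
class Point implements ValueObject {}
```

Under this encoding, `Object.class` reports NONE, matching the constraint that Object must implement neither interface.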

How does this behave in the odd world in which direct instances of Object are 
identity objects?

'Object.class.instanceKind()' must return NONE, just as Object.class must not 
implement either IdentityObject or ValueObject.

Given one of these oddball objects, 'obj.getClass().instanceKind()' will, 
naturally, return NONE. Which is surprising, breaking the expectation that 
there are only two possible results of this expression. Just as these objects 
would break the expectation that every object is either 'instanceof 
IdentityObject' or 'instanceof ValueObject'.

I keep saying this: how we handle the 'new Object()' problem doesn't seem to me 
to have any impact on how we encode "I'm an identity class". It's not a 
discussion that belongs in this email thread.


If the marker interfaces also have little use as textual types (e.g., for 
bounds and method parameters) then I agree with Remi. Ditch ‘em.

I outlined many ways in which we're making use of these interfaces. Static 
types is just one. Getting rid of them isn't as easy as "ditch 'em", it would 
involve redesigning for all of those use cases (plus any I forgot), and coming 
up with something that is compellingly better. (This is an invitation, for 
anyone interested in proposing something specific...)


Re: The interfaces IdentityObject and ValueObject must die !

2022-01-26 Thread Dan Smith


On Jan 26, 2022, at 4:36 PM, fo...@univ-mlv.fr wrote:

But this isn't a property of the *class*, it's a property of the *type*, as 
used at a particular use site. If you want to know whether an array is 
flattened, the class of the component can't tell you.

The semantics B1/B2/B3 is a property of the class, an instance of a value class 
stored in an Object is still an instance of a value class, same for primitive 
classes.

Being flattenable or not is an orthogonal concern. For a field or a local 
variable/parameter it is a property of the descriptor, but for arrays it is a 
property of the array class, because arrays are covariant in Java.

For arrays, code like this should work:
  Object[] o = new QComplex;[3];
  o[1] = QComplex;.new(...);

My point is that if you're holding an Object, there is nothing interesting 
about instances of primitive classes to distinguish them from instances of 
value classes. They are both value objects, with value object semantics.

Now, if you're holding an Object[] and you want to know whether the array is 
flat or not, that could be a useful property to detect. But a query like the 
following won't tell you:

PrimitiveObject.class.isAssignableFrom(arr.getClass().getComponentType());

Instead, the question you want to ask is something like:

arr.getClass().getComponentType().isPrimitive();
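With today's eight primitives, this component-type query already works the way Dan describes: flatness is a property of the array's component *type*, not of an interface the component class implements. A small runnable illustration:

```java
// Component-type reflection today: a flat (primitive) array is distinguished
// from an array of references by asking the component type itself.
class ArrayKindDemo {
    static boolean isFlatArray(Object arr) {
        Class<?> component = arr.getClass().getComponentType();
        // getComponentType() is null for non-arrays.
        return component != null && component.isPrimitive();
    }
}
```

For example, `isFlatArray(new int[3])` is true while `isFlatArray(new Integer[3])` is false, even though `int` and `Integer` are closely related.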

You're claiming that the IdentityObject vs. ValueObject and value class vs. 
primitive class distinctions should be treated the same, but what I'm 
illustrating here is that they are different kinds of properties. The first is 
a property of a class (or of all instances of a class), the second of a type 
(or a container with a type).

I don't understand why you want Point.ref/LPoint; to be a PrimitiveObject and 
not a ValueObject.
We have QPoint;[] <: LPoint;[] so LPoint; implements ValueObject and QPoint; 
implements PrimitiveObject.

Classes implement interfaces. Types have subtyping relationships *derived from* 
classes. The mechanism of encoding a property with an interface thus involves 
attaching that interface to a particular class.

So if you want QPoint <: PrimitiveObject, it must be the case that *class* 
Point implements PrimitiveObject. And then it will be the case also that LPoint 
<: PrimitiveObject.

(I don't want any of this. What I want is for there to be just one interface, 
ValueObject, that indicates identity-free semantics for a class/object, and I 
want class Point to implement that interface.)

Meanwhile, I'd suggest writing the method like this, using universal generics:

  public <T> void m(T[] array, int index) {
    array[index] = null;  // null warning
  }

If you are Okay with code that can raise a NPE, why are you not Okay with code 
that can raise an IllegalMonitorStateException ?
Or said differently why ValueObject and not PrimitiveObject.

You've lost me here.

I showed an example that uses universal generics. The type variable T ranges 
over both primitive and reference types, so may not be nullable. It is improper 
to write a null into a T[]. As a concession to compatibility, we want to treat 
this as a warning rather than an error, because plenty of code like this exists 
today.
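There is no flattened value-class array to demonstrate this with yet, but today's primitives already exhibit the runtime failure mode: moving null into a primitive slot surfaces as a NullPointerException at the store. A hedged analogue (`tryStore` is illustrative):

```java
// Analogy with today's primitives: unboxing null into an int[] slot throws
// NullPointerException at the store, which is the same failure mode a null
// store into a flattened value-class array would have.
class NullIntoFlatDemo {
    static boolean tryStore(int[] arr, Integer value) {
        try {
            arr[0] = value;  // unboxing happens here
            return true;
        } catch (NullPointerException e) {
            return false;    // null cannot be represented in the flat slot
        }
    }
}
```

`tryStore(new int[1], 42)` succeeds; `tryStore(new int[1], null)` fails at runtime, which is why the compiler's null warning on `T[]` stores is worth having.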

Similarly, there may be code that ranges over both value objects and identity 
objects. An easy way to do this is with a variable of type Object. It is 
improper to synchronize on an Object that may be a value object. Again, as a 
concession to compatibility, we probably want to treat this as a warning.

public void m(Object obj) {
    synchronized (obj) { // warning
        ...
    }
}

These are both situations where being able to detect properties of the input 
could be useful to avoid runtime errors, but that doesn't mean the mechanism 
should be the same for both. (And I explained above why testing for 
PrimitiveObject[] wouldn't do what you want it to do.)

Object can't be an identity class, at compile time or run time, because some 
subclasses of Object are value classes.
Object the type is not an identity class, but Object the class (the Object in 
"new Object()") is an identity class.

You're envisioning a different model for types and classes than the one we have.

An instance of a class is also an instance of (and carries the properties of) 
its superclasses. Value objects are instances of the class Object.

I can imagine a design in which we say that instances of Object may be either 
identity or value objects, but direct instances of the class are always 
identity objects. But this is not how we've handled the property anywhere else, 
and it breaks some invariants. We've gotten where we are because it seemed less 
disruptive to introduce a subclass of Object that can behave like a normal 
identity class.

(I think we can explore this more, though, and note that I didn't say anything 
about interfaces above. It's orthogonal.)



Re: SoV-3: constructor questions

2022-01-26 Thread Dan Smith
> On Jan 26, 2022, at 2:18 PM, Dan Heidinga  wrote:
> 
> After re-reading the State of Valhalla part 3 again [1], I have a
> couple of questions on constructor handling:
> 
> 1) The rules for handling ACC_PERMITS_VALUE are different between
> SoV-2 and SoV-3 in that the language imposes constraints the VM
> doesn't check.  Is this deliberate?
> 
> SoV-2 says:
>> The classes they can extend are restricted: Object or abstract classes with 
>> no fields, empty no-arg constructor bodies, no other constructors, no 
>> instance initializers, no synchronized methods, and whose superclasses all 
>> meet this same set of conditions. (Number is an example of such a class.)
> 
> while SoV-3 says:
>> Perhaps surprisingly, the JVM does not perform the following checks for 
>> value-superclass candidates, although the source language compiler will 
>> typically do so:
>> 
>> It should have declared an empty no-argument constructor. (Or if it didn’t, 
>> then the author has somehow consented to having all of the constructors 
>> being skipped by the unnamed factory methods of value subclasses.)
> 
> "Perhaps surprisingly" is right as I'm surprised =)  and not sure I
> follow why the VM wouldn't enforce the restriction.  Is it to avoid
> having to specify the attributes of that constructor?

We can come up with a rule to specify, in the Java language, what an "empty 
constructor" looks like. (Nothing syntactically in the body, for example.)

It's harder for the JVM to specify what an "empty <init> method" looks like. It 
must at least have an 'invokespecial' of its superclass's <init>. Might do some 
stacking of arguments. Etc. JVMs don't want to be in the business of 
recognizing code patterns.

So the model for the JVM is: you can declare <init> methods if you want to 
support identity subclass creation, and you can declare ACC_PERMITS_VALUE if 
you want to support value subclass creation. Independent channels, no need for 
them to interact.
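At the source level, the language rules quoted from SoV-2 target abstract classes like the first sketch below (class names are illustrative; the ACC_PERMITS_VALUE flag itself is invisible in source):

```java
// Hedged sketch of a value-superclass candidate under SoV-2's rules:
// no fields, a single empty no-arg constructor, no instance initializers,
// no synchronized methods.
abstract class Shape {
    Shape() {}  // empty body; compiles to a bare super() invokespecial

    abstract double area();
}

// A non-qualifying variant: the field and the constructor logic mean the
// compiler would not mark it as permitting value subclasses.
abstract class NamedShape {
    final String name;
    NamedShape(String name) { this.name = name; }  // arbitrary user code
    abstract double area();
}
```

Both compile to ordinary `<init>` methods; per Dan's explanation, only the compiler (not the JVM) distinguishes the two shapes.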

> Which leads me to the next concern: how javac will compile the "empty
> no-arg constructor bodies" required by SoV-2?  Or is the answer we
> don't care because the VM won't check anyway?

The Java language will produce class files for qualifying abstract classes with:
- ACC_PERMITS_VALUE set
- The same <init> methods it would have produced in previous versions 
(involving a super invokespecial call)

For a non-qualifying abstract class, you'll get
- ACC_PERMITS_VALUE *not* set
- The same <init> methods it would have produced in previous versions 
(potentially involving arbitrary user code)

And, yes, the JVM doesn't care. Other patterns are possible in legal class 
files (but javac won't produce them).

> 2) What is the rationale behind the return type restrictions on <vnew> methods?
> 
>> A <vnew> method must return the type of its declaring class, or a supertype.
> 
>> While <vnew> methods must always be static and must return a type consistent 
>> with their class, they can (unlike <init> methods) be declared in any class 
>> file, as far as the JVM is concerned.
> 
> If I'm reading this correctly, to enforce the first quote we'll need a
> verifier check to ensure that the declared return type of the <vnew>
> method is consistent with the current class or a supertype.  But
> Object is a common supertype, as is ValueObject, so I'm not sure what
> we're gaining with this restriction as any type is a valid candidate
> for return from a <vnew> method as anything can be a subclass of
> Object.

Treatment of <vnew> methods is still unresolved, so this (and the JEP) is just 
describing one possible approach. I tried to reach a conclusion on this a few 
months ago on this list, but we ended in an unresolved place. I'll try again...

Anyway, in this incarnation: the rule is that the return type must be a type 
that includes instances of the current class. So, in class Point, QPoint is 
okay, LObject is okay, but LString is not.

> We get a better restriction from the `aconst_init` and `withfield`
> bytecodes which "can only be executed within the nest of the class
> that declares the value class being initialized or modified".  Do we
> actually need the restriction on the <vnew> method or should it be
> considered non-normative (aka a best practice)?

I think there are certainly use cases for class instantiation outside of a 
method named '<vnew>' (even if javac won't generate them), and wouldn't want to 
limit those instructions to methods named '<vnew>'. It gives '<vnew>' more power 
than I think we intend—it's supposed to be a convenient place to put stuff, not 
a mandatory feature of instance creation.
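The class-file `<vnew>` factory has no direct source syntax, but its role can be loosely approximated with an ordinary static factory (a sketch and an analogy only, not the actual translation scheme; `Complex` and `of` are illustrative names):

```java
// Loose analogy: a value class's constructor compiles to a static factory
// method (a <vnew> method in the class file) that returns a fully built
// instance, rather than an <init> method that initializes `this` in place.
class Complex {
    final double re, im;

    private Complex(double re, double im) { this.re = re; this.im = im; }

    // Source-level stand-in for the factory shape: static, returns the
    // declaring class (or a supertype).
    static Complex of(double re, double im) { return new Complex(re, im); }
}
```

This is also why the return-type question above is about factories and not constructors: a factory hands back a value, so its declared return type is a meaningful degree of freedom.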



Re: The interfaces IdentityObject and ValueObject must die !

2022-01-26 Thread Dan Smith
On Jan 26, 2022, at 2:18 AM, fo...@univ-mlv.fr wrote:

In other words: I don't see a use case for distinguishing between primitive and
value classes with different interfaces.

Primitive classes do not allow nulls and are tearable; following your logic, 
there should be a subclass of ValueObject named PrimitiveObject that reflects 
that semantics.

But this isn't a property of the *class*, it's a property of the *type*, as 
used at a particular use site. If you want to know whether an array is 
flattened, the class of the component can't tell you.

This is especially useful when you have an array of PrimitiveObject: you know 
that storing null in an array of PrimitiveObject will always generate an NPE 
at runtime, and that you may have to use either volatile semantics or a lock 
when you read/write values from/to the array of PrimitiveObject.

For examples,
 public void m(PrimitiveObject[] array, int index) {
   array[index] = null;  // can be a compile time error
 }

If we said

class Point implements PrimitiveObject

then it would be the case that

Point.ref[] <: PrimitiveObject[]

and so PrimitiveObject[] wouldn't mean what you want it to mean.

We could make a special rule that says primitive types are subtypes of a 
special interface, even though their class does not implement that interface. 
But that doesn't really work, either—primitive types are monomorphic. If you've 
got a variable with an interface type, you've got a reference.

We could also make a special rule that says arrays of primitive types implement 
an interface PrimitiveArray. More generally, we've considered enhancements to 
arrays where there are different implementations provided by different classes. 
That seems plausible, but it's orthogonal to the IdentityObject/ValueObject 
feature.

Meanwhile, I'd suggest writing the method like this, using universal generics:

  public <T> void m(T[] array, int index) {
    array[index] = null;  // null warning
  }

An impossible type is a type that can be declared but that no class will ever 
match.

Examples of impossible types, at declaration site:
  interface I extends ValueObject {}
  interface J extends IdentityObject {}
  <T extends I & J> void foo() { }

It would definitely be illegal to declare a class that extends I and J. Our 
rules about well-formedness for bounds have always been sketchy, but 
potentially that would be a malformed type variable.

Abandoning the property entirely would be a bigger deal.

If we do not use interfaces, the runtime class of java.lang.Object can be 
Object; being an identity class or not is just a bit in the reified class, 
not a compile time property, there is contamination by inheritance.

Object can't be an identity class, at compile time or run time, because some 
subclasses of Object are value classes.

What you'd need is a property of individual *objects*, not represented at all 
with the class. Theoretically possible, but like I said, a pretty big 
disruption to our current model.

For me, it's like opening the door of your house to an elephant because it has 
a nice hat and saying you will fix that with scotch-tape each time it touches 
something.

Ha. This sounds like maybe there's a French idiom involved, but anyway we 
should try to get John to add this to his repertoire of analogies.



EG meeting *canceled*, 2022-01-26

2022-01-25 Thread Dan Smith
I've got a conflict tomorrow; let's cancel the EG meeting.

There are a few ongoing email discussions, in particular about the State of 
Valhalla documents. If it's still useful, we can pick up those discussions next 
time.

Re: The interfaces IdentityObject and ValueObject must die !

2022-01-25 Thread Dan Smith
> On Jan 25, 2022, at 2:39 PM, Remi Forax  wrote:
> 
> I think we should revisit the idea of having the interfaces 
> IdentityObject/ValueObject.
> 
> They serve two purposes
> 1/ documentation: explain the difference between an identity class and a 
> value class

And, in particular, describe the differences in runtime behavior.

> 2/ type restriction: can be used as type or bound of type parameter for 
> algorithms that only works with identity class
> 
> Sadly, our design has evolved but those interfaces have not; they do not work 
> well as type restrictions, because the type is lost once an 
> interface/j.l.Object is used, and the cost of their introduction is higher 
> than previously thought.


Not sure there's a problem here. For example, if it's important to constrain 
type parameters to be identity classes, but there's already/also a need for an 
interface bound, nothing wrong with saying:



If the *use site* may have lost track of the types it would need to satisfy the 
bound, then, sure, better to relax the bound. So, not useful in some use cases, 
useful in others.

Other purposes for these interfaces:

3/ dynamic tagging: support a runtime test to distinguish between value objects 
and identity objects

4/ subclass restriction: allow authors of abstract classes and interfaces to 
restrict implementations to only value classes or identity classes

5/ identity-dependent abstract classes: implicitly identifying, as part of an 
abstract class's API, that the class requires/assumes identity subclasses

> 1/ documentation
> 
> - Those interfaces split the possible types into two groups,
>  but the spec splits the types into 3, B1/B2/B3; thus they are not aligned 
> anymore with the new design.

The identity class vs. value class distinction affects runtime semantics of 
class instances. Whether the class is declared as a value class or a primitive 
class, the instances get ValueObject semantics.

The primitive type vs. reference type distinction is a property of *variables* 
(and other type uses); the runtime semantics of class instances don't change 
between the two. Being a primitive class is statically interesting, because it 
means you can use it as a primitive type, but at runtime it's not really 
important.

In other words: I don't see a use case for distinguishing between primitive and 
value classes with different interfaces.

> - they will be noise in the future, for Valhalla, the separation between 
> identity object and value object
>  may be important but from a POV of someone learning/discovering the language 
> it's a corner case
>  (like primitives are). This is even more true with the introduction of B2, 
> you can use B2 for a long time without knowing what a value type is. So 
> having such interfaces front and center is too much.

I think it's notable that you get two different equality semantics—something 
you really ought to be aware of when working with a class. But it's a 
subjective call about how prominent that information should be.

> 2/ as type
> 
> - Being a value class or a primitive class is a runtime behavior not a 
> compile time behavior,
>  so representing them with special types (a type is a compile time construct) 
> will always be an approximation.
>  As a consequence, you can have an identity class (resp value class) typed 
> with a type which is not a subtype
>  of IdentityObject (resp ValueObject).
> 
>  This discrepancy is hard to grasp for beginners (in fact not only for 
> beginners) and makes IdentityObject/ValueObject
>  useless, because if a method takes an IdentityObject as parameter but the 
> type is an interface, users will have
>  to cast it into an IdentityObject, which defeats the purpose of having such 
> an interface.
> 
>  (This is the reason why ObjectOutputStream.writeObject() takes an Object as 
> parameter and not Serializable)

It is sometimes possible/useful to statically identify this runtime property. 
At other times, it's not. That's not an argument for never representing the 
property with static types.

And even if you never pay attention to the static type, it turns out that the 
dynamic tagging and inheritance capabilities of interfaces are useful features 
for explaining/validating runtime behavior.

> And the cost of introduction is high
> 
> - they are not source backward compatible
>  class A {}
>  class B {}
>  var list = List.of(new A(), new B());
>  List list2 = list;

How about List?

Yes, it's possible to disrupt inference, and touching *every class in 
existence* has the potential to be really disruptive. But we should validate 
that with some real-world code.
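The kind of inference disruption at issue can be previewed today with any marker interface that two unrelated classes happen to share: inference folds the common interface into the least upper bound, exactly as an injected IdentityObject would. A sketch using `Serializable` purely as a stand-in:

```java
import java.io.Serializable;
import java.util.List;

class InferenceDemo {
    static class A implements Serializable {}
    static class B implements Serializable {}

    static List<? extends Serializable> demo() {
        // Because A and B share Serializable, the inferred element type of
        // List.of(new A(), new B()) is a subtype of Serializable, so this
        // return is well-typed. If a marker interface were injected into
        // *every* class, lub results would shift the same way everywhere,
        // which is the source-compatibility worry.
        var list = List.of(new A(), new B());
        return list;
    }
}
```

The concern is not that such code breaks, but that code written against the *old* lub (plain Object) may stop type-checking once every class gains a new common supertype.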

> - they are not binary backward compatible
>  new Object().getClass() != Object.class

This has nothing to do with the interfaces. This is based on the more general 
property that every class instance must belong to a class that is either an 
identity class or a value class. Interfaces are just the way we've chosen to 
encode that property. Abandoning the property entirely would be a bigger deal.

JEP update: Classes for the Basic Primitives

2022-01-12 Thread Dan Smith
I've made some revisions to JEP 402 to better track with the revised JEP 401—in 
particular backing off of "everything is an object".

There were some short-lived changes where I more aggressively pursued the idea 
that the class was named 'int', but I've backed off of that, too. Now taking 
the more conservative approach that there is a primitive class named 'Integer', 
but by special rule its primitive type is expressed 'int', and the class name 
refers to the reference type.

I realize that much of this could potentially change—reflection story, 
terminology, etc. But consider this the "plan of record" for now.

https://bugs.openjdk.java.net/browse/JDK-8259731

Terminology bikeshedding summary

2022-01-12 Thread Dan Smith
EG meeting touched on a few different renaming ideas to consider. Wanted to put 
those all down in one place.

Personally, I don't have strong feelings about any of these. I think the status 
quo is reasonable, and these changes would probably be fine, too (modulo coming 
up with the right word in some cases).

Some of this faces an inherent conflict between different views/models/users, 
and we won't be able to please everybody; the choice is about which of those 
views/models/users we want to emphasize.

#1:

value class vs. [something else] class

I don't recall a concrete proposal, but this reflects some discomfort with the 
idea that "value" is being used differently than when we talk about "the value 
of a variable" or, from some other languages, a "value type".

#2:

primitive class vs. bare value class vs. [something else] value class
primitive type vs. bare value type vs. [something else] value type
primitive value vs. bare value vs. [something else] value

Three objections to "primitive":

- Asking developers to loosen their notion of "primitive" to include 
user-defined, composite types may be too much of a conceptual shift/abuse of 
terminology

- The original 8 primitives still have a number of properties that make them 
special, perhaps deserving a convenient, one-word name (on the other hand, 
*not* giving them a distinct name helpfully communicates that these differences 
should not be considered significant...)

- It's useful to highlight that these things are still a kind of value class, 
in some sense reducing the conceptual overhead (there are identity classes and 
value classes, everything else is secondary)

#3:

primitive value vs. object

We're trying to make a distinction between primitive values being "class 
instances" and calling them "objects", but for many developers, especially 
beginners, that sounds like meaningless pedantry. We might be over-rotating on 
the subtle differences that make these entities distinct, rather than 
acknowledging that, with their fields and methods, they will be commonly 
understood to be a kind of object.



EG meeting, 2022-01-12

2022-01-12 Thread Dan Smith
EG Zoom meeting today at 5pm UTC (9am PDT, 12pm EDT).

We've had a flurry of activity in the last month. The list is below; probably 
best for those involved to decide whether they feel like the topic is settled, 
or deserves further discussion.

"JEP update: Primitive Classes": revisions to JEP 401 to complement the new 
Value Objects JEP

"Enhancing java.lang.constant for Valhalla": discussing whether/how the Preload 
attribute is manifested in this API

"We have to talk about 'primitive'": terminology discussion regarding the word 
"primitive"

"JEP update: Value Objects": followup discussion about inference/checking/usage 
of the IdentityObject & ValueObject interfaces

"Updated State of Valhalla documents": revisions to the big-picture-oriented 
State of Valhalla documents to reflect recent design changes

"Why do we need .ref class for primitive class?": clearing up how '.ref' types 
are encoded in bytecode

"L-type, Q-type and class mirrors": discussing how '.ref' types are modeled by 
reflection

"VM model and aconst_init": discussion how 'aconst_init' interacts with 'null' 
and verification types



Re: Updated State of Valhalla documents

2022-01-05 Thread Dan Smith
> On Jan 5, 2022, at 5:14 PM, fo...@univ-mlv.fr wrote:
> 
>> But: it's really important to understand that, in the proposed model, 
>> primitive
>> values, identity objects, and value objects *all* belong to classes.
> 
> yes,
> for the VM, a lambda, a record or an enum are all classes, even if in the 
> syntax the keyword "class" is not used.

Not talking about the VM. I'm talking about the language model.

> A primitive (B3) does not provide proper encapsulation unlike a classical 
> Java class (the one spelt "class" in the language),

You should say "object" here, not "class". Primitive values have classes, even 
though they are not objects.



Re: Updated State of Valhalla documents

2022-01-05 Thread Dan Smith
> On Dec 23, 2021, at 12:58 PM, fo...@univ-mlv.fr wrote:
> 
> But for Java, i would argue that the model is more
> we have either reference objects or primitives, for reference objects you 
> have those with identity and those without identity,
> hence "primitive" being a top-level kind while "value" (or a better term) 
> being a modifier. 

I don't want to get too in the weeds on syntax (even though, yes, it does help 
convey the underlying model!). The change you propose is, indeed, a possibility 
that is still on the table.

But: it's really important to understand that, in the proposed model, primitive 
values, identity objects, and value objects *all* belong to classes. That's 
where they get their members, via the normal rules about class membership. A 
primitive value is an instance of a class, even though it is not an object.

Whether we use the syntax 'c l a s s' in the primitive declaration, the thing 
being declared is a class. Just like the thing being declared with 'e n u m' is 
a class.

Re: JEP update: Value Objects

2021-12-21 Thread Dan Smith
> On Dec 20, 2021, at 10:54 AM, Brian Goetz  wrote:
> 
> 
> So the choices for VM infer&inject seem to be:
> 
>  - Only inject IO for legacy concrete classes, based on classfile version, 
> otherwise require everything to be explicit;
>  - Inject IO for concrete classes when ACC_VALUE is not present, require VO 
> to be explicit;
>  - Inject IO for concrete classes when ACC_VALUE is not present; inject VO 
> for concrete classes when ACC_VALUE is present
> 

One more dimension to this is whether "inject" and "require" are talking about 
an element in the `interfaces` array of the declaration, or simply the presence 
of the interface via some combination of inheritance/declaration.

The latter seems more natural. But in "require" cases, it leads to surprising 
binary incompatibilities (per some comments I made earlier in the thread):

1) declare `interface Foo extends ValueObject` and `value class Bar extends Foo`

2) compile; javac excludes ValueObject from Bar's `interfaces`

3) Modify Foo, removing `extends ValueObject` (turns out I was overly eager 
when I put in that constraint, and I actually wouldn't mind subclasses that are 
identity classes)

4) recompile Foo separately, which succeeds

5) Try running, and discover that class Bar refuses to load, with an error 
saying it doesn't implement ValueObject ("of course it does!" you say—"it's a 
value class")

Inference is nice in that it will happily paper over these sorts of separate 
compilation mismatches.

JEP update: Primitive Classes

2021-12-16 Thread Dan Smith
First, I've made some minor revisions to the Value Objects JEP in the last 
couple of weeks. You can see it here:
https://openjdk.java.net/jeps/8277163

Second, I've put together a draft of a revised JEP 401, Primitive Classes. This 
removes content that became part of the Value Objects feature, and refines how 
we talk about the relationship between primitive types and reference types. 
Working outside of JBS for now, because I don't want to disrupt the 
already-Candidate JEP 401 artifact until we're at least ready to Submit the 
Value Objects piece.

A key idea is that primitive values and value objects are distinct entities, 
with different types, but they're both instances of the same class (thanks for 
the good ideas here, Kevin!).

(I'll acknowledge the ongoing discussion about whether "primitive" is the right 
term to use here. But for now, sticking with the status quo.)

Happy to hear your thoughts!

---

Summary
---

Support new, developer-declared primitive types in Java. This is a
[preview language and VM feature](http://openjdk.java.net/jeps/12).



Goals
-

This JEP introduces primitive classes, special kinds of
[value classes][jep-values] that define new primitive types.

The Java programming language will be enhanced to recognize primitive class
declarations and support new primitive types in its type system.

The Java Virtual Machine will be enhanced with a new `Q` carrier type to encode
declared primitive types.



Non-Goals
-

This JEP is concerned with the core treatment of developer-declared primitives.
Additional features to improve integration with the Java programming language
are not covered here, but are expected to be developed in parallel.
Specifically:

-   [JEP 402][jep402] will enhance the basic primitives (`int`, `boolean`, etc.)
by giving them primitive class declarations.

-   [A separate JEP][jep-generics] will update Java's generics so that primitive
types can be used as type arguments.

Other followup efforts may enhance existing APIs to take advantage of primitive
classes, or introduce new language features and APIs built on top of primitive
classes.



Motivation
--

Java developers work with two kinds of values: primitives and objects.

Primitives offer better performance, because they are typically *inlined*—stored
directly (without headers or pointers) in variables, on the computation stack,
and, ultimately, in CPU registers. Hence, memory reads do not have additional
indirections, primitive arrays are stored densely and contiguously in memory,
primitive-typed fields can be similarly compact, primitive values do not require
garbage collection, and primitive operations are performed within the CPU.

Objects offer better abstractions, including fields, methods, constructors,
access control, and nominal subtyping. But objects traditionally perform poorly
in comparison to primitives, because they are primarily stored in heap-allocated
memory and accessed by reference.

*Value objects*, introduced by [another JEP][jep-values], significantly improve
object performance in many contexts, providing a good fusion of the better
abstractions of objects with the better performance of primitives.

However, certain invariant properties of objects limit how much they can be
optimized—particularly when stored in fields and arrays. Specifically:

-   A variable of a reference type may be `null`, so the inlined layout of a
value object typically requires some additional bits to encode `null`.
For example, a variable storing an `int` can fit in 32 bits, but for a value
class with a single `int` field, a variable of that class type could
use up to 64 bits.

-   A variable of a reference type must be modified atomically. This often makes
it impractical to inline a value object, because its layout would be too
large for efficient atomic modification. Large primitive types (currently,
`double` and `long`) make no such atomicity guarantees, so variables of
these types can be modified efficiently without indirect representations
(concurrency is instead managed at a higher level).

Primitive classes give developers the capability to define new primitive types
that aren't subject to these limitations. Programs can make use of class
features without giving up any of the performance benefits of primitives.

Applications of developer-declared primitives include:

-   Numbers of varieties not supported by the basic primitives, such as
unsigned bytes, 128-bit integers, and half-precision floats;

-   Points, complex numbers, colors, vectors, and other multi-dimensional
numerics;

-   Numbers with units—sizes, rates of change, currency, etc.;

-   Bitmasks and other compressed encodings of data;

-   Map entries and other data structure internals;

-   Data-carrying tuples and multiple returns;

-   Aggregations of other primitive types, potentially multiple layers deep.
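For instance, the multi-dimensional numerics above can be written today as a record, which has the right API but still pays for heap allocation and references; under this JEP, the same shape of class declared as a primitive class would keep the same API while being stored inline. (The `Complex` record here is a hypothetical illustration, not part of any library.)

```java
// Today: a complex number as a record. It gets the abstraction benefits of
// classes (fields, methods, constructors) but instances are heap-allocated
// and accessed by reference. A primitive class declaration would keep this
// API while storing the two doubles inline.
record Complex(double re, double im) {
    Complex plus(Complex that) {
        return new Complex(re + that.re, im + that.im);
    }
}
```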



Description
---

The features described 

Re: We have to talk about "primitive".

2021-12-15 Thread Dan Smith
On Dec 15, 2021, at 12:17 PM, Brian Goetz wrote:

The main problem I think we can't escape is that we'll still need some word 
that means only the eight predefined types. (For the sake of argument let's 
assume we can pick one and lean hard on it, whether that's "predefined", 
"built-in", "elemental", "leaf type", or whatever.)

I've been calling them the built-in primitives; we've test-driven other terms 
like "basic" primitives.  Assume we'll agree on a term.  Also, no matter how we 
try, they will be different from the extended primitives in some ways, such as:

 - Their reference companions have weird names (e.g., Integer);
 - They permit a seemingly circular declaration (i.e., the declaration of 
"class int" will use "int" in its representation);
 - They will be translated differently, because the VM has built-in carriers 
for I/J/F/D, whereas extended primitives will use the L and Q carriers;
 - There will probably be some special treatment in reflection for these eight 
types;

Most of these are things about which we can say "OK, fine, these are historical 
warts."

There may be others asymmetries too, that derive from compatibility 
constraints.  As you say, the game is minimization.

Yes, this is a good list. Add to it:
- They are named with a lower-case keyword
- They exclusively get to use special operators (for now)

My high-level response to "primitive=one of 8 types" is that it may be giving 
the good name to, and drawing attention to, something that doesn't matter much. 
Sure, we'll need to specify a distinction for the purpose of the things on the 
list, but I don't think most programmers should really care whether the value 
they're working with belongs to one of the 8 special types or not.

These especially don't matter:
- Aliased reference type names: going forward, everybody should be saying 
`int.ref` instead
- Circular declarations: less than 100 people in the world need to care about 
this (maybe exaggerating)
- Weird JVM features: yes, but the JVM has lots of quirks, ergonomics are not 
the top priority

And the operator limitation is not fundamental, certainly could be addressed in 
the future.

So we're left with, for most Java programmers, a set of special types that get 
spelled with keywords and get some special behavior in the reflection API. My 
initial sense is that's not enough to put them in their own different-noun 
category.

Meanwhile, if we can tell programmers "primitives have members/classes now, and 
libraries can define additional primitives", that can build on existing 
intuitions pretty well. For example, the primitive type/reference type duality 
still exists, and pretty much works the same. Asking them to do s/primitive 
type/value type/ in this context is its own Indiana Jones maneuver.



EG meeting, 2021-12-15

2021-12-15 Thread Dan Smith
EG Zoom meeting today at 5pm UTC (9am PDT, 12pm EDT).

Possible topics:

"JEP update: Value Objects": discussed some details about inferred 
superinterfaces, including impact on JVMTI

"basic conceptual model": Kevin shared his notes describing key Java 
programming model concepts, in anticipation of changes coming from primitive 
classes

"Enhancing java.lang.constant for Valhalla": Brian explored evolution of 
java.lang.constant



Re: JEP update: Value Objects

2021-12-02 Thread Dan Smith
> On Dec 2, 2021, at 1:04 PM, Dan Heidinga  wrote:
> 
> On Thu, Dec 2, 2021 at 10:05 AM Dan Smith  wrote:
>> 
>> On Dec 2, 2021, at 7:08 AM, Dan Heidinga  wrote:
>> 
>> When converting back from our internal form to a classfile for the
>> JVMTI RetransformClasses agents, I need to either filter the interface
>> out if we injected it or not if it was already there.  JVMTI's
>> GetImplementedInterfaces call has a similar issue with being
>> consistent - and that's really the same issue as reflection.
>> 
>> There's a lot of small places that can easily become inconsistent -
>> and therefore a lot of places that need to be checked - to hide
>> injected interfaces.  The easiest solution to that is to avoid
>> injecting interfaces in cases where javac can do it for us so the VM
>> has a consistent view.
>> 
>> 
>> I think you may be envisioning extra complexity that isn't needed here. The 
>> plan of record is that we *won't* hide injected interfaces.
> 
> +1.  I'm 100% on board with this approach.  It cleans up a lot of the
> potential corner cases.
> 
>> Our hope is that the implicit/explicit distinction is meaningless—that 
>> turning implicit into explicit via JVMTI would be a 100% equivalent change. 
>> I don't know JVMTI well, so I'm not sure if there's some reason to think 
>> that wouldn't be acceptable...
> 
> JVMTI's "GetImplementedInterfaces" spec will need some adaptation as
> it currently states "Return the direct super-interfaces of this class.
> For a class, this function returns the interfaces declared in its
> implements clause."
> 
> The ClassFileLoadHook (CFLH) runs either with the original bytecodes
> as passed to the VM (the first time) or with "morally equivalent"
> bytecodes recreated by the VM from its internal classfile formats.
> The first time through the process the agent may see a value class
> that doesn't have the VO interface directly listed while after a call
> to {retransform,redefine}Classes, the VO interface may be directly
> listed.  The same issues apply to the IO interface with legacy
> classfiles so with some minor spec updates, we can paper over that.
> 
> Those are the only two places: GetImplementedInterfaces & CFLH and
> related redefine/retransform functions, I can find in the JVMTI spec
> that would be affected.  Some minor spec updates should be able to
> address both to ensure an inconsistency in the observed behaviour is
> treated as valid.

Useful details, thanks.

Would it be a problem if the ClassFileLoadHook gives different answers 
depending on the timing of the request (derived from original bytecodes vs. 
JVM-internal data)? If we need consistent answers, it may be that the "original 
bytecode" approach needs to reproduce the JVM's inference logic. If it's okay 
for the answers to change, there's less work to do.

To highlight your last point: we *will* need to work this out for inferred 
IdentityObject, whether we decide to infer ValueObject or not.

Re: JEP update: Value Objects

2021-12-02 Thread Dan Smith
On Dec 2, 2021, at 7:08 AM, Dan Heidinga wrote:

When converting back from our internal form to a classfile for the
JVMTI RetransformClasses agents, I need to either filter the interface
out if we injected it or not if it was already there.  JVMTI's
GetImplementedInterfaces call has a similar issue with being
consistent - and that's really the same issue as reflection.

There's a lot of small places that can easily become inconsistent -
and therefore a lot of places that need to be checked - to hide
injected interfaces.  The easiest solution to that is to avoid
injecting interfaces in cases where javac can do it for us so the VM
has a consistent view.

I think you may be envisioning extra complexity that isn't needed here. The 
plan of record is that we *won't* hide injected interfaces. Our hope is that 
the implicit/explicit distinction is meaningless—that turning implicit into 
explicit via JVMTI would be a 100% equivalent change. I don't know JVMTI well, 
so I'm not sure if there's some reason to think that wouldn't be acceptable...


Re: JEP update: Value Objects

2021-12-01 Thread Dan Smith
> On Dec 1, 2021, at 4:56 PM, John Rose  wrote:
> 
> On Dec 1, 2021, at 3:29 PM, Dan Smith  wrote:
>> 
>> So we went down the path of "maybe there's no need for a flag at all" in 
>> today's meeting, and it might be worth more consideration, but I convinced 
>> myself that the ACC_VALUE flag serves a useful purpose for validation and 
>> clarifying intent that can't be reproduced by a "directly/indirectly extends 
>> ValueObject" test.
>> 
>> As you suggest, though, we could mandate that ACC_VALUE implies 'implements 
>> ValueObject’.
> 
> Assuming ACC_VALUE is part of the design, there are actually four
> things we can specify, for the case when a class file has ACC_VALUE set:
> 
> A. Inject ValueObject as a direct interface, whether or not it was already 
> inherited.
> B. Inject ValueObject as a direct interface, if  it is not already inherited.
> C. Require ValueObject to be present as a direct interface, whether or not it 
> was already inherited.
> D. Require ValueObject to be present as an interface, either direct or 
> inherited.

I realize my last sentence there is ambiguous, so thanks for spelling these 
out. I meant that Dan has suggested (D), and we could consider doing so. (The 
JEP says do either A or B, it's vague about what "considered to implement" 
means.)

> A and B will look magic to reflection.

This I'm unclear on. What's the magic? Are you imagining that certain 
superinterfaces would be suppressed by reflection? As I said, our intent is to 
*not* suppress anything.

> B is slightly more parsimonious and less predictable than A.

Yeah, I'm not sure what I prefer. The distinction only matters, I think, for 
reflection.

> C and D are less magic to reflection, and require a bit more “ceremony” in 
> the class file.
> D is less ceremony than C.
> Also, the D condition is a normal subtype condition, while the C condition is 
> unusual to the JVM.

The "normal subtype condition" is a big reason to prefer D over C.

> I guess I prefer C and D over A and B because of the reflection magic problem,
> and also because of Dan H’s issue (IIUC) about “where do we look for the
> metadata, if not in somebody’s constant pool?”

I'll reiterate this point:

>> the trouble with gating off less-preferred behavior in old versions is that 
>> it's still there and still must be supported. JVMs end up with two 
>> strategies instead of one. A (great strategy+ok strategy) combination is 
>> arguably *worse* than just (ok strategy) everywhere.


We haven't really eliminated these problems if we're still inferring 
IdentityObject elsewhere. We've just (slightly) reduced their footprint. At the 
expense of living with two strategies instead of one.

> Since D and C have about equal practical effect, and D is both simpler to
> specify and less ceremony, I prefer D best of all.

I'm concerned about D's separate compilation problem: implementing ValueObject 
at compile time doesn't guarantee implementing ValueObject at runtime. That 
change is not, strictly speaking, a binary compatible change, but a 
superinterface author might think they could get away with it, and the 
resulting error message seems excessively punitive: "you can't load this class 
because some superinterface changed its mind about allowing identity class 
implementations". They wanted to allow more, and ended up allowing less.

Which means, to be safe, the compiler should always redundantly implement 
ValueObject in value classes, but then a compiler might forget to do so and 
introduce a subtle bug, ...

Tolerable, but it's a rough edge of D.



Re: JEP update: Value Objects

2021-12-01 Thread Dan Smith
> On Dec 1, 2021, at 9:32 AM, Remi Forax  wrote:
> 
> Hi Daniel,
> this is really nice.
> 
> Here are my remarks.
> 
> "It generally requires that an object's data be located at a fixed memory 
> location"
> remove "fixed", all OpenJDK GCs move objects.
> Again later, remove "fixed" in "That is, a value object does not have a fixed 
> memory address ...".

Yeah, was hoping I could weasel my way out of that with "generally", but okay. 
Changed to "particular memory location".

> At the beginning of the section "Value class declarations", before the 
> example, i think we also need a sentence saying that fields are implicitly 
> final.

Eh, this is putting more detail in the introductory paragraph than I want. I 
think I'm happier going the other direction—putting the rules about 'final' and 
'abstract' class modifiers in the "subject to the following restrictions" list 
after the example. Then the intro is just two sentences about the 'value' 
keyword.

> Class file and representation, about ACC_PERMITS_VALUE, what's the difference 
> between "permits" and "allow" in English?

Very close synonyms, I'd say? I would use them interchangeably.

The reason I chose "permits" is because we already have a PermittedSubclasses 
attribute that serves a similar purpose.

> In section "Java language compilation",
> "Each class file generated by javac includes a Preload attribute naming any 
> value class that appears in one of the class file's field or method 
> descriptors."
> + if a value class is the receiver of a method call/field access (the 
> receiver is not part of the method descriptor in the bytecode).

The need here is to identify inlinable classes at the declaration site. Use 
sites don't need it. (And the type of 'this' at the declaration site is, of 
course, already loaded.)

> In section "Performance model"
> "... must ensure that fields and arrays storing value objects are updated 
> atomically.",
> not only stores, loads has to be done atomically too.

"read and written atomically", then.

> The part "Initially, developers can expect the following from the HotSpot 
> JVM" is dangerous because it will be read as Hotspot will do that forever.
> We have to be more vague here, "a Java VM may ..."

Yes, message received. I'll ask around about the best way to document our 
intentions for the targeted release (perhaps outside the JEP) without 
suggesting a constraint on the abstract feature.



Re: [External] : Re: JEP update: Value Objects

2021-12-01 Thread Dan Smith

> On Dec 1, 2021, at 8:48 AM, Dan Heidinga  wrote:
> 
>> class file representation & interpretation
>> 
>> A value class is declared in a class file using the ACC_VALUE modifier 
>> (0x0100). At class load time, the class is considered to implement the 
>> interface ValueObject; an error occurs if a value class is not final, has a 
>> non-final instance field, or implements—directly or 
>> indirectly—IdentityObject.
> 
> I'll reiterate my earlier pleas to have javac explicitly make them
> implement ValueObject.  The VM can then check that they have both the
> bit and the interface.

So we went down the path of "maybe there's no need for a flag at all" in 
today's meeting, and it might be worth more consideration, but I convinced 
myself that the ACC_VALUE flag serves a useful purpose for validation and 
clarifying intent that can't be reproduced by a "directly/indirectly extends 
ValueObject" test.

As you suggest, though, we could mandate that ACC_VALUE implies 'implements 
ValueObject'. Some reasons not to require this:

- 'implements ValueObject' may be redundant if an ancestor implements 
ValueObject; but leaving it off risks a separate compilation error (e.g., 
ancestor used to implement ValueObject, doesn't anymore). So I think the proper 
compilation strategy would be to always implement it directly, even 
redundantly. There's an opportunity for a subtle compiler bug.

- It's extra ceremony in the class file. 

- Inferring is consistent with what we do for at least some identity classes. 
Inferring everywhere is, in some ways, simpler.*

(*Tangent about the idea of inferring IdentityObject in old versions, but 
requiring IdentityObject in new versions: the trouble with gating off 
less-preferred behavior in old versions is that it's still there and still must 
be supported. JVMs end up with two strategies instead of one. A (great 
strategy+ok strategy) combination is arguably *worse* than just (ok strategy) 
everywhere.)

> It's a simpler model if the interface is
> always there for values as the VM won't have to track whether it was
> injected for a value class or explicitly declared.  Why does that
> matter?  For two reasons: JVMTI will need to be consistent in the
> classfile bytes it returns and not included the interface if it was
> injected (less tracking), and given earlier conversations about
> whether to "hide" the injected interface from Class::getInterfaces,
> always having it for values removes one more sharp edge.

The plan of record is to make no distinction between inferred and explicit 
superinterfaces in reflection. Is that not acceptable for JVMTI? If there's no 
need for a distinction, does that address your concern about inferred supers?

EG meeting, 2021-12-01

2021-12-01 Thread Dan Smith
EG Zoom meeting today at 5pm UTC (9am PDT, 12pm EDT).

We can discuss "JEP update: Value Objects", the use of the term "value" here, 
and class file encodings.

Re: JEP update: Value Objects

2021-11-30 Thread Dan Smith
On Nov 30, 2021, at 12:05 AM, John Rose wrote:

Also, I’m not really demanding a title change here, Dan, but rather asking
everyone to be careful about any presupposition that “of course we will
heal the rift by making all primitives be classes”.  Or even “all primitives
be objects.”  Those are easy ideas to fall into by accident, and I don’t want
us to get needlessly muddled about them as we sort them out.

+1

I've been defaulting in descriptions like my two-axis grid to the plan of 
record, until we settle on a revised plan. But quite possible that "class" is 
not the right word for the second row.

(As for JEP 401—it will need to be revised to build on the Value Objects JEP. 
What you're seeing right now is unchanged from a few months ago. An updated 
iteration to come...)



JEP update: Value Objects

2021-11-29 Thread Dan Smith
I've been exploring possible terminology for "Bucket 2" classes, the ones that 
lack identity but require reference type semantics.

Proposal: *value classes*, instances of which are *value objects*

The term "value" is meant to suggest an entity that doesn't rely on mutation, 
uniqueness of instances, or other features that come with identity. A value 
object with certain field values is the same (per ==), now and always, as every 
"other" value object with those field values.

(A value object is *not* necessarily immutable all the way down, because its 
fields can refer to identity objects. If programmers want clean immutable 
semantics, they shouldn't write code (like 'equals') that depends on these 
identity objects' mutable state. But I think the "value" term is still 
reasonable.)
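The '==' claim can be contrasted with today's behavior (a sketch; `Point` here is an ordinary record, so it still has identity):

```java
// Today, two records with equal field values are .equals() but not ==,
// because each 'new' creates a distinct identity. For a value class, ==
// itself would compare field values, so both checks would succeed.
record Point(int x, int y) {}

public class ValueEquality {
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b));  // prints true: same field values
        System.out.println(a == b);       // prints false today; true for a value object
    }
}
```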

This feels like it may be an intuitive way to talk about identity without 
resorting to something verbose and negative like "non-identity".

If you've been following along all this time, there's potential for confusion: 
a "value class" has little to do with a "primitive value type", as we've used 
the term in JEP 401. We're thinking the latter can just become "primitive 
type", leading to the following two-axis interpretation of the Valhalla 
features:

----------------------------------------------------------------------
 Value class reference type (B2 & B3.ref)  |  Identity class type (B1)
----------------------------------------------------------------------
 Value class primitive type (B3)           |
----------------------------------------------------------------------

Columns: value class vs. identity class. Rows: reference type vs. primitive 
type. (Avoid "value type", which may not mean what you think it means.)

Fortunately, the renaming exercise is just a problem for those of us who have 
been closely involved in the project. Everybody else will approach this grid 
with fresh eyes.

(Another old term that I am still finding useful, perhaps in a slightly 
different way: "inline", describing any JVM implementation strategy that 
encodes value objects directly as a sequence of field values.)

Here's a new JEP draft that incorporates this terminology and sets us up to 
deliver Bucket 2 classes, potentially as a separate feature from Bucket 3:

https://bugs.openjdk.java.net/browse/JDK-8277163

Much of JEP 401 ends up here; a revised JEP 401 would just talk about primitive 
classes and types as a special kind of value class.



Re: [External] : Re: EG meeting, 2021-11-17

2021-11-22 Thread Dan Smith
> On Nov 22, 2021, at 2:07 PM, Kevin Bourrillion  wrote:
> 
>> On Mon, Nov 22, 2021 at 6:27 AM Dan Heidinga  wrote:
>> 
>> I'll echo Brian's comment that I'd like to understand Kevin's use
>> cases better to see if there's something we're missing in the design /
>> a major use case that isn't being addressed that will cause user
>> confusion / pain.
>> 
> Sorry if I threw another wrench here!
> 
> What I'm raising is only the wish that users can reasonably default to 
> B2-over-B1 unless their use case requires something on our list of "only B1 
> does this". And that list can be however long it needs to be, just hopefully 
> no longer. That's probably how we were looking at it already.

Here's the current list, FYI (derived from JEP 401):

• Implicitly final class, cannot be extended.
• All instance fields are implicitly final, so must be assigned exactly 
once by constructors or initializers, and cannot be assigned outside of a 
constructor or initializer.
• The class does not implement—directly or indirectly—IdentityObject. 
This implies that the superclass is either Object or a stateless abstract class.
• No constructor makes a super constructor call. Instance creation will 
occur without executing any superclass initialization code.
• No instance methods are declared synchronized.
• (Possibly) The class does not implement Cloneable or declare a 
clone()method.
• (Possibly) The class does not declare a finalize() method.
• (Possibly) The constructor does not make use of this except to set 
the fields in the constructor body, or perhaps after all fields are definitely 
assigned.
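For concreteness, here is a class that already satisfies every item on this list in today's Java (a hypothetical `Range` class; only the proposed 'value' modifier would need to be added):

```java
// Final class, final fields, superclass Object (no superclass state), no
// synchronized methods, no clone() or finalize(), and 'this' is used in the
// constructor only to assign fields -- a ready-made Bucket 2 candidate.
public final class Range {
    private final int lo;
    private final int hi;

    public Range(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    public int length() { return hi - lo; }

    @Override public boolean equals(Object o) {
        return o instanceof Range r && r.lo == lo && r.hi == hi;
    }

    @Override public int hashCode() { return 31 * lo + hi; }
}
```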

And elaborating on IdentityObject & stateless abstract classes:

An abstract class can be declared to implement either IdentityObject or 
ValueObject; or, if it declares a field, an instance initializer, a non-empty 
constructor, or a synchronized method, it implicitly implements IdentityObject 
(perhaps with a warning). Otherwise, the abstract class extends neither 
interface and can be extended by both kinds of concrete classes.

(Such a "both kinds" abstract class has its ACC_PRIM_SUPER—name to be 
changed—flag set in the class file, along with an <init> method for identity 
classes.)

EG meeting, 2021-11-17

2021-11-17 Thread Dan Smith
EG Zoom meeting today at 5pm UTC (9am PDT, 12pm EDT).

Lots of traffic this time, we can have follow up discussions wherever there's 
interest. Potential topics:

"Consolidating the user model": followup discussions homed in on how we model 
primitive values—whether they're reference-less objects or some other "value" 
entity, and how they interact with reference types

"Equality operator for identityless classes": Kevin is concerned that the new 
== operator is an attractive nuisance, because it's sometimes, but not always, 
equivalent to 'equals'

"identityless objects and the type hierarchy": discussed how the 
IdentityObject/PrimitiveObject interfaces are used in the "Consolidating the 
user model" world

"Consequences of null for flattenable representations": John described 
strategies for encoding nulls where object references are flattened



Re: [External] : Re: Consolidating the user model

2021-11-04 Thread Dan Smith
On Nov 4, 2021, at 10:08 AM, Kevin Bourrillion wrote:

Your model is likely enough the best, and I'm simply "resisting" it, but in 
that case I'm channeling some of the resistance other users will feel, and we 
can hash out how to head it off. But also, occasionally I turn out to be right 
about things so I'll prepare for that misfortune as well.

Keep it up. It's a very useful exercise, and I haven't ruled out that you're 
onto something valuable here.


Re: [External] : Re: Consolidating the user model

2021-11-04 Thread Dan Smith
On Nov 3, 2021, at 7:34 PM, John Rose wrote:

There’s a bigger hiccup when you compare all that with good old int:

int iv = 42;  // “class int” is NOT a thing, but “class Integer” is
assert iv.getClass() != int.class;  // because int is not a class
assert iv.getClass() == Integer.class;  // ah, there’s the class!
assert iv.getClass() == int.ref.class;  // this works differently from Point
assert ((Object)iv).getClass() == pr.getClass();  // this should be true also, 
right?

And to finish out the combinations:

int.ref ir = iv;  // same object… now it’s on the heap, though, with a real 
live heap header
assert ir.getClass() == Integer.class;  // same class
assert ir.getClass() == int.ref.class;  // and this time it’s a ref-class (only 
for classic primitives)
assert ir.getClass() != int.class;

All this has some odd irregularities when you compare what Point does and what 
int does.  And yet it’s probably the least-bad thing we can do.

A bad response would be to follow the bad precedent of ir.getClass() == 
Integer.class off the cliff, and have pv.getClass() and pr.getClass() return 
Point.ref.class.  That way, getClass() only returns a ref.  Get it, see, 
getClass() can only return reference types.  The rejoinder (which Brian made to 
me when I aired it) is devastating:  Point.class is the class, not 
Point.ref.class, and the method is named “get-class”.

I guess to rephrase this, I'll just say: yes, there are problems with 
int/Integer. But we shouldn't let that tail wag the dog when sorting out the 
language model. int/Integer is going to be a special case, no matter how we 
stack it. (On the other hand, we really like to look for analogies from 
int/Integer when sorting out the language model, and sometimes those are 
fruitful. But handle with care.)



Re: [External] : Re: Consolidating the user model

2021-11-04 Thread Dan Smith
On Nov 3, 2021, at 6:19 PM, Kevin Bourrillion wrote:

I think my intuitions about boxes tie heavily to 'getClass' behavior (or some 
analogous reflective operation). "What are you?" should give me different 
answers for a bare value and a box. A duck in a box is not the same thing as a 
duck.

The analogy here would be that Integer.getClass() returns Integer.class, while 
int.getClass(), if it existed, would return int.class.

So far so good. If `int.getClass()` has to work at all, it might as well 
produce `int.class`, though it serves no actual purpose and we would just 
refactor it to `int.class` anyway. If `int.getClass()` won't even compile, it 
would be no great loss at all. The method exists for finding the dynamic type 
of an object; my model says "values are not objects and so have no dynamic 
type", which I think is good.

But Point extends Object, and Object.getClass exists.

One thing the user model has to explain is how method inheritance works. You've 
been pointing out that inheritance != subtyping, which is true. But still, when 
I invoke a super method (a default method in a superinterface, say), it must be 
true that that method declaration knows how to execute on a value.

The ref/val model explains this by saying that method invocation will 
add/remove references to align with the expecations of the 
(dynamically-selected) method implementation. The object remains the same, so 
'this' is the object that the caller started with.

I guess the value/object model would pretty much say the same thing, except it 
would say the value the caller started with might be boxed (or the object 
unboxed) to match the method's expectations. It's the same *value*, presented 
as an object.

Either way, if I can invoke 'getClass', its behavior is specified by the 
*class* not the value/object, so I would expect to get the same answer whether 
invoked via a value or a reference/box.

(Another thing you could say is that the super method is like a template, 
stamped out in specialized form for each primitive subclass as part of 
inheritance. We experimented with this way of thinking for awhile before 
deciding, no, it really needs to be the case that invoking an inherited method 
means executing the method body in its original context.)

Now, all that said, we could say by fiat that `getClass` is special and value 
types aren't allowed to invoke it. YAGNI. Except...

I might want to write code like:

<T> void m(T arg) {
    if (arg.getClass() == Point.class) System.out.println("I'm a value!");
    else System.out.println("I'm a box!");
}

Someone might think this, but they can just ask themselves whether 
`int/Integer` work like that. They don't, so this doesn't either.

int/Integer are a starting point, but our goal is to offer something more.

In particular, we want universal generics: when I invoke m and pass it a Point, 
it must be the case that T=Point, not T=Point.ref. This is different than the 
status quo for int/Integer, where T=Integer.
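The status quo is easy to demonstrate (a minimal sketch; `typeOf` is a hypothetical helper):

```java
// Under erased generics today, passing an int to a generic method boxes it:
// T is inferred as Integer, never int. Universal generics would instead
// allow T to be a primitive/value type like Point directly.
public class StatusQuo {
    static <T> Class<?> typeOf(T arg) {
        return arg.getClass();
    }

    public static void main(String[] args) {
        System.out.println(typeOf(42) == Integer.class);  // prints true: 42 was boxed
    }
}
```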

The right way to interpret generic code is, roughly, to substitute [T:=Point] 
and figure out what the code would do. This is imprecise, because there are 
compile-time decisions that aren't allowed to change under different 
substitutions. (For example, we don't re-do overload resolution for different 
Ts, even if it would get different answers.) But, for our purposes, it should 
be the case that you can imagine 'arg' being a value, not a reference, and this 
code having intuitive behavior.

So the ref/val model says that 'arg' is an object (handled by value, not by 
reference) and its 'getClass' method returns the class of the object.

The value/object model says that 'arg' is a value and its 'getClass' method 
exists. And I guess it returns Point.class.

(If we really thought `getClass` was poison, I guess at this point we could say 
by fiat that type variable types aren't allowed to access `getClass`. But... 
`getClass` really is a useful thing to invoke in this context.)

An implication of universal generics is that there needs to be some common 
protocol that works on both vals and refs. In the val/ref model, that protocol 
is objects: both vals and refs are objects with members that can be accessed 
via '.'. In the value/object model, I'm not quite sure how you'd explain it. 
Maybe there's a third concept here, generalizing how values and objects behave.



Re: [External] : Re: Consolidating the user model

2021-11-03 Thread Dan Smith
On Nov 3, 2021, at 3:40 PM, Kevin Bourrillion wrote:

The problem is that you want to say that the Point gets converted to some other 
thing, yet that other thing:
- is == to the original

I would hope that's already true of int==Integer?

Formally, you can't literally compare an int with an Integer. All comparisons 
between a boxed Integer and an int have to decide if they're primitive 
comparisons, reference comparisons, or illegal, based on some rather complex 
conversions and disambiguation rules. At runtime, if the types you use result 
in a reference comparison, the answer depends on quirks of the interning logic.

Informally, whatever path you take, where boxed Integers are involved, == is 
unreliable, because you may indeed be comparing two different objects that 
happen to have been derived from the same number.
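The interning quirk is easy to reproduce (a sketch assuming the default `Integer.valueOf` cache of -128..127):

```java
// Autoboxing goes through Integer.valueOf, which caches small values by
// default. So == on boxes is unreliable: equal numbers may or may not be
// the same object, depending on their magnitude.
public class BoxIdentity {
    public static void main(String[] args) {
        Integer a = 127, b = 127;    // inside the default cache: same box
        Integer c = 1000, d = 1000;  // outside the cache: distinct boxes
        System.out.println(a == b);       // prints true
        System.out.println(c == d);       // prints false (default settings)
        System.out.println(c.equals(d));  // prints true: same numeric value
    }
}
```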

Now, if we kept `int` and `Integer` as distinct things, but turned `Integer` 
into an identity-free class, I suppose it's true that you wouldn't be able to 
tell whether two boxes were distinct or not, because == would always be true. 
(More properly, "are these distinct boxes with the same payload?" would be a 
malformed question to ask, because it presumes identity.)

So, okay: to be fair to these reimagined boxes, I'll stipulate that they are 
identity-free, and thus indistinguishable with ==.

- provides the exact same API as the original
- has the exact same behaviors as the original

Agreed that Point and Point.ref are different types that have the same 
members/features.

One-class-multiple-types is not entirely without precedent (though, sure, 
List<String> and List<Integer> and raw List don't have exactly the same API).

Once you accept that they're different types, then the fact they have the same 
API is just convenient.


- works exactly like a class declared with original class's declaration

It's the same class. There's only one class.

(There are two java.lang.Classes, because what that type models is not "class", 
it's something more like "an erased type or void".)

Is your model that, where there are n possible Points, there are in fact 2n 
instances of class Point, where half of them are "values" and half of them are 
"boxes"?

I would find that pretty confusing, but I'm not sure it's what you mean. I 
would want to be able to somehow distinguish which subset an instance belonged 
to.

Or is it your model that, when you convert a value to a box, the two things are 
the same class instance, just manifested or encoded differently?

That's actually not that far from the model we've described, which is that it's 
the same instance, just *viewed* or *accessed* differently. Those are different 
verbs, and so the models might not be interchangeable, but they're close.

If you're telling people that when you assign a Point to type Object, they now 
have something other than a Point, they're going to want to *see* that somehow. 
And of course they can't, because the box is a fiction.

What would they want to see? What is there to see about an object? Maybe its 
header, its dynamic type -- and uh, those things must be there, right?. because 
how could I use it polymorphically otherwise. I'm not sure what else would be 
meant by "seeing" the thing.

I think my intuitions about boxes tie heavily to 'getClass' behavior (or some 
analogous reflective operation). "What are you?" should give me different 
answers for a bare value and a box. A duck in a box is not the same thing as a 
duck.

The analogy here would be that Integer.getClass() returns Integer.class, while 
int.getClass(), if it existed, would return int.class.

I might want to write code like:

 <T> void m(T arg) {
   if (arg.getClass() == Point.class) System.out.println("I'm a value!");
   else System.out.println("I'm a box!");
 }

But this isn't the runtime behavior we would intend to support, because in fact 
at runtime there are no boxes to reflect.

I'll attempt to flip this around on you. :-) You say that a value of type Point 
is also already an "object". But then where is its header, its dynamic type? 
Objects have that. For whatever reason this seemed like the more conspicuous 
leak to me.

The value type/reference type model is that you can operate on an object 
directly, or by reference. It's the same object either way. Reference 
conversion just says "take this object and give me a reference to it". Nothing 
about the object itself changes.

The details of object encoding are deliberately left out of the model, but it's 
perfectly fine for you to imagine a header and a dynamic type carried around 
with the object always, both when accessed as a value and when accessed via a 
reference.

(It is, I suppose, part of the model that objects of a given class all have a 
finite, matching layout when accessed by value, even if the details of that 
layout are kept abstract. Which is why value types are monomorphic and you need 
reference types for polymorphism.)

The fact that the VM often discards object headers at

Re: Consolidating the user model

2021-11-03 Thread Dan Smith
On Nov 3, 2021, at 11:23 AM, Kevin Bourrillion <kev...@google.com> wrote:

On Wed, Nov 3, 2021 at 9:02 AM John Rose <john.r.r...@oracle.com> wrote:

> One way to thicken this thin argument is to say that Point is not really a 
> class.
> It’s a primitive.  Then it still has a value-set inclusion relation to 
> Object, but it’s
> not a sub-class of Object.  It is a value-set subtype.

I would spin it like this: `Point` absolutely is a class. But its instances are 
values (like ints and references are, but compound), and values are still not 
objects.

We've said at times we want to "make everything an object", but I think the 
unification users really care about is everything being a class instance.

I think this fits neatly with the current design: `Point` has no supertypes*, 
not even `Object`, but `Point.ref` does.

(*I mean "supertype" in the polymorphic sense, not the "has a conversion" sense 
or the "can inherit" sense. I don't know what the word is really supposed to 
mean. :-))

These sorts of explanations make me uncomfortable—that a Point stored in a 
reference isn't really a Point anymore, but a "box" or something like that.

The problem is that you want to say that the Point gets converted to some other 
thing, yet that other thing:
- is == to the original
- provides the exact same API as the original
- has the exact same behaviors as the original
- works exactly like a class declared with original class's declaration

If you're telling people that when you assign a Point to type Object, they now 
have something other than a Point, they're going to want to *see* that somehow. 
And of course they can't, because the box is a fiction.

The reference vs. value story that we developed to address these problems (and 
problems that arise when you *do* let people "see" a real box) carries the 
right intuitions: you can handle a Point by value or by reference, but either 
way it's the exact same object, so of course everything you do with it will 
work the same.


Re: Equality operator for identityless classes

2021-11-03 Thread Dan Smith
On Nov 3, 2021, at 9:58 AM, Kevin Bourrillion <kev...@google.com> wrote:

Today, things are pretty okay because developers can learn that `==` is a code 
smell. A responsible code reviewer has to think through each one like this:

1. Look up the type. Is it a builtin, or Class? Okay, we're fine.
2. Is it an enum? Okay, I resent having to go look it up when they could have 
just used switch, but fine.
3. Wait, is this weird code that actually cares about objects instead of what 
they represent? This needs a comment.

The problem is that now we'll be introducing a whole class of ... classes ... 
for which `==` does something reasonable: only the ones that happen to contain 
no references, however deeply nested! These cannot at all be easily 
distinguished. This is giving bugs a really fantastic way to hide.

I'm not sure about this leap: while it's true that `==` is sometimes equivalent 
to `equals`, in general, you can't be sure without deep knowledge about the 
class. As a coding convention, seems reasonable to me to continue to expect 
clients to use `equals` rather than trying to develop a finer-grained 
distinction between different classes. I think it's perfectly fine advice for 
most code to continue to treat `==` as a smell, like they always have.
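The "treat `==` as a smell" advice already applies to today's identity classes that stand in for values. A minimal sketch (the Name record is hypothetical):

```java
public class EqualsSmell {
    // An identity class representing a value; == compares references, not content.
    record Name(String value) {}

    public static void main(String[] args) {
        Name a = new Name("Ada");
        Name b = new Name("Ada");
        System.out.println(a == b);       // false: two distinct objects
        System.out.println(a.equals(b));  // true: same represented value
    }
}
```

For an identity-free class whose fields are all primitives, `==` would instead agree with `equals` — which is precisely what makes the two cases hard to tell apart in review.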



EG meeting, 2021-11-03

2021-11-03 Thread Dan Smith
EG Zoom meeting today at 4pm UTC (9am PDT, 12pm EDT). Note that we're still on 
DST in the US, won't shift to 5pm UTC until next time.

We'll discuss:

"Consolidating the user model": Brian described a user model centered on 
reference and value types. Sent just yesterday, so we'll probably spend most of 
the time just reviewing the main ideas.



EG meeting *canceled*, 2021-10-20

2021-10-20 Thread Dan Smith
No new topics today, so we'll cancel the meeting. Next scheduled slot is 
November 3.



EG meeting, 2021-10-06

2021-10-05 Thread Dan Smith
EG Zoom meeting tomorrow, Wednesday October 6, at 4pm UTC (9am PDT, 12pm EDT).

We can discuss "Addressing the full range of use cases", which concerns how we 
support nullability, atomicity, and migration.


Re: Addressing the full range of use cases

2021-10-04 Thread Dan Smith
Here's a followup with some answers reflecting my own understanding and what 
we've learned at Oracle while investigating these ideas. (Presented separately 
because there's still a lot of uncertainty, and because I want to encourage 
focusing on the contents of the original mail, with this reply as a supplement.)

> On Oct 4, 2021, at 5:34 PM, Dan Smith  wrote:
> 
> Some questions to consider for this approach:
> 
> - How do we group features into clusters so that they meet the sweet spot of 
> user expectations and use cases while minimizing complexity? Is two clusters 
> the right number? Is two already too many? (And what do we call them? What 
> keywords best convey the intended intuitions?)

A "classic" and "encapsulated" pair of clusters seems potentially workable 
(better names TBD). Classic primitive classes behave as described in JEP 
401—this piece is pretty stable. (Although some pieces, like the construction 
model, could be refined to better match their less class-like semantics.) 
Encapsulated primitive classes are always nullable and (maybe?) always atomic.

Nullability can be handled in one of two ways:

- Flush the previous mental model that null is inherently a reference concept. 
Null is a part of the value set of both encapsulated primitive value types and 
reference types.

- Encapsulated primitives are *always* reference types. They're just a special 
kind of reference type that can be optimized with flattening; if you want 
finer-grained control, use a classic primitive class. However, we often do the 
exercise of trying to get rid of the ".ref" type, only to find that there are 
still significant uses for a developer-controlled opt out of all flattening...

For migration, encapsulated primitive classes mostly subsume 
"reference-default" classes, and let us drop the 'Foo.val' feature. As nullable 
types, encapsulated primitive value types are source compatible replacements 
for existing reference types, and potentially provide an instant performance 
boost on recompilation. (Still to do, though: binary compatibility. There are 
some strategies we can use that don't require so much attention in the 
language. This is a complex enough topic that it's probably best to set it 
aside for now until the bigger questions are resolved.)

> - If there are knobs within the clusters, what are the right defaults? E.g., 
> should atomicity be opt-in or opt-out?

Fewer knobs are better. Potentially, the "encapsulated"/"classic" choice is the 
only one offered. Nullability and atomicity would come along for the ride, and 
be invisible to users. *However*, performance considerations could push us in a 
different direction.

For the "encapsulated"/"classic" choice, perhaps "encapsulated" should be the 
default. Classic primitives have sharper edges, especially for class authors, 
so perhaps can be pitched as an "advanced" feature, with an extra modifier 
signaling this fact. (Everybody uses 'int', but most people don't need to 
concern themselves with declaring 'int'.)

Alternatively, maybe we'd prefer a term for "classic", and a separate term for 
"encapsulated"? (Along the lines of "record" and "enum" being special kinds of 
classes with a variety of unique features.)

> - What are the performance costs (or, in the other direction, performance 
> gains) associated with each feature? For certain feature combinations, have 
> we canceled out the performance gains over identity classes (and at that 
> point, is that combination even worth supporting?)


Nullability:

Encapsulated primitive class types need *nullable Q types* in the JVM. A 
straightforward way to get there is by adding a boolean flag to the classes. 
This increases footprint in some cases, but is often essentially free. (For 
example: if the size of an array component must be a power of 2, boolean flags 
only increase the array size for 2 or so classes in java.time. Most have some 
free space.)
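One way to picture the boolean-flag strategy — purely illustrative, since the real encoding is a JVM implementation detail and all names here are hypothetical:

```java
public class NullableFlattening {
    // A flattened, nullable slot for a two-int value:
    // payload fields plus a synthetic "null channel" flag.
    static final class Slot {
        private boolean present;   // the extra boolean null flag
        private int x, y;          // payload; meaningless while !present

        void store(int x, int y) { this.x = x; this.y = y; present = true; }
        void storeNull()         { present = false; }
        int[] load()             { return present ? new int[] { x, y } : null; }
    }

    public static void main(String[] args) {
        Slot s = new Slot();
        System.out.println(s.load());  // null: the default (all-zero) state models null
        s.store(3, 4);
        System.out.println(java.util.Arrays.toString(s.load()));  // [3, 4]
    }
}
```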

There are some other strategies JVMs could use to compress null flags into 
existing footprint. In full generality, this could involve cooperation with 
class authors ("this pointer won't be null"). But it seems like that level of 
complexity might be unnecessary—for footprint-sensitive use cases, programmers 
can always fall back to classic primitive classes.

Execution time costs of extra null checks for nullable Q types need to be 
considered and measured, but it seems like they should be tolerable.

Atomicity:

JVM support for atomicity guarantees seems more difficult—algorithms for 
ensuring atomicity above 64 bits tend to be prohibitively expensive. The 
current prototype simply gives up on flattening when atomicity is requested; 
not clear whether that's workable.

Addressing the full range of use cases

2021-10-04 Thread Dan Smith
When we talk about use cases for Valhalla, we've often considered a very broad 
set of class abstractions that represent immutable, identity-free data. JEP 401 
mentions varieties of integers and floats, points, dates and times, tuples, 
records, subarrays, cursors, etc. However, as shorthand this broad set often 
gets reduced to an example like Point or Int128, and these latter examples are 
not necessarily representative of all candidate value types.  

Specifically, our favorite example classes have a property that doesn't 
generalize: they'll happily accept any combination of field values as a valid 
instance. (In fact, they're even happy to accept any combination of *bits* of 
the appropriate length.) Many candidate primitive classes don't have this 
property—the constructors do important validation work, and only certain 
combinations of fields are allowed to represent valid instances.
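For instance, a date-like candidate — a hypothetical class, not the real java.time code — rejects most field combinations, including the all-zero one:

```java
public class ValidatedValueCandidate {
    // Candidate value type: immutable, identity-free in spirit,
    // but its constructor does important validation work.
    static final class MonthDay {
        final int month, day;
        MonthDay(int month, int day) {
            if (month < 1 || month > 12 || day < 1 || day > 31)
                throw new IllegalArgumentException("invalid month/day: " + month + "/" + day);
            this.month = month;
            this.day = day;
        }
    }

    public static void main(String[] args) {
        MonthDay md = new MonthDay(2, 14);  // fine
        System.out.println(md.month + "/" + md.day);
        try {
            new MonthDay(0, 0);             // the all-zero "default" is not a valid instance
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```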

Related areas of concern that we've had on the radar for awhile:

- The "all zeros is your default value" strategy forces an all-zero instance 
into the class's value set, even if that doesn't make sense for the class. Many 
candidate classes have no reasonable default at all, leading naturally to wish 
for "null is your default value" (or other, more exotic, strategies involving 
revisiting the idea that every type has a default value). We've provided 
'P.ref' for those use sites that *need* null, but haven't provided a complete 
story for value types that want it to be *their* default value, too.

- Non-atomic heap updates can be used to create new instances that arbitrarily 
combine previously-validated instances' fields. There is no guarantee that the 
new combination of fields is semantically valid. Again, while there's precedent 
for this with 'double' and 'long' (JLS 17.7), those are special cases that 
don't generalize—any combination of double bit fields is *still a valid 
double*. (This is usually described as "tearing", although JLS 17.6 has 
something else in mind when it uses that word...) The language provides 
'volatile' as a use-site opt-in to atomicity, and we've toyed with a 
declaration-site opt-in as well. But object integrity being "off" by default 
may not be ideal.

- Existing class types like LocalDate are both nullable and atomic. These are 
useful properties to preserve during migration; nullability, in particular, is 
essential for source compatibility. We've provided reference-default 
declarations as a mechanism to make reference types (which have these 
properties) the default, with 'P.val' as an opt-in to value types. But in doing 
so we take away the many benefits of value types by default, and force new code 
to work with the "bad name".

While we can provide enough knobs to accommodate all of these special cases, 
we're left with a complex user model which asks class authors to make n 
different choices they may not immediately grasp the consequences of, and class 
users to keep 2^n different categories straight in their heads.

As an alternative, we've been exploring whether a simpler model is workable. It 
is becoming clear that there are (at least) two clusters of uses for value 
types.  The "classic" value types are like numerics -- they'll happily accept 
any combination of field values as a valid instance, and the zero value is a 
sensible (often the best possible) default value.  They make relatively little 
use of encapsulation.  These are the ones that best "work like an int."  The 
"encapsulated" value types are those that are more like typical aggregates 
("codes like a class") -- their constructors do important validation work, and 
only certain combinations of fields are allowed to represent valid instances.  
These are more likely to not have valid zero values (and hence want to be 
nullable).  

Some questions to consider for this approach:

- How do we group features into clusters so that they meet the sweet spot of 
user expectations and use cases while minimizing complexity? Is two clusters 
the right number? Is two already too many? (And what do we call them? What 
keywords best convey the intended intuitions?)

- If there are knobs within the clusters, what are the right defaults? E.g., 
should atomicity be opt-in or opt-out?

- What are the performance costs (or, in the other direction, performance 
gains) associated with each feature? For certain feature combinations, have we 
canceled out the performance gains over identity classes (and at that point, is 
that combination even worth supporting?)



EG meeting, 2021-09-22

2021-09-21 Thread Dan Smith
Tomorrow's EG Zoom meeting is on! Wednesday at 4pm UTC (9am PDT, 12pm EDT).

Topics to discuss:

"Factory methods & the language model": I raised some questions about how we 
think we should treat JVM factory methods in tools/libraries that present a 
language-level view.

"Project status summary": I summarized where the different pieces of the 
project are at. Would like to know if there are substantial pieces/issues that 
I missed.



Project status summary

2021-09-21 Thread Dan Smith
As I've mentioned, I've been wanting to put together a broad summary of where 
the project is at. I've grouped this into three areas or tracks: Primitive 
Objects, Unified Primitives, and Universal Generics.

--
Primitive Objects

JEP 401: This is the core preview feature, including primitive class 
declarations, primitive object semantics, and primitive value types (with 
reference companions).

- Awaiting finalization of some outstanding design issues before trying to 
target a release

- Working towards an Early Access release, with the goal of substantially 
aligning with the JEP 401 description

- Our design focus recently has been on the "Enforcing instance validation" 
section of the JEP; our best candidate solution is to support a kind of 
primitive class that is both strictly-enforced and nullable. I'll flesh this 
out in a separate email in the next few days.

- There are still some complexities regarding reflection, 'getClass', and 
MethodHandles that we'd like to refine

- The behavior of weak references is still an open question

- JVMS changes are written, with some iteration necessary to fill in gaps and 
respond to feedback

- JLS changes are pending the above instance validation revisions, along with 
some validation of the type system (see discussion in Universal Generics)

---

JEP 8267650: This is a supplementary task focusing on JVMS rules and some 
corner-case JVM behaviors. We'd like to complete it before or at the same time 
as the JEP 401 release.

- JEP is nearly ready for Submission, but I need to iterate on it

- Some initial JVMS changes were created; Alex suggested some significant 
revisions that need to be applied

---

Future work:

- We hope to work on migrating a number of standard library classes (such as 
java.time.*) once JEP 401 is done (probably to be released after the features 
are final)

- Other projects like Amber and Panama hope to take advantage of primitive 
objects as well

--
Unified Primitives

JEP 402: This involves making the wrapper classes primitive and treating 'int', 
'short', etc., as their value types.

- Expect to target a release concurrently with JEP 401

- I don't think we've tried implementing this yet (in javac or the special JVM 
treatment for arrays). It's probably best being handled downstream of the JEP 
401 design issues.

- Some lingering discomfort with the proposed reflection story

- Some vague ideas about pushing this equivalence deeper into the JVM, but no 
concrete proposals

- JVMS changes aren't done, will be pretty small and narrowly-focused

- JLS changes will be fleshed out in parallel with JEP 401

---

JEP TBD: Wrapper Constructor Tooling. JEP 390 provided migration warnings about 
wrapper class constructors in 16+. We need to follow this up with some tooling 
to convert legacy class files so that they'll run on a release that doesn't 
provide Integer.<init>, etc., methods.

- Should release before or at the same time as JEP 401.

- Could also integrate other suggested followups to JEP 390, like runtime 
logging of deprecation warnings.

---

Future work:

- There are a lot of opportunities to enhance the API provided by the wrapper 
classes after we've completed the primitive class migration.

--
Universal Generics

JEP 8261529: This is the set of language changes needed to allow generics over 
value types and to facilitate safe migration.

- Has now been Submitted, awaiting Candidate status.

- The type system rules are being developed. High level intuitions are pretty 
straightforward, but the details of type variable types (now in two flavors!) 
and intersection types need some fleshing out and validation, particularly 
since these have historically been neglected.

- JLS changes will come when the type system design is clearer

- A prototype is implemented, subject to specification clarifications

- A near-term goal is to validate the user experience of the proposed 
compilation warnings by addressing them in a subset of standard library code



JEP TBD: Java Type System Refinements. Not clear exactly what this will entail, 
but there is probably a significant chunk of spec work that can be spun off 
independently and address some longstanding issues with the current type system.

---

Future work:

- Applying changes to address warnings in standard libraries (definitely 
java.base, plus some others, maybe not everything, potentially in stages)

- Parametric JVM, as discussed earlier this year—we have a reasonable picture 
of what this will look like, but there are lots of details to work through both 
in the design and prototyping. Type restrictions could be spun off as a 
separate feature, as they may have other use cases.



Re: Factory methods & the language model

2021-09-10 Thread Dan Smith
> On Sep 10, 2021, at 7:56 AM, Dan Heidinga  wrote:
> 
> On Thu, Sep 9, 2021 at 5:32 PM Dan Smith  wrote:
>> 
>> On Sep 9, 2021, at 1:13 PM, Dan Heidinga  wrote:
>> 
>> but to keep the door open to having both factories and
>> constructors in identity classes, should we use a different syntax for
>> factories in primitive classes now?  That way factories would be
>> "spelled" consistently between primitive and identity classes.  Doing
>> so diminishes the "codes like a class" story but leaves the door open
>> for more compatibility in the future.
>> 
>> 
>> Enthusiastic +1.
>> 
>> I don't really *want* to do that, but if we think that's where we're headed, 
>> it is pretty weird that, say, a factory declaration in an Java interface 
>> declaration looks completely different from a factory declaration in a Java 
>> primitive class declaration. Or maybe both styles of declaration are 
>> supported by primitive classes? And does reflection treat them differently, 
>> too? Not sure if this leads anywhere good, but I want to do a bit of 
>> thinking through the implications...
>> 
> 
> Do you want to tackle this on list or wait for the next EG meeting?
> If you have a model / syntax in mind, we can start to work through the
> implications.  Otherwise, we can all pull out the bikeshed paint
> 

Both are fine. :-)

I'm not particularly interested in settling on a bikeshed color, but am 
interested in the general mood for pursuing this direction at all. (And not 
necessarily right away, just—is this a direction we think we'll be going?)

A few observations/questions:

- 'new Foo()' traditionally guarantees fresh instance creation for identity 
classes. Primitive classes relax this, since of course there is no unique 
identity to talk about. Would we be happy with a language that relaxes this 
further, such that 'new Foo()' can return an arbitrary Foo instance (or maybe 
even null)? Or would we want to pursue a different, factory-specific invocation 
syntax? (And if so, should primitive classes use it too?)

- The JVM's factory methods are unnamed, but in practice it's often useful to 
give your factory methods names. Of course the Java language already supports 
*named* factory methods. Does an unnamed factory method feature significantly 
improve the language?

- Identity classes don't have a 'withfield' operation, which means we can't 
mimic the declaration syntax of primitive classes in identity class factories. 
Instead, identity class factories probably look just like normal static 
methods. Can we try to make primitive class factories *also* look like normal 
static methods? (Would require a 'withfield' Java expression. Honestly not sure 
I want to write that code.)

My initial sense: no, trying to generalize like this isn't useful, primitive 
class constructors are best modeled as real constructors, even though they use 
the factory method JVM encoding. It's not particularly likely that an identity 
class factory feature will be worthwhile at all, but if it is, identity and 
primitive classes may use a similar JVM encoding but we shouldn't view these as 
similar language features.
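The named-factory point above is already idiomatic in today's Java: a static factory is free to return a cached instance rather than a fresh one. A sketch (Complex is a hypothetical class):

```java
public class NamedFactories {
    static final class Complex {
        static final Complex ZERO = new Complex(0, 0);
        final double re, im;
        private Complex(double re, double im) { this.re = re; this.im = im; }

        // Named factories: 'of' may construct or reuse; 'zero' returns a shared instance.
        static Complex of(double re, double im) {
            return (re == 0 && im == 0) ? ZERO : new Complex(re, im);
        }
        static Complex zero() { return ZERO; }
    }

    public static void main(String[] args) {
        System.out.println(Complex.of(0, 0) == Complex.zero());   // true: no fresh-instance guarantee
        System.out.println(Complex.of(1, 2) == Complex.of(1, 2)); // false: distinct objects
    }
}
```

Unlike 'new Foo()', nothing about a named factory promises a fresh instance — which is why generalizing instance creation expressions to factories would relax a long-standing language guarantee.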

Re: [External] : Re: Factory methods & the language model

2021-09-09 Thread Dan Smith
On Sep 9, 2021, at 1:13 PM, Dan Heidinga <heidi...@redhat.com> wrote:

but to keep the door open to having both factories and
constructors in identity classes, should we use a different syntax for
factories in primitive classes now?  That way factories would be
"spelled" consistently between primitive and identity classes.  Doing
so diminishes the "codes like a class" story but leaves the door open
for more compatibility in the future.

Enthusiastic +1.

I don't really *want* to do that, but if we think that's where we're headed, it 
is pretty weird that, say, a factory declaration in an Java interface 
declaration looks completely different from a factory declaration in a Java 
primitive class declaration. Or maybe both styles of declaration are supported 
by primitive classes? And does reflection treat them differently, too? Not sure 
if this leads anywhere good, but I want to do a bit of thinking through the 
implications...



Re: Factory methods & the language model

2021-09-09 Thread Dan Smith
To clarify a bit that I left out: this discussion assumes a pretty fixed JVM 
feature: a factory method is a static method with a special name, invoked via 
invokestatic, and possibly subject to certain constraints about the 
descriptor/enclosing class. I'm not proposing any changes to that basic 
approach, although choices we make for the Java language & tools _might_ 
influence the set of constraints we choose to impose in JVMS.

> On Sep 9, 2021, at 10:15 AM, Dan Heidinga  wrote:
> 
> On Thu, Sep 9, 2021 at 10:24 AM Dan Smith  wrote:
>> 
>> JEP 401 includes special JVM factory methods, spelled <new> (or, 
>> alternatively, <init> with a non-void return), which are needed as a 
>> standardized way to encode the Java language's primitive class constructors.
>> 
>> We have a lot of flexibility in how much we restrict use of these methods. 
>> Too many restrictions seem arbitrary and incoherent from the JVM's point of 
>> view; but too few restrictions risk untested corner cases, unfortunate 
>> compatibility obligations, and difficulties mapping back to the Java 
>> language model.
>> 
>> Expanding on that last one: for tools that operate with a Java language 
>> model, there are essentially three strategies for dealing with factory 
>> methods outside of the core primitive class construction use case:
>> 
>> 1) Have the JVM reject them
> 
> This gives us the maximum flexibility to expand factories in the
> future and let's us concentrate on the inline types use cases.  Seems
> like a pretty safe fallback position on factories.

Yeah. Seems a little... lacking in vision to impose this restriction on class 
files of all languages, but it also avoids over-committing.

> 
>> 2) Ignore them
> 
> I strongly dislike this.  If javac were to ignore them, and just not
> generate them, they are effectively dead code.

Dead to the Java language and tools, but perhaps a useful way to compile a 
Scala feature or something?

>  It's be much clearer
> to users if javac flagged them as such and refused to compile unless
> they were deleted.  If javac ignores them, we still need an answer on
> what the JVM does with them - reject them?  load them but prevent them
> from being invoked?  drop them when loading the classfile?  This seems
> like it collapses back to option 1.

The JVM semantics are clean and wouldn't change: if you want to use a factory, 
invoke it with invokestatic. It's just that the Java language wouldn't provide 
any mechanism to do so (because <new> or <init> aren't legal Java method names).

Ignoring does feel a bit like the feature is incomplete or something, but this 
sort of behavior does show up from time to time where Java and the JVM aren't 
perfectly in sync. For example:
- If there are two fields with the same name, one of them is effectively 
invisible
- If there are two methods with the same params and different returns, they're 
considered overloads that are impossible to disambiguate
- If there's a stray <init> method in an interface (before we outlawed this), 
javac either filters it out or treats it as a normal method, but anyway you 
can't call it because of its name

>> 3) Expand the model to include them
> 
> How much expanding does the model need?  We had originally modeled the
>  factory methods as regular static methods and only gave them the
> specialized name to make them easy to detect, to deal with withfield
> being limited to the nest,  and to allow reflective operations like
> Class::getConstructor() and Class::newInstance() to identify the
> inline type "constructors".  Am I forgetting a case?

Talking here about expanding the *language* model in some way so that factory 
methods appearing in non-primitive classes and interfaces can somehow be 
recognized or invoked. (1) and (2) are reasonable options, too, but here I'm 
exploring other approaches that go beyond rejecting or ignoring.

>> 3) Or we can allow javac to view factory methods in any class as 
>> constructors. A few complications:
>> 
>>- Constructors of non-final classes have both 'new Foo()' and 'super()' 
>> entry points; factories only support the first. So we either need to 
>> validate that a matching pair of <new> and <init> exist, or expand the 
>> language to model factories independently from constructors.
> 
> I don't think we want to touch the "new/dup/<init>" sequence and
> trying to allow factories to operate in that delicate dance would be a
> mistake.  Factories, beyond the inline types uses, give us a chance to
> encapsulate the "new/dup/" dance and present a cleaner model.
> We shouldn't attempt to mix the two.

Not sure which direction you're going here?

One 

Factory methods & the language model

2021-09-09 Thread Dan Smith
JEP 401 includes special JVM factory methods, spelled <new> (or, alternatively, 
<init> with a non-void return), which are needed as a standardized way to 
encode the Java language's primitive class constructors.

We have a lot of flexibility in how much we restrict use of these methods. Too 
many restrictions seem arbitrary and incoherent from the JVM's point of view; 
but too few restrictions risk untested corner cases, unfortunate compatibility 
obligations, and difficulties mapping back to the Java language model.

Expanding on that last one: for tools that operate with a Java language model, 
there are essentially three strategies for dealing with factory methods outside 
of the core primitive class construction use case:

1) Have the JVM reject them
2) Ignore them
3) Expand the model to include them

Taking javac as an example, here's what that looks like:

1) If factory methods outside of primitive classes are illegal, javac can treat 
classes with such methods as malformed and report an error.

2) Or if javac sees a factory method in a non-primitive class, it can just 
leave it out when it maps the class file to a language-level class. (There's 
precedent for this in, e.g., the treatment of fields with the same name and 
different descriptors.)

3) Or we can allow javac to view factory methods in any class as constructors. 
A few complications:

- Constructors of non-final classes have both 'new Foo()' and 'super()' 
entry points; factories only support the first. So we either need to validate 
that a matching pair of <new> and <init> exist, or expand the language to model 
factories independently from constructors.

- The language expects instance creation expressions to create fresh 
instances. We need to either validate this behavior (does the factory look like 
"new/dup/<init>"?) or relax the language semantics (perhaps this is in the grey 
area of mixed binaries?)

- Factories can appear in abstract classes and interfaces. Again, are we 
willing to change the language model to support these use cases? Perhaps to 
even allow their declaration?

- If a factory method has a mismatched return type (declared in Foo, but 
returns a Bar), are we willing to support a type system where the type of a 
factory invocation is not the type of the class to which the factory belongs?
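To make the first complication concrete, here is a plain-Java sketch of the two entry points a constructor provides today (class names are invented for illustration):

```java
// Sketch (illustrative names): the two entry points a constructor provides
// today. A JVM factory can stand in for the 'new Base()' client entry point,
// but not for the 'super()' entry point that subclass constructors rely on.
class Base {
    Base() {}                   // reachable via 'new Base()' and via 'super()'
}

class Derived extends Base {
    Derived() { super(); }      // the entry point a factory cannot replace
}

public class Main {
    public static void main(String[] args) {
        Base viaNew = new Base();       // client entry point
        Base viaSuper = new Derived();  // subclass entry point, through super()
        if (!(viaSuper instanceof Derived)) throw new AssertionError();
        System.out.println("both entry points exercised");
    }
}
```

This is why a factory alone can't fully replace a constructor of an extensible class: the 'super()' half has no factory analogue.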

There are probably limits to what we're willing to do with (3), which pushes at 
least some cases into the (1) or (2) buckets.

So, my question: what should we expect from (3), now and in the foreseeable 
future? And for the cases that fall outside of it, should we fall back to (1), 
(2), or a mixture of both?

EG meeting *canceled*, 2021-09-08

2021-09-07 Thread Dan Smith
Well, I was hoping to be in a place to have a good status update conversation 
tomorrow, but the long weekend interfered with those plans. I think we'll be 
best off skipping this meeting once more, and regrouping on Sep 22.

In the meantime, if there's anything you think deserves some attention, feel 
free to send a mail about it!

Re: [External] : Re: Draft JVMS changes for Primitive Objects (JEP 401)

2021-09-07 Thread Dan Smith
> On Aug 11, 2021, at 1:24 PM, Dan Heidinga  wrote:
> 
> And continuing on the "long-overdue" theme, here's my long-overdue
> review of the spec changes.
> 
> A big thank you to you, Dan S., for the careful spec writeup efforts.
> I think this captures our discussions well.
> 
> --Dan
> 
> == Section 2.11.5 Object Creation and Manipulation
>> Create a new class instance: new, withfield.
> Should that also include "defaultvalue"? The semantics aren't quite
> the same because of the structural equality of primitive class types
> but it is conceptually very similar.  And in the instruction section,
> we state "The defaultvalue instruction is similar to the new
> instruction" which lends credence to including it in this list.

Hmm, that is a bit inconsistent. There are two conflicting perspectives:
- It's analogous to 'ldc' or 'aconst_null', putting a well-known constant on 
the stack (so belongs in 2.11.2)
- It's analogous to 'new', putting a fresh instance on the stack (so belongs in 
2.11.5)

I think I lean towards emphasizing the "load a constant" nature of the 
operation, but you're right that it's not a consistent view throughout the 
document.
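For comparison, the "load a constant" reading matches how the JVM already treats the defaults it supplies for fresh fields and array elements (plain Java, runnable today; not new Valhalla behavior):

```java
// The "load a constant" view: the defaults the JVM hands out for fresh array
// elements behave like well-known constants, much as 'ldc'/'aconst_null' do.
public class Main {
    public static void main(String[] args) {
        int[] ints = new int[1];
        Object[] refs = new Object[1];
        if (ints[0] != 0) throw new AssertionError();    // primitive default
        if (refs[0] != null) throw new AssertionError(); // reference default
        System.out.println("array defaults: 0 and null");
    }
}
```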

> == Section 4.1 The ClassFile Structure
> The  `ACC_PRIM_SUPER` flag is introduced and restrictions on classes
> with the flag are called out in various sections such as:
> 4.5 Fields > ACC_PRIM_SUPER flag set, each field must have its
> ACC_STATIC flag set.
> 4.6 Methods > In a primitive class, and in an abstract class that has
> its ACC_PRIM_SUPER flag set, a method that has its ACC_SYNCHRONIZED
> flag set must also have its ACC_STATIC flag set.
> 5.3 Creation and Loading > implements PrimitiveObject if the opposite
> is true (ACC_PRIM_SUPER, no instance initialization method).
> 
> I didn't see static constraints called out to enforce these
> restrictions (should they be?).  Having the handling of the
> ACC_PRIM_SUPER in one place would make the VM's job of validating it
> easier.

Most of Chapter 4 provides the specification for format checking—including 4.5 
and 4.6. Compare the restrictions on fields and methods of interfaces appearing 
in 4.5 and 4.6.

There's a perspective shift here from what you're looking for—rather than 
saying "an ACC_PRIM_SUPER class file is validated as follows: ...", we treat 
ACC_PRIM_SUPER as a statement of fact, and then *other* things in the class 
file are validated with that fact in mind. This avoids forcing everything about 
the new feature into a sidebar, as if it's not part of the "core" JVM.

Anyway, I'd suggest implementing & thinking about validation in the same way 
you implement validation of ACC_INTERFACE-related constraints.
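The ACC_INTERFACE analogy is visible through today's reflection API: the flag is a statement of fact, and other properties are constrained in light of it (a small demonstration, not Valhalla-specific):

```java
import java.lang.reflect.Modifier;

// ACC_INTERFACE as a "statement of fact": once the flag is set, other flags
// are constrained accordingly (e.g., interfaces are always abstract).
public class Main {
    public static void main(String[] args) {
        int flags = Runnable.class.getModifiers();
        if (!Modifier.isInterface(flags)) throw new AssertionError();
        if (!Modifier.isAbstract(flags)) throw new AssertionError();
        System.out.println("interface implies abstract, as format checking requires");
    }
}
```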

> == 4.6 Methods
>> Design discussion: this section requires that unnamed factory methods (named 
>> ) are static
>> and have a return type that matches their declaring class or interface. By 
>> restricting the
>> descriptor in this way, clients can rely on a predictable, useful return 
>> type.
>> 
>> Alternatively, we could allow a subtype or supertype as the return type, or 
>> impose no constraints
>> at all. One potential use case is a hidden class, which is incapable of 
>> naming its class type in
>> a descriptor.
> 
> Because these are static methods, I thought we had agreed they could
> name any superclass as the return value due to the hidden class
> requirements.  Even though this allows some strange behaviour (ie:
> after some bytecode manipulation) such as the following pseudo-code
> shows:
> ```
> primitive class Strange {
>    Strange() { //<new>()Ljava/lang/Object;
>        return new String();
>    }
> }
> ```
> The contract on `<new>` is more convention than requirement.  In cases
> where the return value needs to be used as a primitive value, it would
> need to go through a checkcast to validate it when a different return
> type is named.
> 
> While this doesn't give the hidden class full powers to be checked in
> the checkcast, it can still be checked against the PrimitiveObject
> interface or its ACC_PRIM_SUPER type.  Seems like a reasonable setup
> and avoids the VM having to check the name matches on the return type
> of the descriptor.

Something we need to investigate more is how factory methods get surfaced in 
reflection. I could imagine clients like reflection really wanting a guarantee 
that when you call Foo.<new>, you get a Foo. But it depends on how it is 
presented in the API.

So, still an open question, which is why I listed both alternatives.
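The guarantee clients might want has precedent in the existing API: a Constructor<T> obtained from T.class always produces a T (plain Java, runnable today):

```java
import java.lang.reflect.Constructor;

// Today's reflection contract: Constructor<String> from String.class yields
// a String, not merely an Object -- the kind of guarantee a reflective
// factory-method API might be expected to preserve.
public class Main {
    static String freshString() {
        try {
            Constructor<String> c = String.class.getConstructor();
            return c.newInstance();   // statically typed as String
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        if (!freshString().isEmpty()) throw new AssertionError();
        System.out.println("Constructor<String> produced a String");
    }
}
```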

Hidden classes did indeed push us to be more permissive, but in retrospect, 
that really shouldn't drive the choice: if you insist on putting a factory-like 
method in your hidden class, but can't follow the JVM's rules, you're fully 
capable of spinning your own static method using whatever name you want.
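The Strange pseudo-code above can be approximated in plain Java: a static factory whose declared return type is broader than the class forces clients into exactly the checkcast-style validation described (names are illustrative; this is ordinary Java, not the proposed factory mechanism):

```java
// Plain-Java approximation of the Strange pseudo-code: the factory's declared
// return type (Object) legally permits a non-Strange result, so callers must
// validate the result, mirroring the JVM-level checkcast.
class Strange {
    static Object make() {        // analogous to a ()Ljava/lang/Object; factory
        return "not a Strange";   // allowed by the descriptor, surprising to callers
    }
}

public class Main {
    public static void main(String[] args) {
        Object o = Strange.make();
        if (o instanceof Strange) throw new AssertionError();
        System.out.println("caller-side check caught the mismatch");
    }
}
```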

>> A method of a class or interface named <new> (2.9.4) must have its 
>> ACC_STATIC flag set.
> 
> Should interfaces be able to implement `<new>` or should we prevent
> that like we do for `<init>`?  Preventing it now gives us the most
> freedom.

EG meeting *canceled*, 2021-08-25

2021-08-24 Thread Dan Smith
Eh, still quiet on here. Guess we'll skip one more meeting. I expect we'll have 
a useful status update by the next one. Carry on!

(I owe Dan a reply on JVMS for JEP 401, which I will get to soon.)

EG meeting *canceled*, 2021-08-11

2021-08-11 Thread Dan Smith
I don't see anything new to discuss, so we'll skip today's meeting. Next 
meeting: August 25.

We're having some good internal discussions about default values & null, and 
will send something out when that settles into something stable.

EG meeting *canceled*, 2021-07-28

2021-07-25 Thread Dan Smith
I'm on vacation this coming week, and I think we can defer any topics to next 
time. Next meeting: August 11.

Objects.newIdentity update

2021-07-19 Thread Dan Smith
An update on Objects.newIdentity for Java 17: Roger did some work to put the 
feature together and get it reviewed.

https://bugs.openjdk.java.net/browse/JDK-8269096

However, while the implementation is straightforward, for libraries folks not 
deeply familiar with the Valhalla Project, the concept of a method that does 
the same thing as 'new Object()' did not seem particularly justified. I think 
they're especially uncomfortable with the idea of talking about creating an 
"identity" in a world in which all objects have identity.

https://bugs.openjdk.java.net/browse/JDK-8269097

So, not going to work out for this release. We made a bet that it would be a 
simple, noncontroversial matter to slip in an extra method, but it turns out to 
be a tougher sell than we thought.

Of course, as part of JEP 401, a feature like this will have the surrounding 
context, with things like the IdentityObject interface, so that it will make a 
lot more sense. We'll plan for that.



EG meeting, 2021-07-14

2021-07-13 Thread Dan Smith
The next EG Zoom meeting is Wednesday at 4pm UTC (9am PDT, 12pm EDT).

Topics to discuss:

"Draft JVMS changes for Primitive Objects (JEP 401)": I shared a revised JVM 
spec for JEP 401. Some new, tentative ideas here:
- "Inlinable reference type" terminology
- An ACC_PRIM_SUPER flag
- An ACC_ATOMIC flag (formalizing the NonTearable interface)
- Support for L descriptors in CONSTANT_Class
- A JavaFlags attribute for extra language-only flags
- Details for implementing IdentityObject/PrimitiveObject

"Revisiting default values": Kevin weighed in on this old thread with some 
thoughts on nullable primitive value types and default values

"JEP 401 -- reflection and class literals": Brian shared some additional 
thoughts on Class objects and class literals, suggesting we de-emphasize 
type-modeling Class objects



Draft JVMS changes for Primitive Objects (JEP 401)

2021-07-02 Thread Dan Smith
Here's a long-overdue refresh of the proposed JVMS changes to support Primitive 
Objects:

http://cr.openjdk.java.net/~dlsmith/jep401/latest

(Sorry to dump this on the weekend, not looking for same-day feedback. :-))

I *think* I've captured all the key JVMS-related pieces that we expect to 
include with JEP 401, but please let me know if I missed something.

In a number of areas, there are still open design questions. I've called those 
out in discussion blocks. Often, I've made a somewhat arbitrary choice for how 
to resolve the open question, based on my mood at the time. :-) While it's 
useful to get something down on paper, all of these will be more carefully 
explored and resolved in the coming months. If you see something *not* called 
out that you think still needs further discussion, let me know.

New term to look out for: "inlinable reference type", which is spec-speak for 
"Q type". (And its companion, "standard reference type", for "L type".) Why not 
call it a "primitive value type", like we do in the Java language? Because, 
unlike the language model, in the JVM it works best to treat all class types as 
reference types that participate in a single substitutability/subtyping graph, 
even though the JVM can optimize away the references in many cases. Our 
generics story leans heavily on the JVM handling types in this way. Given that 
mismatch, it seems too confusing to try to force the same terminology into the 
different models.

There are supplementary "cleanup" changes included in the bundle, if you're 
interested in exploring them. Most of these fall under the umbrella of the 
"Better-defined JVM class file validation" JEP I proposed a few weeks ago, but 
"JVM Types Cleanup" is new.



EG meeting *canceled*, 2021-06-30

2021-06-30 Thread Dan Smith
There's been a little bit of traffic on the list, but I've got a conflict today 
and I think we can defer talking about those things until next time. Next 
meeting July 14.


Re: Revisiting default values

2021-06-29 Thread Dan Smith
> On Jun 29, 2021, at 11:54 AM, Kevin Bourrillion  wrote:
> 
> Sorry for quietness of late.

Glad to have you back!

Unfortunately, there's not much new to report in this area, other than the fact 
that we are aware that more design and prototyping work is needed.

Here's an open task to prototype an initial javac-only strategy:
https://bugs.openjdk.java.net/browse/JDK-8252781

> Some new thoughts.
>   • Default behaviors of language features should be based first on 
> bug-proof-ness; if a user has to opt into safety that means they were not 
> safe.
>   • `null` and nullable types are a very good thing for safety; NPE 
> protects us from more nasty bugs than we can imagine.
>   • A world where all user-defined primitive classes must be nullable 
> (following Brian's outline) is completely sane, just not optimized.

These are good principles, and I'm sympathetic to them (with the implication 
that the right "default default" is null/<default>*; using a different default 
should be opt-in).

(*By <default>, I mean the all-zero-bits instance of a no-good-default class, 
without picking a particular semantics for that value.)

But... these principles potentially conflict with engineering constraints. 
E.g., I can imagine a world in which a no-good-default primitive class is no 
better than an identity class in most use cases, and at that point, we're best 
off simply not supporting the no-good-default feature at all. (With the 
implication that many of the primitive-candidate classes we are imagining 
should continue to be identity classes.) I don't like that world, and I don't 
know how realistic it is, but there's pressure coming from that direction.

To move forward on the "what is the best default default?" question, we're 
going to need more engineering on no-good-default classes, and get a better 
sense of their performance characteristics.

>   • (We'd like to still be able to fashion a non-nullable type when the 
> class itself allows nullability, but this is a general need we already have 
> for ref types and shouldn't have much bearing here. Still working hard on 
> jspecify.org...)

I think we're pretty locked in to:
- Some primitive class types like Complex must be non-nullable (for compactness)
- We won't (at least for now) support non-nullable types in full generality

Always possible that we'd want to step back and revisit this design, but it's 
pretty mature.

Speaking of orthogonality, there *is* an open question about how we interpret 
<default>, and this is orthogonal to the question of whether <default> should 
be the "default default". We've talked about:
- It's interchangeable with null
- It's null-like (i.e., detected on member access), but distinct
- It's a separate concept, and it is an error to ever read it from fields/arrays

All still on the table.

(And within each of these, we still need to further explore the implications of 
JVM vs. language implementation strategies.)

>   • It's awkward that `Instant` would have to add a `boolean valid = 
> true` field, but it's not inappropriate. It has the misfortune that it both 
> can't restrict its range of values and has no logical zero/default.
>   • A type that does have a restricted range of legal values, but where 
> that range includes the `.default` value, might do some very ugly tricks to 
> avoid adding that boolean field; not sure what to think about this.

How we encode <default> is an interesting question that deserves more 
exploration. There's a potential trade-off here between safety and performance, 
and like you I'm inclined to prioritize safety. Maybe there are reasonable ways 
we can get them both...
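One way to picture the trade-off is the `boolean valid` encoding from the Instant bullet above, sketched here with an ordinary final class so it runs today (the value-class mechanics are assumed; `SafeInstant` and its members are invented names, not a real API):

```java
// Sketch of the validity-bit encoding: the all-zero default has valid == false,
// so reading through it fails fast instead of masquerading as a valid value.
// An ordinary final class stands in for a would-be primitive class.
final class SafeInstant {
    private final long seconds;
    private final boolean valid;   // false in an all-zero-bits default

    private SafeInstant(long seconds, boolean valid) {
        this.seconds = seconds;
        this.valid = valid;
    }

    static SafeInstant of(long seconds) { return new SafeInstant(seconds, true); }

    static SafeInstant defaultValue() { return new SafeInstant(0L, false); } // models the default instance

    long seconds() {
        if (!valid) throw new NullPointerException("uninitialized SafeInstant");
        return seconds;
    }
}

public class Main {
    public static void main(String[] args) {
        if (SafeInstant.of(42).seconds() != 42) throw new AssertionError();
        try {
            SafeInstant.defaultValue().seconds();
            throw new AssertionError("expected NPE");
        } catch (NullPointerException expected) {
            System.out.println("default detected on member access");
        }
    }
}
```

The cost, as noted, is the extra field; the benefit is that the degenerate value surfaces as an immediate NPE-like failure rather than a plausible-looking value.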

>   • Among all the use cases for primitive classes, the ones where the 
> default value is non-degenerate and expected are the special cases! We use 
> `Complex` as a go-to example, but if most of what we did with complex numbers 
> was divide them by each other then even this would be dubious. We'd be 
> letting an invalid value masquerade as a valid one when we'd rather it just 
> manifest as `null` and be subject to NPEs.

Complex and friends are special cases, but they're also the *most important* 
cases. I'd really prefer not to have to pick, but if forced to, it may be more 
important for primitive classes to support optimally the 10% "has a good 
default" cases (roughly, those that are number-like) than the 90% "no good 
default" cases (roughly, those that wrap references).

>   • If we don't do something like Brian describes here, then I suppose 
> second-best is that we make a lot of these things ref-default (beginning with 
> Instant and not stopping there!) and warn about the dangers of `.val`

I'm not a big fan of this approach. It gives you the illusion of safety 
(well-written code only sees valid values) but blows up in unpredictable ways 
when a bug or a hostile actor leaks <default> into your program. If we don't 
offer stronger guarantees, and your code isn't willing to check for <default>, 
you really shouldn't be programming with `.val` types.

Re: JEP 401 -- reflection and class literals

2021-06-23 Thread Dan Smith
> On Jun 23, 2021, at 12:39 PM, Brian Goetz  wrote:
> 
> 
>> We can "fix" this behavior by supporting "fuzzy matching" in the 'getMethod' 
>> method, so that both Point.val.class and Point.ref.class are considered 
>> matches for Point.val.class in method signatures. That feels to me like a 
>> bridge too far in our efforts to hide complexity from API users. YMMV. (Also 
>> doesn't work great if the language supports val/ref overloads; I think we 
>> lean towards *not* supporting these.)
> 
> We can surely try to prevent them, but I don't think it really does it much 
> good in this area.  We will surely not want the JVM to be trying to figure 
> out at class load time that:
> 
> class Foo { (LPoint;LString;I)V m }
> class Bar extends Foo { (QPoint;LString;I) m }
> 
> that Bar is an invalid class.  So given that classfiles will have these 
> potential conflicts, getMethod(Point.class, String.class, int.class) would 
> have to do the fuzzy thing anyway, and that's a mess.

Oh, sure, not suggesting a JVM validation check here. However, if the language 
rejects these overloads, then reflection could do something reasonable to just 
pick one when it encounters them (just like you can overload on return types in 
the JVM, but 'getMethod' doesn't let you explicitly disambiguate).
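The "just pick one" precedent is visible in the existing API: getMethod queries by name and parameter types only, so the return type never participates in disambiguation (plain Java, runnable today):

```java
import java.lang.reflect.Method;

// getMethod matches on name and parameter types; the return type is not part
// of the query. JVM-level return-type overloads would therefore already be
// resolved by "picking one", the precedent mentioned above.
public class Main {
    static Method lookupAppend() {
        try {
            return StringBuilder.class.getMethod("append", int.class);
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        // The return type played no part in the lookup; it is simply reported.
        if (lookupAppend().getReturnType() != StringBuilder.class)
            throw new AssertionError();
        System.out.println("matched by name + parameter types only");
    }
}
```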

But anyway, if there are distinct Class objects for QPoint and LPoint, I don't 
love the idea of 'getMethod' letting the LPoint class object ever match a 
QPoint descriptor, regardless of any overloads present.


