Re: Alternative to IdentityObject & ValueObject interfaces

2022-04-01 Thread Dan Smith
On Mar 22, 2022, at 10:52 PM, Dan Smith <daniel.sm...@oracle.com> wrote:

On Mar 22, 2022, at 7:21 PM, Dan Heidinga <heidi...@redhat.com> wrote:

A couple of comments on the encoding and questions related to descriptors.


JVM proposal:

- Same conceptual framework.

- Classes can be ACC_VALUE, ACC_IDENTITY, or neither.

- Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are 
not. Optionally, modern-version concrete classes are also implicitly 
ACC_IDENTITY.

Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER
bits, then any class without one of the bits set (including all the
legacy classes) is an identity class.


(Trying out this alternative approach to abstract classes: there's no more 
ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically 
ACC_IDENTITY, and modern-version abstract classes permit value subclasses 
unless they opt out with ACC_IDENTITY. It's the bytecode generator's 
responsibility to set these flags appropriately. Conceptually cleaner, maybe 
too risky...)

With the "clever" encoding, every class is implicitly identity unless
it sets ACC_VALUE or ACC_NEITHER and bytecode generators have to
explicitly flag modern abstract classes.  This is kind of growing on
me.

A problem is that interfaces are ACC_NEITHER by default, not ACC_IDENTITY. 
Abstract classes and interfaces have to get two different behaviors based on 
the same 0 bits.

Here's another more stable encoding, though, that feels less fiddly to me than 
what I originally wrote:

ACC_VALUE means "allows value object instances"

ACC_IDENTITY means "allows identity object instances"

If you set *both*, you're a "neither" class/interface. (That is, you allow both 
kinds of instances.)

If you set *none*, you get the default/legacy behavior implicitly: classes are 
ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.

Update on encoding: after some internal discussion, I've found this to be the 
most natural fit:

- ACC_VALUE (0x0040) corresponds to the 'value' keyword in source files
- ACC_IDENTITY (0x0020) corresponds to the (often implicit) 'identity' keyword 
in source files
- If neither is set, the class/interface supports both kinds of subclasses (and 
must be abstract)
- If both are set, or any supers' flags conflict, it's an error
- In older-version classes (not interfaces), ACC_IDENTITY is assumed to be set

What about newer-version classes that use old encodings? (E.g., a tool bumps 
its output version number but isn't aware of these flags.) There's a sneaky 
trick here that minimizes the risk: ACC_IDENTITY is re-using the old ACC_SUPER, 
which no longer has any effect and that we've encouraged to be set since Java 
1.0.2. So if you're already setting ACC_SUPER in your classes, you've 
automatically opted in to ACC_IDENTITY; doing something different requires 
making changes to the generated code.

So the remaining incompatibility risk is that someone generates a class (not an 
interface) with a newer version number and with neither flag set (violating the 
"always set ACC_SUPER" advice), and then either the class won't load (it's 
concrete, it declares an instance field, etc.), or it's abstract and 
accidentally supports value subclasses, and so can be instantiated without 
running <init> logic. The number of unlikely events in this scenario seems like 
enough for us not to be concerned.
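
A minimal Java sketch of how a class-file reader might apply the flag rules listed above. The names AccFlagSketch, Kind, and classify are hypothetical, as is the 'legacyVersion' parameter (no cutoff version is given in this thread). Ignoring the 0x0040 bit in older-version files is an assumption, and the "supers' flags conflict" check is not modeled.

```java
// Illustrative sketch only; this is not JVM code.
final class AccFlagSketch {
    static final int ACC_IDENTITY = 0x0020; // same bit as the old ACC_SUPER
    static final int ACC_VALUE = 0x0040;

    enum Kind { IDENTITY, VALUE, BOTH }

    static Kind classify(int flags, boolean isInterface, boolean isAbstract, boolean legacyVersion) {
        boolean identity = (flags & ACC_IDENTITY) != 0;
        boolean value = (flags & ACC_VALUE) != 0;
        if (legacyVersion) {
            // Assumption: older-version files do not use the new meaning of these bits.
            // Classes are treated as identity classes; interfaces stay unrestricted.
            identity = !isInterface;
            value = false;
        }
        if (identity && value) {
            throw new IllegalArgumentException("ACC_IDENTITY and ACC_VALUE are both set");
        }
        if (identity) return Kind.IDENTITY;
        if (value) return Kind.VALUE;
        if (!isAbstract) {
            // Neither flag: the class must be abstract (interfaces always are).
            throw new IllegalArgumentException("a class with neither flag must be abstract");
        }
        return Kind.BOTH;
    }

    public static void main(String[] args) {
        System.out.println(classify(0x0021, false, false, false)); // ACC_PUBLIC | ACC_SUPER
        System.out.println(classify(0x0401, false, true, false));  // abstract, neither flag
    }
}
```

Under these assumptions, a generator that already sets ACC_SUPER gets an identity class without any change, while a newer-version concrete class with neither flag is rejected rather than silently reinterpreted.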



Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-24 Thread Remi Forax
> From: "Brian Goetz" 
> To: "daniel smith" , "valhalla-spec-experts"
> 
> Sent: Thursday, March 24, 2022 1:46:44 PM
> Subject: Re: Alternative to IdentityObject & ValueObject interfaces

> On 3/23/2022 10:51 PM, Dan Smith wrote:

>>> On Mar 22, 2022, at 5:56 PM, Dan Smith <daniel.sm...@oracle.com> wrote:

>>> - Variable types: I don't see a good way to get the equivalent of an
>>> 'IdentityObject' type. It would involve tracking the 'identity' property
>>> through the whole type system, which seems like a huge burden for the
>>> occasional "I'm not sure you can lock on that" error message. So we'd probably
>>> need to be okay letting that go. Fortunately, I'm not sure it's a great
>>> loss—lots of code today seems happy using 'Object' when it means, 
>>> informally, "object that I've created for the sole purpose of locking".

>>> - Type variable bounds: this one seems more achievable, by using the 'value'
>>> and 'identity' keywords to indicate a new kind of bounds check ('<identity T
>>> extends Runnable>'). Again, it's added complexity, but it's more localized.
>>> We should think more about the use cases, and decide if it passes the
>>> cost/benefit analysis. If not, nothing else depends on this, so it could be
>>> dropped. (Or left to a future, more general feature?)

>> Per today's discussion, this part seems to be the central question: how much
>> value can we expect to get out of compile-time checking?

> This is indeed the question. There's both a "theory" and a "practice" aspect,
> too.

>> The type system is going to have three kinds of types:
>> - types that guarantee identity objects
>> - types that guarantee value objects
>> - types that include both kinds of objects

>> That third kind are a problem: we can specify checks with false positives
>> (programmer knows the operation is safe, compiler complains anyway) or false
>> negatives (operation isn't safe, but the compiler lets it go).

> Flowing {Value,Identity}Object property is likely to require shoring up
> intersection types too, since we can express Runnable as a type
> bound, but not as a denotable type. Var helps a little here but ultimately
> this is a hole through which information will drain.

> The arguments you make here are compelling to me, that while it might work in
> theory, in practice there are too many holes:

> - Legacy code that already deals in Object / interfaces and is not going to
> change
> - Even in new code, I suspect people will continue to do so, because as you
> say, it is tedious for marginal value
> - The lack of intersection types will make it worse
> - Because of the above, many of the errors would be more like warnings, making
> it even weaker

> All of this sounds like a recipe for "new complexity that almost no one will
> actually use."

I agree.
So if we drop the idea of putting identity vs. value info into the type system,
the follow-up question is "should we restrict inheritance or not?"

Classes are tagged as value or not, and an abstract class or an interface
by default allows both value types and identity types as subtypes.
Do we need more, i.e. the ability to restrict the subtypes of an abstract
class/interface to value types (or identity types) only?

Yesterday, Dan S. talked about a user being able to restrict a hierarchy to
identity classes only. This will not help existing code, but it may help
new code: instead of having IdentityObject in the JDK, we let a user define
their own interface that plays the same role as IdentityObject but is tailored
to their problem. Or do we consider that even that use case is not worth its
own weight?

Rémi 


Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-24 Thread Brian Goetz



On 3/23/2022 10:51 PM, Dan Smith wrote:

On Mar 22, 2022, at 5:56 PM, Dan Smith wrote:

- Variable types: I don't see a good way to get the equivalent of an 
'IdentityObject' type. It would involve tracking the 'identity' 
property through the whole type system, which seems like a huge 
burden for the occasional "I'm not sure you can lock on that" error 
message. So we'd probably need to be okay letting that go. 
Fortunately, I'm not sure it's a great loss—lots of code today seems 
happy using 'Object' when it means, informally, "object that I've 
created for the sole purpose of locking".


- Type variable bounds: this one seems more achievable, by using the 
'value' and 'identity' keywords to indicate a new kind of bounds 
check ('<identity T extends Runnable>'). Again, it's added 
complexity, but it's more localized. We should think more about the 
use cases, and decide if it passes the cost/benefit analysis. If not, 
nothing else depends on this, so it could be dropped. (Or left to a 
future, more general feature?)


Per today's discussion, this part seems to be the central question: 
how much value can we expect to get out of compile-time checking?


This is indeed the question.  There's both a "theory" and a "practice" 
aspect, too.



The type system is going to have three kinds of types:
- types that guarantee identity objects
- types that guarantee value objects
- types that include both kinds of objects

That third kind are a problem: we can specify checks with false 
positives (programmer knows the operation is safe, compiler complains 
anyway) or false negatives (operation isn't safe, but the compiler 
lets it go).


Flowing {Value,Identity}Object property is likely to require shoring up 
intersection types too, since we can express Runnable as 
a type bound, but not as a denotable type.  Var helps a little here but 
ultimately this is a hole through which information will drain.


The arguments you make here are compelling to me, that while it might 
work in theory, in practice there are too many holes:


 - Legacy code that already deals in Object / interfaces and is not 
going to change
 - Even in new code, I suspect people will continue to do so, because 
as you say, it is tedious for marginal value

 - The lack of intersection types will make it worse
 - Because of the above, many of the errors would be more like 
warnings, making it even weaker


All of this sounds like a recipe for "new complexity that almost no one 
will actually use."




Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-23 Thread Dan Smith
On Mar 22, 2022, at 5:56 PM, Dan Smith <daniel.sm...@oracle.com> wrote:

- Variable types: I don't see a good way to get the equivalent of an 
'IdentityObject' type. It would involve tracking the 'identity' property 
through the whole type system, which seems like a huge burden for the 
occasional "I'm not sure you can lock on that" error message. So we'd probably 
need to be okay letting that go. Fortunately, I'm not sure it's a great 
loss—lots of code today seems happy using 'Object' when it means, informally, 
"object that I've created for the sole purpose of locking".

- Type variable bounds: this one seems more achievable, by using the 'value' 
and 'identity' keywords to indicate a new kind of bounds check ('<identity T 
extends Runnable>'). Again, it's added complexity, but it's more localized. We 
should think more about the use cases, and decide if it passes the cost/benefit 
analysis. If not, nothing else depends on this, so it could be dropped. (Or 
left to a future, more general feature?)

Per today's discussion, this part seems to be the central question: how much 
value can we expect to get out of compile-time checking?

Stepping back from the type system details (that is, the below discussion 
applies whether we're using interfaces, modifiers on types, or something else), 
it's worth asking what errors we hope these features will help detect. We 
identified a couple of them today (and earlier in this thread):

- 'synchronized' on a value object
- storing a value object in a weak reference (in a world in which weak 
references don't support value objects)

Two questions:

1) What are the requirements for the analysis? How effective can it be?

The type system is going to have three kinds of types:
- types that guarantee identity objects
- types that guarantee value objects
- types that include both kinds of objects

That third kind are a problem: we can specify checks with false positives 
(programmer knows the operation is safe, compiler complains anyway) or false 
negatives (operation isn't safe, but the compiler lets it go).

For example, for the 'synchronized' operation, it's straightforward for the 
compiler to complain on a value class type. But what do we do with 
'synchronized' on some interface type? Say we go the false positive route; the 
check probably looks like a warning ("you might be synchronizing on a value 
object"). In this case:

- We've just created a bunch of warnings in existing code that people will 
probably just @SuppressWarnings rather than try to address through the types, 
because changing the types (throughout the flow of data) is a lot of work and 
comes with compatibility risks.

- Even in totally new code, if I'm not working with a specific identity class, 
I'm not sure I would bother fiddling with the types to get better checking. It 
seems really tedious. (For example, changing an interface-typed parameter 'Foo' 
to intersection type 'Foo & IdentityObject'.)

If we prefer to allow false negatives, then it's straightforward: value class 
types get errors, other types do not. There's no need for extra type system 
features. (E.g., 'IdentityObject' and 'Object' get treated exactly the same by 
'synchronized'.)
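
The two routes can be compared in a small Java sketch. This is only an illustration of the trade-off described above, not a proposed compiler design; SyncCheckSketch, TypeKind, Policy, Result, and checkSynchronized are all hypothetical names.

```java
// Illustrative sketch only: models the 'synchronized' check over the three kinds of types.
final class SyncCheckSketch {
    enum TypeKind { IDENTITY, VALUE, BOTH }

    enum Policy {
        WARN_ON_UNKNOWN,  // false-positive route: warn when the type includes both kinds
        ALLOW_UNKNOWN     // false-negative route: only value class types are rejected
    }

    enum Result { OK, WARNING, ERROR }

    static Result checkSynchronized(TypeKind operand, Policy policy) {
        switch (operand) {
            case VALUE:
                return Result.ERROR;   // always wrong under either policy
            case IDENTITY:
                return Result.OK;      // always safe under either policy
            default:
                // Object, interfaces, and "neither" abstract classes land here.
                return policy == Policy.WARN_ON_UNKNOWN ? Result.WARNING : Result.OK;
        }
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) {
            for (TypeKind k : TypeKind.values()) {
                System.out.println(p + " / " + k + " -> " + checkSynchronized(k, p));
            }
        }
    }
}
```

The two policies differ only in the BOTH row, which is exactly where existing code that uses Object or interface types would start producing warnings.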

For weak references, it definitely doesn't make sense to reject types like 
WeakReference—that would be a compatibility mess. We could warn, but 
again, lots of false positive risk; and warnings don't generalize to 
general-purpose use of generics. I think again the best we could hope to do is 
to reject value class types. But something like 'T extends IdentityObject' 
doesn't accomplish this, because it excludes the "both kinds" types. Instead, 
we'd need something like 'T !extends ValueObject'.

2) Are these the best use cases we have? and are they really all that important?

These are the ones we've focused on, but maybe we can think of other 
applications. Other use cases would similarly have to involve the differences 
in runtime semantics.

Our two use cases share the property that they detect a runtime error (either 
an expression that we know will always throw, or with more aggressive checking 
an expression that *could* throw). That's helpful, but I do wonder how common 
such errors will be. We could do a bunch of type system work to detect division 
by zero, but nobody's asking for that because programmers just tend to avoid 
making that mistake already.

Synchronization: best practice is already to "own" the object being locked on, 
and that kind of knowledge isn't tracked by the type system. Doesn't seem that 
different for programmers to also be aware of whether their locking objects are 
identity objects without type system help.
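
For reference, the "own the lock object" practice looks roughly like this in today's Java. The class name CounterWithOwnedLock is made up for the example; the point is only that the lock is created with 'new Object()' and never escapes, so the programmer already knows what kind of object it is.

```java
// Illustrative sketch of a privately owned lock object.
final class CounterWithOwnedLock {
    // Created here and never shared, so no caller can substitute another object.
    private final Object lock = new Object();
    private int count;

    int increment() {
        synchronized (lock) {
            return ++count;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    public static void main(String[] args) {
        CounterWithOwnedLock counter = new CounterWithOwnedLock();
        counter.increment();
        counter.increment();
        System.out.println(counter.get());
    }
}
```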

Weak references: a WeakReference seems like an unlikely 
scenario—why are you trying to manage GC for a value object? (Assuming we've 
provided an alternative API to manage references *within* value objects, do 
caching, etc.) So most runtime errors will fall into the WeakReference 
or WeakReference category, and again there's a 

Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-23 Thread Dan Heidinga
(Sorry Dan, you're receiving this twice.  I accidentally sent it off list first)

> Here's another more stable encoding, though, that feels less fiddly to me 
> than what I originally wrote:
>
> ACC_VALUE means "allows value object instances"
>
> ACC_IDENTITY means "allows identity object instances"
>
> If you set *both*, you're a "neither" class/interface. (That is, you allow 
> both kinds of instances.)
>
> If you set *none*, you get the default/legacy behavior implicitly: classes 
> are ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.
>

identity class F { }
abstract /* neither */ class N extends F { }
value class V extends N { }

This is clearly an illegal stacking of the bits as we can't have value
subclasses of identity objects.  We'll need to spec out rules about
super-class bits being "sticky" and overriding subclass bits if we
want to allow `N` to be loaded or add rules that make `N` illegal.

Apart from this - which isn't really different from dealing with
inheriting the VO/IO interfaces - I think your new proposed encoding
is better than my (not so) clever one.
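
One way to picture the "sticky" option is the Java sketch below, written under the "allows" reading of the two flags quoted above. StickyBitsSketch, Allows, and effective are hypothetical names, and treating Object as allowing both kinds is an assumption. What a class effectively allows is what it declares, intersected with what its superclass effectively allows.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch only: superclass bits override what a subclass declares.
final class StickyBitsSketch {
    enum Allows { IDENTITY, VALUE }

    static Set<Allows> effective(Set<Allows> declared, Set<Allows> superEffective) {
        EnumSet<Allows> result = EnumSet.noneOf(Allows.class);
        result.addAll(declared);
        result.retainAll(superEffective); // the superclass restriction is sticky
        if (result.isEmpty()) {
            throw new IllegalStateException(
                "declared " + declared + " conflicts with superclass " + superEffective);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Allows> object = EnumSet.allOf(Allows.class);               // assumed: Object allows both
        Set<Allows> f = effective(EnumSet.of(Allows.IDENTITY), object); // identity class F
        Set<Allows> n = effective(EnumSet.allOf(Allows.class), f);      // "neither" class N extends F
        System.out.println("N effectively allows " + n);
        try {
            effective(EnumSet.of(Allows.VALUE), n);                     // value class V extends N
        } catch (IllegalStateException e) {
            System.out.println("V rejected: " + e.getMessage());
        }
    }
}
```

With this rule N loads but is effectively identity-only, and V is the class that fails. The alternative of making N itself illegal would instead reject any class whose declared set is not a subset of its superclass's effective set.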

--Dan



Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-23 Thread Dan Heidinga
> As Dan points out, the main thing we give up by backing off from these 
> interfaces is the static typing; we don't get to use `IdentityObject` as a 
> parameter type, return type, or type bound.  And the only reason we've come 
> up with so far to want that is a pretty lame one -- locking.

During our various discussions, we've also used `IdentityObject` and
`ValueObject` as constraints in the t-vars / parametric VM proposal to
address methods that are only partially applicable.  We've also talked
about using that as a signal to allow locking and other
identity-operations to compile inside generic code that we can
statically know won't have to deal with values.

Does giving up on having VO/IO in the type system change our
approaches to either the parametric VM future or identity operations
in generic code?  It sounds like we're willing to give up on the
second but I don't have a good sense of what this does to the
parametric VM.

--Dan



Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-23 Thread Remi Forax
- Original Message -
> From: "Maurizio Cimadamore" 
> To: "daniel smith" , "valhalla-spec-experts" 
> 
> Sent: Wednesday, March 23, 2022 11:23:26 AM
> Subject: Re: Alternative to IdentityObject & ValueObject interfaces

> On 22/03/2022 23:56, Dan Smith wrote:
>> Other abstract classes and interfaces are fine being neither (thus supporting
>> both kinds of subclasses).
> 
> I feel that for such a proposal to be really useful (but that's true for
> the interface-based approach as well IMHO), you need a way for the _use
> site_ to attach an identity vs. value annotation to types that can
> feature both polarities (Object, interfaces, value-compatible abstract
> classes).
> 
> It's perfectly fine to have identity vs. non-identity as a declaration
> property, for the cases where that works. E.g. an ArrayList instance
> will always have identity. An instance of a `value class Point` will
> always be identity-less. Using modifiers vs. marker interfaces here is
> mostly an isomorphic move (and I agree that adding modifiers has less
> impact w.r.t. compatibility).
> 
> But it feels like both interfaces and decl-site modifiers fall short of
> having a consistent story for the "neither" case. I feel we'd like
> programmers to be able to say things like:
> 
> ```
> class Foo {
>    identity Object lock;
> 
>    void runAction(identity Runnable action) { ... }
> }
> ```
> 
> So, I believe the modifier idea has better potential than marker
> interfaces, because it scales at the use site in ways that marker
> interfaces can't (unless we allow intersection types in declarations).
> But of course I get that adding a new use-site modifier (like `final`)
> is also not to be taken lightly; aside from grammar conundrums, as you
> say it will have to be tracked by the type system.
> 
> Stepping back, you list 4 use cases:
> 
>> - Dynamic detection
>>
>> - Subclass restriction
>>
>> - Variable types
>>
>> - Type variable bounds
> IMHO, they are not equally important. And once you give up on "variable
> types" (as explained above, I believe this use case is not adequately
> covered by any proposal I've seen), then there's a question of how much
> incremental value the other use cases add. Dynamic detection can be
> added cheaply, fine. I also think that, especially in the context of
> universal generics, we do need a way to say: "this type variable is
> legacy/identity only" - but that can also be done quite cheaply. IMHO,
> restricting subclasses doesn't buy much, if you then don't have an
> adequate way to restrict type declarations at the use sites (for those
> things that cannot be restricted at the declaration), so I'd be also
> tempted to leave that use case alone as YAGNI (by teaching developers
> that synchronizing on Object and interface types is wrong, as we've been
> already trying to do).
> 
> P.S.
> 
> While writing this, a question came up: let's say I have a generic class
> like this:
> 
> ```
> class IdentityBox<identity T> { ... }
> ```
> 
> Is IdentityBox<Runnable> a well-formed parameterized type? Based on your
> description I'm not sure: Runnable has the "neither" polarity, but T
> expects "identity". With marker interfaces this will just not work. With
> modifiers we could perhaps allow with unchecked warning?
> 
> I think it's important that the above type remains legal: I'd expect
> users to mark their type-variables as "identity" in cases where they
> just can't migrate the class implementation to support universal type
> variables. But if that decision results in a source incompatible change
> (w.r.t. existing parameterizations such as IdentityBox<Runnable>), then
> it doesn't look like a great story migration-wise.

Yes !

The neither types (Object, interfaces, abstract classes) act as an eraser of 
the identity|value bit if we do not support a use-site identity|value modifier, 
something like IdentityBox. And given that there is already 
existing code in the wild that does not specify "identity" or "value", we need 
a kind of unsafe conversion/unsafe warning between the new world with use-site 
type annotations and the old world with no type annotations.

As Brian said to Kevin, it's a problem very similar to the introduction of a 
null type annotation; it will be painful.

> 
> Maurizio

Rémi


Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-23 Thread Remi Forax
Hi Brian,
I may have a twisted mind, but I read your email as a rebuttal of both
IdentityObject/ValueObject and identity/value modifiers.

As you said, an identity object and a value object are less dissimilar now
than they were in the past: a value class now reuses the equals and
hashCode methods of j.l.Object instead of coming with its own definitions,
and a value class is now nullable. I agree with you that synchronized is not a
real issue, so, as Dan H. said, the real remaining issue is weak refs.

Now, if there are such small differences between an identity object and a
value object, do we really need to introduce a mechanism to separate them in
terms of typing?

Rémi 

> From: "Brian Goetz" 
> To: "daniel smith" , "valhalla-spec-experts"
> 
> Sent: Wednesday, March 23, 2022 1:01:20 PM
> Subject: Re: Alternative to IdentityObject & ValueObject interfaces

> Thanks Dan for putting the work in to provide a credible alternative.

> Let me add some background for how we came up with these things. At some point
> we asked ourselves, what if we had identity and value classes from day 1? How
> would that affect the object model? And we concluded at the time that we
> probably wouldn't want the identity-indeterminacy of Object, but instead would
> want something like

> abstract class Object
> class IdentityObject extends Object { }
> abstract class ValueObject extends Object { }

> So the {Identity,Value}Object interfaces seemed valuable pedagogically, in that
> they make the object hierarchy reflect the language division. At the time, we
> imagined there might be methods that apply to all value objects, that could
> live in ValueObject.

> A separate factor is that we were taking operations that were previously total
> (locking, weak refs) and making them partial. This is scary! So we wanted a
> way to make these expressible in the static type system.

> Unfortunately, the interfaces do not really deliver on either goal, because we
> can't turn back time. We still have to deal with `new Object()`, so we can't
> (yet) make Object abstract. Many signatures will not be changeable from
> "Object" to "IdentityObject" for reasons of compatibility, unless we make
> IdentityObject erase to Object (which has its own problems.) If people use it
> at all for type bounds, we'll see lots of uses of `Foo Bar>`, which will put more pressure on our weak support for
> intersection types. And dynamic errors will still happen, because too much of
> the world was built using signatures that don't express identity-ness. (Kevin
> will see a parallel to introducing nullness annotations; it might be fine if
> you build the world that way from scratch, but the transition is painful when
> you have to interpret an unadorned type as "of unspecified identity-ness.")

> Several years on, we're still leaning on the same few motivating examples --
> capturing things like "I might lock this" in the type system. That we haven't
> come up with more killer examples is notable. And I grow increasingly skeptical
> of the value of the locking example, both because this is not how concurrent
> code is written, and because we *still* have to deal with the risk of dynamic
> errors because most of the world's code has not been (and will not be) written
> to use IdentityObject throughout.

> As Dan points out, the main thing we give up by backing off from these
> interfaces is the static typing; we don't get to use `IdentityObject` as a
> parameter type, return type, or type bound. And the only reason we've come up
> with so far to want that is a pretty lame one -- locking.

> From a language design perspective, I find that you declare a class with
> `value class`, but you express the subclassing constraint with `extends
> IdentityObject`, to be pretty leaky.

> On 3/22/2022 7:56 PM, Dan Smith wrote:

>> In response to some encouragement from Remi, John, and others, I've decided to
>> take a closer look at how we might approach the categorization of value and
>> identity classes without relying on the IdentityObject and ValueObject
>> interfaces.

>> (For background, see the thread "The interfaces IdentityObject and ValueObject
>> must die" in January.)

>> These interfaces have found a number of different uses (enumerated below),
>> while mostly leaning on the existing functionality of interfaces, so there's
>> a pretty good complexity vs. benefit trade-off. But their use has some rough
>> edges, and inserting them everywhere has a nontrivial compatibility impact.
>> Can we do better?

>> Language proposal:

>> - A "value class" is any class whose instances a

Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-23 Thread Brian Goetz

Thanks Dan for putting the work in to provide a credible alternative.

Let me add some background for how we came up with these things.  At 
some point we asked ourselves, what if we had identity and value classes 
from day 1?  How would that affect the object model?  And we concluded 
at the time that we probably wouldn't want the identity-indeterminacy of 
Object, but instead would want something like


    abstract class Object
    class IdentityObject extends Object { }
    abstract class ValueObject extends Object { }

So the {Identity,Value}Object interfaces seemed valuable pedagogically, 
in that they make the object hierarchy reflect the language division.  
At the time, we imagined there might be methods that apply to all value 
objects, that could live in ValueObject.


A separate factor is that we were taking operations that were previously 
total (locking, weak refs) and making them partial. This is scary!  So 
we wanted a way to make these expressible in the static type system.
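
To make the intent concrete, here is a rough Java sketch of that hypothetical hierarchy and of a locking operation typed against it. The real names cannot be reused in ordinary code, so AnyObj, IdentityObj, and ValueObj are stand-ins, and Point and runLocked are invented for illustration.

```java
import java.util.function.IntSupplier;

// Illustrative sketch only: a stand-in hierarchy, not a proposal for java.lang.
final class HierarchySketch {
    static abstract class AnyObj { }                   // plays the role of an abstract Object
    static class IdentityObj extends AnyObj { }        // plays the role of IdentityObject
    static abstract class ValueObj extends AnyObj { }  // plays the role of ValueObject

    static final class Point extends ValueObj {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Locking becomes a partial operation: only identity objects are accepted.
    static int runLocked(IdentityObj lock, IntSupplier action) {
        synchronized (lock) {
            return action.getAsInt();
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        System.out.println(runLocked(new IdentityObj(), () -> p.x + p.y));
        // runLocked(p, () -> 0); // would not compile: Point is not an IdentityObj
    }
}
```

The static check only helps where signatures already say IdentityObj; a parameter typed AnyObj gives the compiler nothing to work with.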


Unfortunately, the interfaces do not really deliver on either goal, 
because we can't turn back time.  We still have to deal with `new 
Object()`, so we can't (yet) make Object abstract. Many signatures will 
not be changeable from "Object" to "IdentityObject" for reasons of 
compatibility, unless we make IdentityObject erase to Object (which has 
its own problems.)  If people use it at all for type bounds, we'll see 
lots of uses of `Foo`, which will put more 
pressure on our weak support for intersection types.  And dynamic errors 
will still happen, because too much of the world was built using 
signatures that don't express identity-ness. (Kevin will see a parallel 
to introducing nullness annotations; it might be fine if you build the 
world that way from scratch, but the transition is painful when you have 
to interpret an unadorned type as "of unspecified identity-ness.")


Several years on, we're still leaning on the same few motivating 
examples -- capturing things like "I might lock this" in the type 
system.  That we haven't come up with more killer examples is notable.  
And I grow increasingly skeptical of the value of the locking example, 
both because this is not how concurrent code is written, and because we 
*still* have to deal with the risk of dynamic errors because most of the 
world's code has not been (and will not be) written to use 
IdentityObject throughout.



As Dan points out, the main thing we give up by backing off from these 
interfaces is the static typing; we don't get to use `IdentityObject` as 
a parameter type, return type, or type bound.  And the only reason we've 
come up with so far to want that is a pretty lame one -- locking.


From a language design perspective, I find that you declare a class 
with `value class`, but you express the subclassing constraint with 
`extends IdentityObject`, to be pretty leaky.


On 3/22/2022 7:56 PM, Dan Smith wrote:

In response to some encouragement from Remi, John, and others, I've decided to 
take a closer look at how we might approach the categorization of value and 
identity classes without relying on the IdentityObject and ValueObject 
interfaces.

(For background, see the thread "The interfaces IdentityObject and ValueObject must 
die" in January.)

These interfaces have found a number of different uses (enumerated below), 
while mostly leaning on the existing functionality of interfaces, so there's a 
pretty good complexity vs. benefit trade-off. But their use has some rough 
edges, and inserting them everywhere has a nontrivial compatibility impact. Can 
we do better?

Language proposal:

- A "value class" is any class whose instances are all value objects. An "identity class" is any 
class whose instances are all identity objects. Abstract classes can be value classes or identity classes, or neither. 
Interfaces can be "value interfaces" or "identity interfaces", or neither.

- A class/interface can be designated a value class with the 'value' modifier.

value class Foo {}
abstract value class Bar {}
value interface Baz {}
value record Rec(int x) {}

A class/interface can be designated an identity class with the 'identity' 
modifier.

identity class Foo {}
abstract identity class Bar {}
identity interface Baz {}
identity record Rec(int x) {}

- Concrete classes with neither modifier are implicitly 'identity'; abstract 
classes with neither modifier, but with certain identity-dependent features 
(instance fields, initializers, synchronized methods, ...) are implicitly 
'identity' (possibly with a warning). Other abstract classes and interfaces are 
fine being neither (thus supporting both kinds of subclasses).

- The properties are inherited: if you extend a value class/interface, you are 
a value class/interface. (Same for identity classes/interfaces.) It's an error 
to be both.

- The usual restrictions apply to value classes, both concrete and abstract; and also to 
"neither" abstract classes, if they haven't been implicitly made 

Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-23 Thread Maurizio Cimadamore



On 22/03/2022 23:56, Dan Smith wrote:

Other abstract classes and interfaces are fine being neither (thus supporting 
both kinds of subclasses).


I feel that for such a proposal to be really useful (but that's true for 
the interface-based approach as well IMHO), you need a way for the _use 
site_ to attach an identity vs. value annotation to types that can 
feature both polarities (Object, interfaces, value-compatible abstract 
classes).


It's perfectly fine to have identity vs. non-identity as a declaration 
property, for the cases where that works. E.g. an ArrayList instance 
will always have identity. An instance of a `value class Point` will 
always be identity-less. Using modifiers vs. marker interfaces here is 
mostly an isomorphic move (and I agree that adding modifiers has less 
impact w.r.t. compatibility).


But it feels like both interfaces and decl-site modifiers fall short of 
having a consistent story for the "neither" case. I feel we'd like 
programmers to be able to say things like:


```
class Foo {
   identity Object lock;

   void runAction(identity Runnable action) { ... }
}
```

So, I believe the modifier idea has better potential than marker 
interfaces, because it scales at the use site in ways that marker 
interfaces can't (unless we allow intersection types in declarations). 
But of course I get that adding a new use-site modifier (like `final`) 
is also not to be taken lightly; aside from grammar conundrums, as you 
say it will have to be tracked by the type system.


Stepping back, you list 4 use cases:


- Dynamic detection

- Subclass restriction

- Variable types

- Type variable bounds

IMHO, they are not equally important. And once you give up on "variable 
types" (as explained above, I believe this use case is not adequately 
covered by any proposal I've seen), then there's a question of how much 
incremental value the other use cases add. Dynamic detection can be 
added cheaply, fine. I also think that, especially in the context of 
universal generics, we do need a way to say: "this type variable is 
legacy/identity only" - but that can also be done quite cheaply. IMHO, 
restricting subclasses doesn't buy much, if you then don't have an 
adequate way to restrict type declarations at the use sites (for those 
things that cannot be restricted at the declaration), so I'd be also 
tempted to leave that use case alone as YAGNI (by teaching developers 
that synchronizing on Object and interface types is wrong, as we've been 
already trying to do).


P.S.

While writing this, a question came up: let's say I have a generic class 
like this:


```
class IdentityBox<identity T> { ... }
```

Is IdentityBox<Runnable> a well-formed parameterized type? Based on your 
description I'm not sure: Runnable has the "neither" polarity, but T 
expects "identity". With marker interfaces this will just not work. With 
modifiers we could perhaps allow with unchecked warning?


I think it's important that the above type remains legal: I'd expect 
users to mark their type-variables as "identity" in cases where they 
just can't migrate the class implementation to support universal type 
variables. But if that decision results in a source incompatible change 
(w.r.t. existing parameterizations such as IdentityBox<Runnable>), then 
it doesn't look like a great story migration-wise.
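
A minimal sketch of the marker-interface half of this question, written in today's Java. The nested `IdentityObject` interface is a hypothetical stand-in for the proposed marker interface, and `IdentityBox` and `Task` are illustrative names only. It shows why a box whose type variable is bounded by the marker cannot be parameterized with Runnable, while a class that carries the marker can be used.

```java
final class MarkerBoundSketch {
    // Hypothetical stand-in for the proposed marker interface; not a real JDK type here.
    interface IdentityObject {}

    // The type variable is restricted to identity types through an ordinary bound.
    static final class IdentityBox<T extends IdentityObject> {
        private final T value;
        IdentityBox(T value) { this.value = value; }
        T get() { return value; }
    }

    // A Runnable that also carries the marker, so it satisfies the bound.
    static final class Task implements Runnable, IdentityObject {
        int runs;
        @Override public void run() { runs++; }
    }

    public static void main(String[] args) {
        IdentityBox<Task> ok = new IdentityBox<>(new Task());
        ok.get().run();
        System.out.println(ok.get().runs); // prints 1

        // IdentityBox<Runnable> bad;
        // does not compile: Runnable is not within the bounds of T,
        // because Runnable does not extend IdentityObject.
    }
}
```

With a modifier-based bound, the compiler would be free to accept the rejected parameterization with a warning instead, which is the migration concern raised here.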


Maurizio



Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-22 Thread Dan Smith
On Mar 22, 2022, at 7:44 PM, Kevin Bourrillion <kev...@google.com> wrote:

> On Tue, Mar 22, 2022 at 4:56 PM Dan Smith <daniel.sm...@oracle.com> wrote:

> > In response to some encouragement from Remi, John, and others, I've decided to 
> > take a closer look at how we might approach the categorization of value and 
> > identity classes without relying on the IdentityObject and ValueObject 
> > interfaces.
> >
> > (For background, see the thread "The interfaces IdentityObject and ValueObject 
> > must die" in January.)
>
> Could anyone summarize the strongest version of the argument against them? The 
> thread is not too easy to follow.

I'm sure there's more, but here's my sense of the notable problems with the 
status quo approach:

- We're adding a marker interface to every concrete class in the Java universe. 
Generally, an extra marker interface shouldn't affect anything, but the Java 
universe is big, and we're bound to break some things (specifically by changing 
reflection behavior and by producing more compile-time intersection types). We 
can ask people to fix their code and make fewer assumptions, but it adds 
upgrade friction, and the budget for breaking stuff is not unlimited.

- Injecting superinterfaces is something entirely new that I think JVMs would 
really rather not be involved with. But it's necessary for compatibly evolving 
class files. We've spent a surprising amount of time working out exactly when 
the interfaces should be injected; separate compilation leads to tricky corner 
cases.

- There's a tension between our use of modifiers and our use of interfaces. 
We've made some ad hoc choices about which are used in which places (e.g., you 
can't declare a concrete value class by saying 'class Foo implements 
ValueObject'). In the JVM, we need modifiers for format checking and interfaces 
for types. This is all okay, but the arbitrariness and redundancy of it is 
unsatisfying and suggests there might be some accidental complexity.

> > - Subclass restriction: 'implements IdentityObject' has been replaced with the 
> > 'identity' modifier. Complexity cost of special modifiers seems on par with the 
> > complexity of special rules for inferring and checking the superinterfaces.
>
> The rules for the modifiers are okay. But here's my observation. The simplest 
> way to explain those rules would be if the `value` keyword is literally 
> shorthand for `extends/implements ValueObject`. I think the rules fall out from 
> that, plus:
>
>   *   IO and VO are disjoint. (As interfaces can already be, like `interface 
> Foo { int x(); }` and `interface Bar { boolean x(); }`, and if it really came 
> down to it, you could literally put an incompatible method into each type and 
> blame their noncohabitation on that :-))
>   *   A class that breaks the value class rules has committed to being an 
> identity class.
>   *   We wouldn't know how to make an instance that is "neither", so 
> instantiating a "neither" class has to have default behavior, and that has to 
> be to give you what it always has.
>
> In each case I've explained why the rule seems very easy to understand to me. 
> So from my POV, this still pulls me back to the types anyway. I would say that 
> your rules for the modifiers are largely simulating those types.

Yes, it is nice how we get inheritance for free from interfaces. But when you 
compare that with the "plus" list (which I'd summarize as: disjointness, 
declaration restrictions, and inference), it's not like getting inheritance 
"for free" is such a huge win. It's maybe 20% less complexity or something to 
explain the feature.

Of course the big win is that interfaces are types, so we already know how to 
use them in the static type system. As your later comments suggest, I think our 
expectations for static typing are probably the most important factor in 
deciding which strategy best meets our needs.


Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-22 Thread Dan Smith
On Mar 22, 2022, at 7:21 PM, Dan Heidinga <heidi...@redhat.com> wrote:

> A couple of comments on the encoding and questions related to descriptors.
>
> > JVM proposal:
> >
> > - Same conceptual framework.
> >
> > - Classes can be ACC_VALUE, ACC_IDENTITY, or neither.
> >
> > - Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are 
> > not. Optionally, modern-version concrete classes are also implicitly 
> > ACC_IDENTITY.
>
> Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER
> bits, then any class without one of the bits set (including all the
> legacy classes) are identity classes.
>
> > (Trying out this alternative approach to abstract classes: there's no more 
> > ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically 
> > ACC_IDENTITY, and modern-version abstract classes permit value subclasses 
> > unless they opt out with ACC_IDENTITY. It's the bytecode generator's 
> > responsibility to set these flags appropriately. Conceptually cleaner, maybe 
> > too risky...)
>
> With the "clever" encoding, every class is implicitly identity unless
> it sets ACC_VALUE or ACC_NEITHER and bytecode generators have to
> explicitly flag modern abstract classes.  This is kind of growing on
> me.

A problem is that interfaces are ACC_NEITHER by default, not ACC_IDENTITY. 
Abstract classes and interfaces have to get two different behaviors based on 
the same 0 bits.

Here's another more stable encoding, though, that feels less fiddly to me than 
what I originally wrote:

ACC_VALUE means "allows value object instances"

ACC_IDENTITY means "allows identity object instances"

If you set *both*, you're a "neither" class/interface. (That is, you allow both 
kinds of instances.)

If you set *none*, you get the default/legacy behavior implicitly: classes are 
ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.
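
The two-bit encoding described above can be sketched as a small decoder. This is only an illustration: the thread does not assign bit values, so the `ACC_IDENTITY` and `ACC_VALUE` constants below are placeholders, and the `Kind` enum and `kindOf` method are hypothetical names. Only `ACC_INTERFACE` (0x0200) is the real class-file flag.

```java
final class AccessFlagsSketch {
    static final int ACC_INTERFACE = 0x0200; // real class-file flag value
    static final int ACC_IDENTITY  = 0x0040; // placeholder value, assumed for illustration
    static final int ACC_VALUE     = 0x0100; // placeholder value, assumed for illustration

    enum Kind { IDENTITY, VALUE, NEITHER }

    // Decodes the flags according to the rules in the message above.
    static Kind kindOf(int flags) {
        boolean allowsValue = (flags & ACC_VALUE) != 0;
        boolean allowsIdentity = (flags & ACC_IDENTITY) != 0;
        if (!allowsValue && !allowsIdentity) {
            // No bits set: legacy default. Classes are identity-only,
            // interfaces allow both kinds of instances.
            boolean isInterface = (flags & ACC_INTERFACE) != 0;
            return isInterface ? Kind.NEITHER : Kind.IDENTITY;
        }
        if (allowsValue && allowsIdentity) {
            return Kind.NEITHER; // both bits set: allows both kinds of instances
        }
        return allowsValue ? Kind.VALUE : Kind.IDENTITY;
    }

    public static void main(String[] args) {
        System.out.println(kindOf(0));                        // IDENTITY (legacy class)
        System.out.println(kindOf(ACC_INTERFACE));            // NEITHER (legacy interface)
        System.out.println(kindOf(ACC_VALUE));                // VALUE
        System.out.println(kindOf(ACC_VALUE | ACC_IDENTITY)); // NEITHER
    }
}
```

The point of the sketch is that the zero-bits case is the only one that needs to look at whether the declaration is an interface; every explicit combination decodes the same way for classes and interfaces.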



Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-22 Thread Kevin Bourrillion
On Tue, Mar 22, 2022 at 4:56 PM Dan Smith wrote:

> In response to some encouragement from Remi, John, and others, I've decided
> to take a closer look at how we might approach the categorization of value
> and identity classes without relying on the IdentityObject and ValueObject
> interfaces.
>
> (For background, see the thread "The interfaces IdentityObject and
> ValueObject must die" in January.)
>

Could anyone summarize the strongest version of the argument against them?
The thread is not too easy to follow.


> - A "value class" is any class whose instances are all value objects. An
> "identity class" is any class whose instances are all identity objects.


I assume you are contrasting "bucket 1" vs. "buckets 2+3" here. (My own
chosen nomenclature would only alter it slightly to say that value classes
*also* have instances that are pure values, no object in sight.)


> - Subclass restriction: 'implements IdentityObject' has been replaced with
> the 'identity' modifier. Complexity cost of special modifiers seems on par
> with the complexity of special rules for inferring and checking the
> superinterfaces.


The rules for the modifiers are okay. But here's my observation. The
simplest way to explain those rules would be if the `value` keyword is
literally shorthand for `extends/implements ValueObject`. I think the rules
fall out from that, plus:

   - IO and VO are disjoint. (As interfaces can *already* be, like
   `interface Foo { int x(); }` and `interface Bar { boolean x(); }`, and if
   it really came down to it, you could literally put an incompatible method
   into each type and blame their noncohabitation on that :-))
   - A class that breaks the value class rules has committed to being an
   identity class.
   - We wouldn't know how to make an *instance* that is "neither", so
   *instantiating* a "neither" class has to have default behavior, and that
   has to be to give you what it always has.
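
The first point can be seen in plain Java today. The sketch below reuses the `Foo` and `Bar` interfaces from the example above; the wrapper class and the `OnlyFoo`/`OnlyBar` implementations are illustrative names only.

```java
final class DisjointInterfacesSketch {
    interface Foo { int x(); }
    interface Bar { boolean x(); }

    // Each interface can be implemented on its own.
    static final class OnlyFoo implements Foo {
        @Override public int x() { return 1; }
    }

    static final class OnlyBar implements Bar {
        @Override public boolean x() { return true; }
    }

    // No class can implement both: x() would need two unrelated return types.
    // static final class Both implements Foo, Bar { ... }  // compile-time error

    public static void main(String[] args) {
        System.out.println(new OnlyFoo().x()); // prints 1
        System.out.println(new OnlyBar().x()); // prints true
    }
}
```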

In each case I've explained why the rule seems very easy to understand to
me. So from my POV, this *still* pulls me back to the types anyway. I would
say that your rules for the modifiers are largely *simulating* those types.


> I think it's a win that we use the 'value' modifier and "value" terminology
> for all kinds of classes/interfaces, not just concrete classes.
>

I think I've probably come around to that terminology over the long course
of reediting this email.


> - Variable types: I don't see a good way to get the equivalent of an
> 'IdentityObject' type. It would involve tracking the 'identity' property
> through the whole type system, which seems like a huge burden for the
> occasional "I'm not sure you can lock on that" error message. So we'd
> probably need to be okay letting that go. Fortunately, I'm not sure it's a
> great loss—lots of code today seems happy using 'Object' when it means,
> informally, "object that I've created for the sole purpose of locking".
>

I'm confused, because it seems like we'd be throwing out an awful lot here.
If I pass a value object to `identityHashCode` we'd rather that didn't
compile. Seems like this list goes on a long way.


> - Type variable bounds: this one seems more achievable, by using the
> 'value' and 'identity' keywords to indicate a new kind of bounds check
> ('<identity T extends Runnable>'). Again, it's added complexity, but it's
> more localized. We should think more about the use cases, and decide if it
> passes the cost/benefit analysis. If not, nothing else depends on this, so
> it could be dropped. (Or left to a future, more general feature?)
>

Don't we already need `Foo` though? Adding this too seems *super* confusing
to me. Let types do what types already do.



> - Documentation: we've lost the handy javadoc location to put some
> explanations about identity & value objects in a place that curious
> programmers can easily stumble on. Anything we want to say needs to go in
> JLS/JVMS (or perhaps the java.lang.Object javadoc).
>

Going beyond "mere" documentation: capturing capabilities and constraints
is precisely what types are *for*. Isn't being able to determine those
behaviors from types the reason people choose a strongly typed language in
the first place?


> - Compatibility: pretty clear win here. No interface injection means tools
> that depend on reflection results won't be broken. (We've found a
> significant number of these problems in our own code/tests, FWIW.) No new
> static types means inference results won't change.
>

Seems less "breaking compatibility" than just "Hyrum's Law".  But I lack
understanding of how widespread or hard to fix these problems are. We could
maybe do an experiment over here if necessary.
-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com


Re: Alternative to IdentityObject & ValueObject interfaces

2022-03-22 Thread Dan Heidinga
A couple of comments on the encoding and questions related to descriptors.

>
> JVM proposal:
>
> - Same conceptual framework.
>
> - Classes can be ACC_VALUE, ACC_IDENTITY, or neither.
>
> - Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are 
> not. Optionally, modern-version concrete classes are also implicitly 
> ACC_IDENTITY.

Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER
bits, then any class without one of the bits set (including all the
legacy classes) are identity classes.

>
> (Trying out this alternative approach to abstract classes: there's no more 
> ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically 
> ACC_IDENTITY, and modern-version abstract classes permit value subclasses 
> unless they opt out with ACC_IDENTITY. It's the bytecode generator's 
> responsibility to set these flags appropriately. Conceptually cleaner, maybe 
> too risky...)

With the "clever" encoding, every class is implicitly identity unless
it sets ACC_VALUE or ACC_NEITHER and bytecode generators have to
explicitly flag modern abstract classes.  This is kind of growing on
me.

>
> - At class load time, we inherit value/identity-ness and check for conflicts. 
> It's okay to have neither flag set but inherit the property from one of your 
> supers. We also enforce constraints on value classes and "neither" abstract 
> classes.
>
> ---
>
> So how does this score as a replacement for the list of features enabled by 
> the interfaces?
>
> - Dynamic detection: 'obj instanceof ValueObject' is quite straightforward; 
> if we can replace that with 'obj.isValueObject()', that feels about equally 
> useful. (I'd be more pessimistic about something like 
> 'Objects.isValueObject(obj)'.)
>
> - Subclass restriction: 'implements IdentityObject' has been replaced with 
> the 'identity' modifier. Complexity cost of special modifiers seems on par 
> with the complexity of special rules for inferring and checking the 
> superinterfaces. I think it's a win that we use the 'value' modifier and 
> "value" terminology for all kinds of classes/interfaces, not just concrete 
> classes.
>
> - Variable types: I don't see a good way to get the equivalent of an 
> 'IdentityObject' type. It would involve tracking the 'identity' property 
> through the whole type system, which seems like a huge burden for the 
> occasional "I'm not sure you can lock on that" error message. So we'd 
> probably need to be okay letting that go. Fortunately, I'm not sure it's a 
> great loss—lots of code today seems happy using 'Object' when it means, 
> informally, "object that I've created for the sole purpose of locking".

Do method parameters also fall under this case?  To pick on our
favourite example, how would we use the type system to adapt the
descriptor for WeakReference::<init> so it wouldn't accept Values?
Are we willing to give up on putting the constraints in the type
system given that interfaces are checked on use (ie: invokeinterface)
rather than by the verifier?

>
> - Type variable bounds: this one seems more achievable, by using the 'value' 
> and 'identity' keywords to indicate a new kind of bounds check ('<identity T 
> extends Runnable>'). Again, it's added complexity, but it's more localized. 
> We should think more about the use cases, and decide if it passes the 
> cost/benefit analysis. If not, nothing else depends on this, so it could be 
> dropped. (Or left to a future, more general feature?)

Similar to the question above, how would this be reflected in the
method descriptor?  `<identity T extends Runnable>` would erase to
Runnable?  If it doesn't get expressed somewhere that the VM can take
action on it, I think it will give developers a false sense of
security regarding the method's invariants and encourage them to skip
the `!obj.isValueObject()` checks in these methods.
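
The erasure concern can be checked with reflection in today's Java. In the sketch below, `runAction` is a hypothetical method that stands in for one declared with the proposed identity bound; since no such modifier exists yet, only the ordinary `extends Runnable` bound is written, and the class and helper names are illustrative.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

final class ErasureSketch {
    // Stand-in for a method with an identity-restricted type variable.
    static <T extends Runnable> void runAction(T action) {
        action.run();
    }

    // Builds the JVM method descriptor from the erased signature seen by reflection.
    static String erasedDescriptor() {
        try {
            // The lookup succeeds with Runnable.class because T erases to its bound.
            Method m = ErasureSketch.class.getDeclaredMethod("runAction", Runnable.class);
            return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                    .toMethodDescriptorString();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(erasedDescriptor()); // prints (Ljava/lang/Runnable;)V
    }
}
```

Nothing about the bound beyond Runnable survives in the descriptor, so any extra restriction on the type variable would have to be carried elsewhere (or checked at run time) for the VM to act on it.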

>
> - Documentation: we've lost the handy javadoc location to put some 
> explanations about identity & value objects in a place that curious 
> programmers can easily stumble on. Anything we want to say needs to go in 
> JLS/JVMS (or perhaps the java.lang.Object javadoc).
>
> - Compatibility: pretty clear win here. No interface injection means tools 
> that depend on reflection results won't be broken. (We've found a significant 
> number of these problems in our own code/tests, FWIW.) No new static types 
> means inference results won't change. There's less risk of incompatibilities 
> when adding/removing the 'identity' and 'value' keywords (although there can 
> still be source, binary, and behavioral incompatibilities).
>

--Dan