Re: Alternative to IdentityObject & ValueObject interfaces
On Mar 22, 2022, at 10:52 PM, Dan Smith <daniel.sm...@oracle.com> wrote:
> On Mar 22, 2022, at 7:21 PM, Dan Heidinga <heidi...@redhat.com> wrote:
>> A couple of comments on the encoding and questions related to descriptors.
>>
>>> JVM proposal:
>>> - Same conceptual framework.
>>> - Classes can be ACC_VALUE, ACC_IDENTITY, or neither.
>>> - Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are not. Optionally, modern-version concrete classes are also implicitly ACC_IDENTITY.
>>
>> Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER bits, then any class without one of the bits set (including all the legacy classes) would be an identity class.
>>
>>> (Trying out this alternative approach to abstract classes: there's no more ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically ACC_IDENTITY, and modern-version abstract classes permit value subclasses unless they opt out with ACC_IDENTITY. It's the bytecode generator's responsibility to set these flags appropriately. Conceptually cleaner, maybe too risky...)
>>
>> With the "clever" encoding, every class is implicitly identity unless it sets ACC_VALUE or ACC_NEITHER, and bytecode generators have to explicitly flag modern abstract classes. This is kind of growing on me.
>
> A problem is that interfaces are ACC_NEITHER by default, not ACC_IDENTITY. Abstract classes and interfaces have to get two different behaviors based on the same 0 bits.
>
> Here's another more stable encoding, though, that feels less fiddly to me than what I originally wrote:
>
> ACC_VALUE means "allows value object instances"
>
> ACC_IDENTITY means "allows identity object instances"
>
> If you set *both*, you're a "neither" class/interface. (That is, you allow both kinds of instances.)
>
> If you set *none*, you get the default/legacy behavior implicitly: classes are ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.
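To make the "allows" reading of the two bits concrete, here is a minimal Java sketch of how a loader might decode them. It is only an illustration under assumptions: the bit values 0x0020 and 0x0040 are borrowed from the later encoding update purely for concreteness, and `AllowsEncodingSketch`, `Kind`, and `allowedInstances` are hypothetical names, not JVM or JDK APIs.

```java
import java.util.EnumSet;
import java.util.Set;

final class AllowsEncodingSketch {
    // Hypothetical bit values, borrowed from the later encoding update for illustration.
    static final int ACC_IDENTITY = 0x0020; // "allows identity object instances"
    static final int ACC_VALUE = 0x0040;    // "allows value object instances"

    enum Kind { IDENTITY, VALUE }

    /** Returns the kinds of instances a class/interface allows under the "allows" encoding. */
    static Set<Kind> allowedInstances(int accessFlags, boolean isInterface) {
        boolean identity = (accessFlags & ACC_IDENTITY) != 0;
        boolean value = (accessFlags & ACC_VALUE) != 0;
        if (!identity && !value) {
            // No bits set: default/legacy behavior.
            return isInterface ? EnumSet.allOf(Kind.class) : EnumSet.of(Kind.IDENTITY);
        }
        Set<Kind> allowed = EnumSet.noneOf(Kind.class);
        if (identity) allowed.add(Kind.IDENTITY);
        if (value) allowed.add(Kind.VALUE);
        return allowed; // both bits set: a "neither" class/interface allowing both kinds
    }

    public static void main(String[] args) {
        System.out.println(allowedInstances(0, false));                        // [IDENTITY]
        System.out.println(allowedInstances(0, true));                         // [IDENTITY, VALUE]
        System.out.println(allowedInstances(ACC_VALUE, false));                // [VALUE]
        System.out.println(allowedInstances(ACC_IDENTITY | ACC_VALUE, false)); // [IDENTITY, VALUE]
    }
}
```

Note that under this reading "neither" is spelled with both bits set, so the all-zero pattern stays free to mean "legacy default", which is what lets classes and interfaces get different defaults.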
Update on encoding: after some internal discussion, I've found this to be the most natural fit:
- ACC_VALUE (0x0040) corresponds to the 'value' keyword in source files
- ACC_IDENTITY (0x0020) corresponds to the (often implicit) 'identity' keyword in source files
- If neither is set, the class/interface supports both kinds of subclasses (and must be abstract)
- If both are set, or any supers' flags conflict, it's an error
- In older-version classes (not interfaces), ACC_IDENTITY is assumed to be set

What about newer-version classes that use old encodings? (E.g., a tool bumps its output version number but isn't aware of these flags.) There's a sneaky trick here that minimizes the risk: ACC_IDENTITY re-uses the old ACC_SUPER bit, which no longer has any effect and which we've encouraged compilers to set since Java 1.0.2. So if you're already setting ACC_SUPER in your classes, you've automatically opted in to ACC_IDENTITY; doing anything different requires changing the generated code.

So the remaining incompatibility risk is that someone generates a class (not an interface) with a newer version number and with neither flag set (violating the "always set ACC_SUPER" advice), and then either the class won't load (because it's concrete, declares an instance field, etc.), or it's abstract and accidentally supports value subclasses, and so can be instantiated without running its logic. The number of unlikely events in this scenario seems like enough for us not to be concerned.
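A rough Java sketch of these final rules, for readers who want to see them as a decision procedure. ACC_SUPER, ACC_INTERFACE, and ACC_ABSTRACT carry their real class-file values; ACC_VALUE is the value proposed in the message. The `firstValueMajor` parameter is a hypothetical stand-in for whichever class-file version first understands the new flags, and masking both bits in older class files is an assumption (the message only says ACC_IDENTITY is assumed for older classes). Conflicts with supertypes' flags are not checked here.

```java
final class ClassFlagsSketch {
    static final int ACC_SUPER = 0x0020;      // legacy flag, reused as ACC_IDENTITY
    static final int ACC_IDENTITY = ACC_SUPER;
    static final int ACC_VALUE = 0x0040;      // proposed in this thread
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    enum Kind { IDENTITY, VALUE, EITHER }

    /**
     * Classifies a class/interface from its access flags.
     * firstValueMajor is a hypothetical placeholder for the first class-file
     * major version that understands ACC_IDENTITY/ACC_VALUE.
     */
    static Kind classify(int accessFlags, int majorVersion, int firstValueMajor) {
        boolean isInterface = (accessFlags & ACC_INTERFACE) != 0;
        int flags = accessFlags;
        if (majorVersion < firstValueMajor) {
            // Assumption: older class files get no meaning from either bit...
            flags &= ~(ACC_IDENTITY | ACC_VALUE);
            // ...and older classes (not interfaces) are treated as identity classes.
            if (!isInterface) flags |= ACC_IDENTITY;
        }
        boolean identity = (flags & ACC_IDENTITY) != 0;
        boolean value = (flags & ACC_VALUE) != 0;
        if (identity && value) {
            throw new IllegalArgumentException("ACC_IDENTITY and ACC_VALUE both set");
        }
        if (identity) return Kind.IDENTITY;
        if (value) return Kind.VALUE;
        if ((flags & ACC_ABSTRACT) == 0) {
            throw new IllegalArgumentException("neither flag set on a concrete class");
        }
        return Kind.EITHER; // supports both kinds of subclasses
    }

    public static void main(String[] args) {
        int modern = 100; // hypothetical "new enough" major version
        // A tool that bumps the version but still sets ACC_SUPER gets an identity class.
        System.out.println(classify(ACC_SUPER, modern, modern));                // IDENTITY
        System.out.println(classify(ACC_ABSTRACT, modern, modern));             // EITHER
        System.out.println(classify(0, 52, modern));                            // IDENTITY
        System.out.println(classify(ACC_INTERFACE | ACC_ABSTRACT, 52, modern)); // EITHER
    }
}
```

The `ACC_SUPER` line in `main` is the "sneaky trick" in action: the same bit that well-behaved generators already emit now selects the identity behavior.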
Re: Alternative to IdentityObject & ValueObject interfaces
> From: "Brian Goetz"
> To: "daniel smith", "valhalla-spec-experts"
> Sent: Thursday, March 24, 2022 1:46:44 PM
> Subject: Re: Alternative to IdentityObject & ValueObject interfaces
>
> [...]
>
> All of this sounds like a recipe for "new complexity that almost no one will actually use."

I agree. So if we drop the idea of putting identity vs. value information into the type system, the follow-up question is: should we restrict inheritance or not?

Classes are tagged as value or not, and abstract classes and interfaces by default allow both value types and identity types as subtypes. Do we need more, i.e., the ability to restrict the subtypes of an abstract class/interface to value types (or identity types) only?

Yesterday, Dan S. talked about letting a user restrict a hierarchy to identity classes only. This will not help existing code, but it may help new code: instead of having IdentityObject in the JDK, a user could define their own interface that plays the same role as IdentityObject but is tailored to their problem. Or do we consider that even that use case is not worth its own weight?

Rémi
Re: Alternative to IdentityObject & ValueObject interfaces
On 3/23/2022 10:51 PM, Dan Smith wrote:
> On Mar 22, 2022, at 5:56 PM, Dan Smith wrote:
>> - Variable types: I don't see a good way to get the equivalent of an 'IdentityObject' type. It would involve tracking the 'identity' property through the whole type system, which seems like a huge burden for the occasional "I'm not sure you can lock on that" error message. So we'd probably need to be okay letting that go. Fortunately, I'm not sure it's a great loss -- lots of code today seems happy using 'Object' when it means, informally, "object that I've created for the sole purpose of locking".
>> - Type variable bounds: this one seems more achievable, by using the 'value' and 'identity' keywords to indicate a new kind of bounds check ('<identity T extends Runnable>'). Again, it's added complexity, but it's more localized. We should think more about the use cases, and decide if it passes the cost/benefit analysis. If not, nothing else depends on this, so it could be dropped. (Or left to a future, more general feature?)
>
> Per today's discussion, this part seems to be the central question: how much value can we expect to get out of compile-time checking?

This is indeed the question. There's both a "theory" and a "practice" aspect, too.

> The type system is going to have three kinds of types:
> - types that guarantee identity objects
> - types that guarantee value objects
> - types that include both kinds of objects
>
> That third kind is a problem: we can specify checks with false positives (programmer knows the operation is safe, compiler complains anyway) or false negatives (operation isn't safe, but the compiler lets it go).

Flowing the {Value,Identity}Object property is likely to require shoring up intersection types too, since we can express Runnable as a type bound, but not as a denotable type. Var helps a little here, but ultimately this is a hole through which information will drain.

The arguments you make here are compelling to me: while it might work in theory, in practice there are too many holes:
- Legacy code that already deals in Object / interfaces and is not going to change
- Even in new code, I suspect people will continue to do so, because, as you say, it is tedious for marginal value
- The lack of intersection types will make it worse
- Because of the above, many of the errors would be more like warnings, making it even weaker

All of this sounds like a recipe for "new complexity that almost no one will actually use."
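The intersection-type gap can be seen in today's Java. In this sketch, `IdentityMarker` is a hypothetical stand-in for `IdentityObject` (no such JDK type is assumed), and `Task`/`runTwice` are illustrative names: `Runnable & IdentityMarker` is accepted as a type-variable bound and, through an intersection cast, as the inferred type of a `var` local, but it cannot be written as a field or parameter type.

```java
final class IntersectionSketch {
    /** Hypothetical marker standing in for IdentityObject; not a JDK type. */
    interface IdentityMarker {}

    /** A task that is both Runnable and (hypothetically) identity-tagged. */
    static final class Task implements Runnable, IdentityMarker {
        int runs;
        @Override public void run() { runs++; }
    }

    // The intersection is expressible as a type-variable bound...
    static <T extends Runnable & IdentityMarker> T runTwice(T action) {
        action.run();
        action.run();
        return action;
    }

    // ...but not as a denotable parameter or field type; this would not compile:
    // static void bad(Runnable & IdentityMarker action) { }

    static int demo() {
        Object o = new Task();
        // An intersection cast plus 'var' keeps both bounds for a local variable.
        var action = (Runnable & IdentityMarker) o;
        runTwice(action);
        return ((Task) o).runs;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```

`var` is the "helps a little" part: it can carry the intersection through a local, but as soon as the value crosses a method or field boundary the programmer must pick a single named type, and the identity information drains away.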
Re: Alternative to IdentityObject & ValueObject interfaces
On Mar 22, 2022, at 5:56 PM, Dan Smith <daniel.sm...@oracle.com> wrote:
> - Variable types: I don't see a good way to get the equivalent of an 'IdentityObject' type. It would involve tracking the 'identity' property through the whole type system, which seems like a huge burden for the occasional "I'm not sure you can lock on that" error message. So we'd probably need to be okay letting that go. Fortunately, I'm not sure it's a great loss -- lots of code today seems happy using 'Object' when it means, informally, "object that I've created for the sole purpose of locking".
> - Type variable bounds: this one seems more achievable, by using the 'value' and 'identity' keywords to indicate a new kind of bounds check ('<identity T extends Runnable>'). Again, it's added complexity, but it's more localized. We should think more about the use cases, and decide if it passes the cost/benefit analysis. If not, nothing else depends on this, so it could be dropped. (Or left to a future, more general feature?)

Per today's discussion, this part seems to be the central question: how much value can we expect to get out of compile-time checking?

Stepping back from the type system details (that is, the discussion below applies whether we're using interfaces, modifiers on types, or something else), it's worth asking what errors we hope these features will help detect. We identified a couple of them today (and earlier in this thread):
- 'synchronized' on a value object
- storing a value object in a weak reference (in a world in which weak references don't support value objects)

Two questions:

1) What are the requirements for the analysis? How effective can it be?
The type system is going to have three kinds of types:
- types that guarantee identity objects
- types that guarantee value objects
- types that include both kinds of objects

That third kind is a problem: we can specify checks with false positives (programmer knows the operation is safe, compiler complains anyway) or false negatives (operation isn't safe, but the compiler lets it go).

For example, for the 'synchronized' operation, it's straightforward for the compiler to complain on a value class type. But what do we do with 'synchronized' on some interface type?

Say we go the false positive route; the check probably looks like a warning ("you might be synchronizing on a value object"). In this case:
- We've just created a bunch of warnings in existing code that people will probably just @SuppressWarnings rather than try to address through the types, because changing the types (throughout the flow of data) is a lot of work and comes with compatibility risks.
- Even in totally new code, if I'm not working with a specific identity class, I'm not sure I would bother fiddling with the types to get better checking. It seems really tedious. (For example, changing an interface-typed parameter 'Foo' to the intersection type 'Foo & IdentityObject'.)

If we prefer to allow false negatives, then it's straightforward: value class types get errors, other types do not. There's no need for extra type system features. (E.g., 'IdentityObject' and 'Object' get treated exactly the same by 'synchronized'.)

For weak references, it definitely doesn't make sense to reject types like WeakReference<Object> -- that would be a compatibility mess. We could warn, but again, there's lots of false positive risk, and warnings don't generalize to general-purpose use of generics. I think again the best we could hope to do is to reject value class types. But something like 'T extends IdentityObject' doesn't accomplish this, because it excludes the "both kinds" types.
Instead, we'd need something like 'T !extends ValueObject'.

2) Are these the best use cases we have? And are they really all that important?

These are the ones we've focused on, but maybe we can think of other applications. Other use cases would similarly have to involve the differences in runtime semantics.

Our two use cases share the property that they detect a runtime error (either an expression that we know will always throw, or, with more aggressive checking, an expression that *could* throw). That's helpful, but I do wonder how common such errors will be. We could do a bunch of type system work to detect division by zero, but nobody's asking for that because programmers already tend to avoid making that mistake.

Synchronization: best practice is already to "own" the object being locked on, and that kind of knowledge isn't tracked by the type system. It doesn't seem much different for programmers to also be aware of whether their locking objects are identity objects, without type system help.

Weak references: a WeakReference seems like an unlikely scenario -- why are you trying to manage GC for a value object? (Assuming we've provided an alternative API to manage references *within* value objects, do caching, etc.) So most runtime errors will fall into the WeakReference or WeakReference category, and again there's a
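The trade-off between the two checking routes, and the difference between the two bounds, can be captured in a toy model with a three-valued "static kind". This is an illustration only: `StaticKind`, `Verdict`, and the method names are hypothetical, and `!extends` is the hypothetical syntax from the message, not real Java.

```java
final class CheckPolicySketch {
    /** What a static type guarantees about the objects it can refer to. */
    enum StaticKind { IDENTITY, VALUE, BOTH }

    enum Verdict { OK, WARNING, ERROR }

    /** False-positive route: warn whenever a value object might show up. */
    static Verdict checkSynchronizedStrict(StaticKind operand) {
        return switch (operand) {
            case IDENTITY -> Verdict.OK;
            case VALUE -> Verdict.ERROR;
            case BOTH -> Verdict.WARNING; // e.g. Object or an interface type
        };
    }

    /** False-negative route: only value class types are rejected. */
    static Verdict checkSynchronizedLenient(StaticKind operand) {
        return operand == StaticKind.VALUE ? Verdict.ERROR : Verdict.OK;
    }

    /** 'T extends IdentityObject' admits only identity types as type arguments. */
    static boolean admitsExtendsIdentityObject(StaticKind typeArgument) {
        return typeArgument == StaticKind.IDENTITY;
    }

    /** Hypothetical 'T !extends ValueObject' rejects only value types. */
    static boolean admitsNotExtendsValueObject(StaticKind typeArgument) {
        return typeArgument != StaticKind.VALUE;
    }

    public static void main(String[] args) {
        for (StaticKind k : StaticKind.values()) {
            System.out.printf("%-8s strict=%-7s lenient=%-5s extendsIO=%-5b notExtendsVO=%b%n",
                    k, checkSynchronizedStrict(k), checkSynchronizedLenient(k),
                    admitsExtendsIdentityObject(k), admitsNotExtendsValueObject(k));
        }
    }
}
```

The `BOTH` row is the crux: the strict route turns it into warnings everywhere, the lenient route lets it through, and only the negative bound accepts a "both kinds" argument such as `Object` while still rejecting value class types.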
Re: Alternative to IdentityObject & ValueObject interfaces
(Sorry Dan, you're receiving this twice. I accidentally sent it off list first.)

> Here's another more stable encoding, though, that feels less fiddly to me
> than what I originally wrote:
>
> ACC_VALUE means "allows value object instances"
>
> ACC_IDENTITY means "allows identity object instances"
>
> If you set *both*, you're a "neither" class/interface. (That is, you allow
> both kinds of instances.)
>
> If you set *none*, you get the default/legacy behavior implicitly: classes
> are ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.

    identity class F { }
    abstract /* neither */ class N extends F { }
    value class V extends N { }

This is clearly an illegal stacking of the bits, as we can't have value subclasses of identity classes. We'll need to spec out rules about superclass bits being "sticky" and overriding subclass bits if we want to allow `N` to be loaded, or add rules that make `N` illegal.

Apart from this -- which isn't really different from dealing with inheriting the VO/IO interfaces -- I think your new proposed encoding is better than my (not so) clever one.

--Dan
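The two fixes floated here (sticky superclass bits vs. rejecting `N` outright) can be compared with a small Java model, assuming each class's bits have already been decoded into the set of instance kinds it allows. The names are hypothetical, and this is not a claim about how any JVM implements the check.

```java
import java.util.EnumSet;
import java.util.Set;

final class StickyBitsSketch {
    enum Kind { IDENTITY, VALUE }

    /** Option 1 ("sticky" supers): declared kinds intersected with the superclass's effective kinds. */
    static Set<Kind> effective(Set<Kind> declared, Set<Kind> superEffective) {
        Set<Kind> result = EnumSet.noneOf(Kind.class);
        result.addAll(declared);
        result.retainAll(superEffective);
        return result;
    }

    /** Option 2 (strict): a class may not declare kinds its superclass does not allow. */
    static boolean legalDeclaration(Set<Kind> declared, Set<Kind> superEffective) {
        return superEffective.containsAll(declared);
    }

    public static void main(String[] args) {
        Set<Kind> f = EnumSet.of(Kind.IDENTITY);         // identity class F
        Set<Kind> nDeclared = EnumSet.allOf(Kind.class); // abstract /* neither */ class N extends F
        Set<Kind> vDeclared = EnumSet.of(Kind.VALUE);    // value class V extends N

        Set<Kind> n = effective(nDeclared, f);
        System.out.println(n);                                 // [IDENTITY]: N is forced to identity
        System.out.println(effective(vDeclared, n).isEmpty()); // true: V allows no instances, reject it
        System.out.println(legalDeclaration(nDeclared, f));    // false: strict rule rejects N itself
    }
}
```

With sticky bits, `N` loads but silently becomes identity-only and the error surfaces at `V`; with the strict rule, the error is reported at `N`, closer to the real mistake.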
Re: Alternative to IdentityObject & ValueObject interfaces
> As Dan points out, the main thing we give up by backing off from these
> interfaces is the static typing; we don't get to use `IdentityObject` as a
> parameter type, return type, or type bound. And the only reason we've come
> up with so far to want that is a pretty lame one -- locking.

During our various discussions, we've also used `IdentityObject` and `ValueObject` as constraints in the t-vars / parametric VM proposal to address methods that are only partially applicable. We've also talked about using them as a signal to allow locking and other identity operations to compile inside generic code that we can statically know won't have to deal with values.

Does giving up on having VO/IO in the type system change our approaches to either the parametric VM future or identity operations in generic code? It sounds like we're willing to give up on the second, but I don't have a good sense of what this does to the parametric VM.

--Dan
Re: Alternative to IdentityObject & ValueObject interfaces
- Original Message -
> From: "Maurizio Cimadamore"
> To: "daniel smith", "valhalla-spec-experts"
> Sent: Wednesday, March 23, 2022 11:23:26 AM
> Subject: Re: Alternative to IdentityObject & ValueObject interfaces
>
> [...]
>
> P.S.
>
> While writing this, a question came up: let's say I have a generic class
> like this:
>
> ```
> class IdentityBox<identity T> { ... }
> ```
>
> Is IdentityBox<Runnable> a well-formed parameterized type? Based on your
> description I'm not sure: Runnable has the "neither" polarity, but T
> expects "identity". With marker interfaces this will just not work. With
> modifiers we could perhaps allow it with an unchecked warning?
>
> I think it's important that the above type remains legal: I'd expect
> users to mark their type-variables as "identity" in cases where they
> just can't migrate the class implementation to support universal type
> variables. But if that decision results in a source incompatible change
> (w.r.t. existing parameterizations such as IdentityBox<Runnable>), then
> it doesn't look like a great story migration-wise.
>
> Maurizio

Yes! The "neither" types (Object, interfaces, abstract classes) act as erasers of the identity|value bit if we do not support a use-site identity|value modifier, something like IdentityBox<identity Runnable>. And given that there is already existing code in the wild that does not specify "identity" or "value", we need some kind of unsafe conversion/unsafe warning between the new world, with use-site type annotations, and the old world, with none. As Brian said to Kevin, it's a problem very similar to the introduction of null type annotations; it will be painful.

Rémi
Re: Alternative to IdentityObject & ValueObject interfaces
Hi Brian,
maybe I have a twisted mind, but I read your email as a rebuttal of both IdentityObject/ValueObject and identity/value modifiers.

As you said, identity objects and value objects are less dissimilar now than they were in the past: a value class now reuses the equals and hashCode methods of j.l.Object instead of coming with its own definition, and a value class is now nullable. I agree with you that synchronized is not a real issue, so, as Dan H. said, the real remaining issue is weak refs.

Now, if there are such small differences between an identity object and a value object, do we really need to introduce a mechanism to separate them in terms of typing?

Rémi

> From: "Brian Goetz"
> To: "daniel smith", "valhalla-spec-experts"
> Sent: Wednesday, March 23, 2022 1:01:20 PM
> Subject: Re: Alternative to IdentityObject & ValueObject interfaces
>
> [...]
Re: Alternative to IdentityObject & ValueObject interfaces
Thanks Dan for putting the work in to provide a credible alternative.

Let me add some background for how we came up with these things. At some point we asked ourselves, what if we had identity and value classes from day 1? How would that affect the object model? And we concluded at the time that we probably wouldn't want the identity-indeterminacy of Object, but instead would want something like

    abstract class Object
    class IdentityObject extends Object { }
    abstract class ValueObject extends Object { }

So the {Identity,Value}Object interfaces seemed valuable pedagogically, in that they make the object hierarchy reflect the language division. At the time, we imagined there might be methods that apply to all value objects, that could live in ValueObject.

A separate factor is that we were taking operations that were previously total (locking, weak refs) and making them partial. This is scary! So we wanted a way to make these expressible in the static type system.

Unfortunately, the interfaces do not really deliver on either goal, because we can't turn back time. We still have to deal with `new Object()`, so we can't (yet) make Object abstract. Many signatures will not be changeable from "Object" to "IdentityObject" for reasons of compatibility, unless we make IdentityObject erase to Object (which has its own problems). If people use it at all for type bounds, we'll see lots of uses of `Foo<T extends IdentityObject & Bar>`, which will put more pressure on our weak support for intersection types. And dynamic errors will still happen, because too much of the world was built using signatures that don't express identity-ness. (Kevin will see a parallel to introducing nullness annotations; it might be fine if you build the world that way from scratch, but the transition is painful when you have to interpret an unadorned type as "of unspecified identity-ness.")

Several years on, we're still leaning on the same few motivating examples -- capturing things like "I might lock this" in the type system. That we haven't come up with more killer examples is notable. And I grow increasingly skeptical of the value of the locking example, both because this is not how concurrent code is written, and because we *still* have to deal with the risk of dynamic errors because most of the world's code has not been (and will not be) written to use IdentityObject throughout.

As Dan points out, the main thing we give up by backing off from these interfaces is the static typing; we don't get to use `IdentityObject` as a parameter type, return type, or type bound. And the only reason we've come up with so far to want that is a pretty lame one -- locking.

From a language design perspective, I find it pretty leaky that you declare a class with `value class`, but express the subclassing constraint with `extends IdentityObject`.

On 3/22/2022 7:56 PM, Dan Smith wrote:
> In response to some encouragement from Remi, John, and others, I've decided to take a closer look at how we might approach the categorization of value and identity classes without relying on the IdentityObject and ValueObject interfaces.
>
> (For background, see the thread "The interfaces IdentityObject and ValueObject must die" in January.)
>
> These interfaces have found a number of different uses (enumerated below), while mostly leaning on the existing functionality of interfaces, so there's a pretty good complexity vs. benefit trade-off. But their use has some rough edges, and inserting them everywhere has a nontrivial compatibility impact. Can we do better?
>
> Language proposal:
>
> - A "value class" is any class whose instances are all value objects. An "identity class" is any class whose instances are all identity objects. Abstract classes can be value classes or identity classes, or neither. Interfaces can be "value interfaces" or "identity interfaces", or neither.
>
> - A class/interface can be designated a value class with the 'value' modifier.
>
>     value class Foo {}
>     abstract value class Bar {}
>     value interface Baz {}
>     value record Rec(int x) {}
>
> A class/interface can be designated an identity class with the 'identity' modifier.
>
>     identity class Foo {}
>     abstract identity class Bar {}
>     identity interface Baz {}
>     identity record Rec(int x) {}
>
> - Concrete classes with neither modifier are implicitly 'identity'; abstract classes with neither modifier, but with certain identity-dependent features (instance fields, initializers, synchronized methods, ...) are implicitly 'identity' (possibly with a warning). Other abstract classes and interfaces are fine being neither (thus supporting both kinds of subclasses).
>
> - The properties are inherited: if you extend a value class/interface, you are a value class/interface. (Same for identity classes/interfaces.) It's an error to be both.
>
> - The usual restrictions apply to value classes, both concrete and abstract; and also to "neither" abstract classes, if they haven't been implicitly made
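One way to read the modifier rules in Dan's proposal is as a small inference procedure. The Java sketch below is an assumption-laden model, not javac's behavior: `Decl` is a hypothetical summary of a declaration, "identity-dependent features" is collapsed into a single boolean, and the "usual restrictions" on value classes are not checked.

```java
import java.util.List;

final class ModifierInferenceSketch {
    enum Kind { IDENTITY, VALUE, NEITHER }

    /**
     * Hypothetical declaration summary: the explicit modifier (NEITHER if none),
     * whether it is abstract (interfaces count as abstract), whether it uses
     * identity-dependent features, and the inferred kinds of its direct supertypes.
     */
    record Decl(Kind modifier, boolean isAbstract, boolean hasIdentityFeatures, List<Kind> supertypes) {}

    static Kind infer(Decl d) {
        boolean identity = d.modifier() == Kind.IDENTITY || d.supertypes().contains(Kind.IDENTITY);
        boolean value = d.modifier() == Kind.VALUE || d.supertypes().contains(Kind.VALUE);
        if (identity && value) {
            throw new IllegalArgumentException("a class/interface cannot be both identity and value");
        }
        if (identity) return Kind.IDENTITY; // explicit or inherited
        if (value) return Kind.VALUE;       // explicit or inherited
        if (!d.isAbstract() || d.hasIdentityFeatures()) {
            return Kind.IDENTITY;           // implicit (possibly with a warning)
        }
        return Kind.NEITHER;                // supports both kinds of subclasses
    }

    public static void main(String[] args) {
        List<Kind> none = List.of();
        System.out.println(infer(new Decl(Kind.NEITHER, false, false, none)));               // IDENTITY
        System.out.println(infer(new Decl(Kind.NEITHER, true, false, none)));                // NEITHER
        System.out.println(infer(new Decl(Kind.NEITHER, true, true, none)));                 // IDENTITY
        System.out.println(infer(new Decl(Kind.NEITHER, false, false, List.of(Kind.VALUE)))); // VALUE
    }
}
```

The ordering is a design choice of the sketch: inherited properties are checked before the implicit-identity rule, so a concrete class implementing a value interface comes out as a value class rather than an implicit identity class.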
Re: Alternative to IdentityObject & ValueObject interfaces
On 22/03/2022 23:56, Dan Smith wrote:
> Other abstract classes and interfaces are fine being neither (thus supporting both kinds of subclasses).

I feel that for such a proposal to be really useful (but that's true for the interface-based approach as well IMHO), you need a way for the _use site_ to attach an identity vs. value annotation to types that can feature both polarities (Object, interfaces, value-compatible abstract classes).

It's perfectly fine to have identity vs. non-identity as a declaration property, for the cases where that works. E.g. an ArrayList instance will always have identity. An instance of a `value class Point` will always be identity-less. Using modifiers vs. marker interfaces here is mostly an isomorphic move (and I agree that adding modifiers has less impact w.r.t. compatibility).

But it feels like both interfaces and decl-site modifiers fall short of having a consistent story for the "neither" case. I feel we'd like programmers to be able to say things like:

```
class Foo {
    identity Object lock;

    void runAction(identity Runnable action) { ... }
}
```

So, I believe the modifier idea has better potential than marker interfaces, because it scales at the use site in ways that marker interfaces can't (unless we allow intersection types in declarations). But of course I get that adding a new use-site modifier (like `final`) is also not to be taken lightly; aside from grammar conundrums, as you say it will have to be tracked by the type system.

Stepping back, you list 4 use cases:

> - Dynamic detection
> - Subclass restriction
> - Variable types
> - Type variable bounds

IMHO, they are not equally important. And once you give up on "variable types" (as explained above, I believe this use case is not adequately covered by any proposal I've seen), then there's a question of how much incremental value the other use cases add. Dynamic detection can be added cheaply, fine.
I also think that, especially in the context of universal generics, we do need a way to say "this type variable is legacy/identity only"; but that can also be done quite cheaply.

IMHO, restricting subclasses doesn't buy much if you then don't have an adequate way to restrict type declarations at the use sites (for those things that cannot be restricted at the declaration), so I'd also be tempted to leave that use case alone as YAGNI (by teaching developers that synchronizing on Object and interface types is wrong, as we've already been trying to do).

P.S. While writing this, a question came up: let's say I have a generic class like this:

```
class IdentityBox { ... }
```

Is IdentityBox a well-formed parameterized type? Based on your description I'm not sure: Runnable has the "neither" polarity, but T expects "identity". With marker interfaces this will just not work. With modifiers, could we perhaps allow it with an unchecked warning?

I think it's important that the above type remains legal: I'd expect users to mark their type variables as "identity" in cases where they just can't migrate the class implementation to support universal type variables. But if that decision results in a source-incompatible change (w.r.t. existing parameterizations such as IdentityBox), then it doesn't look like a great story migration-wise.

Maurizio
Re: Alternative to IdentityObject & ValueObject interfaces
On Mar 22, 2022, at 7:44 PM, Kevin Bourrillion <kev...@google.com> wrote:
> On Tue, Mar 22, 2022 at 4:56 PM Dan Smith <daniel.sm...@oracle.com> wrote:
>> In response to some encouragement from Remi, John, and others, I've decided to take a closer look at how we might approach the categorization of value and identity classes without relying on the IdentityObject and ValueObject interfaces.
>>
>> (For background, see the thread "The interfaces IdentityObject and ValueObject must die" in January.)
>
> Could anyone summarize the strongest version of the argument against them? The thread is not too easy to follow.

I'm sure there's more, but here's my sense of the notable problems with the status quo approach:

- We're adding a marker interface to every concrete class in the Java universe. Generally, an extra marker interface shouldn't affect anything, but the Java universe is big, and we're bound to break some things (specifically by changing reflection behavior and by producing more compile-time intersection types). We can ask people to fix their code and make fewer assumptions, but it adds upgrade friction, and the budget for breaking stuff is not unlimited.

- Injecting superinterfaces is something entirely new that I think JVMs would really rather not be involved with. But it's necessary for compatibly evolving class files. We've spent a surprising amount of time working out exactly when the interfaces should be injected; separate compilation leads to tricky corner cases.

- There's a tension between our use of modifiers and our use of interfaces. We've made some ad hoc choices about which are used in which places (e.g., you can't declare a concrete value class by saying 'class Foo implements ValueObject'). In the JVM, we need modifiers for format checking and interfaces for types. This is all okay, but the arbitrariness and redundancy of it is unsatisfying and suggests there might be some accidental complexity.
>> - Subclass restriction: 'implements IdentityObject' has been replaced with the 'identity' modifier. Complexity cost of special modifiers seems on par with the complexity of special rules for inferring and checking the superinterfaces.
>
> The rules for the modifiers are okay. But here's my observation. The simplest way to explain those rules would be if the `value` keyword is literally shorthand for `extends/implements ValueObject`. I think the rules fall out from that, plus:
>
> * IO and VO are disjoint. (As interfaces can already be, like `interface Foo { int x(); }` and `interface Bar { boolean x(); }`, and if it really came down to it, you could literally put an incompatible method into each type and blame their noncohabitation on that :-))
> * A class that breaks the value class rules has committed to being an identity class.
> * We wouldn't know how to make an instance that is "neither", so instantiating a "neither" class has to have default behavior, and that has to be to give you what it always has.
>
> In each case I've explained why the rule seems very easy to understand to me. So from my POV, this still pulls me back to the types anyway. I would say that your rules for the modifiers are largely simulating those types.

Yes, it is nice how we get inheritance for free from interfaces. But when you compare that with the "plus" list (which I'd summarize as: disjointness, declaration restrictions, and inference), it's not like getting inheritance "for free" is such a huge win. It's maybe 20% less complexity or something to explain the feature.

Of course the big win is that interfaces are types, so we already know how to use them in the static type system. As your later comments suggest, I think our expectations for static typing are probably the most important factor in deciding which strategy best meets our needs.
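Kevin's disjointness point can be illustrated in today's Java. A sketch, assuming two hypothetical marker interfaces (`ValueMarker` and `IdentityMarker`, stand-ins rather than the real `ValueObject`/`IdentityObject`) made mutually exclusive with the conflicting-`x()` trick from his example; `describe` shows the `instanceof`-style dynamic detection that marker types give for free.

```java
// Hypothetical stand-ins for ValueObject / IdentityObject; not JDK types.
final class DisjointMarkers {
    // Same method name, unrelated return types: no class can implement both.
    interface ValueMarker { int x(); }
    interface IdentityMarker { boolean x(); }

    static final class Point implements ValueMarker {
        public int x() { return 1; }
    }

    static final class Lock implements IdentityMarker {
        public boolean x() { return true; }
    }

    // Uncommenting this line fails to compile: x() cannot return both int and boolean.
    // static final class Both implements ValueMarker, IdentityMarker {}

    // Dynamic detection via instanceof on the marker types.
    static String describe(Object o) {
        if (o instanceof ValueMarker) return "value";
        if (o instanceof IdentityMarker) return "identity";
        return "unmarked"; // implements neither marker
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point())); // value
        System.out.println(describe(new Lock()));  // identity
        System.out.println(describe("text"));      // unmarked
    }
}
```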
Re: Alternative to IdentityObject & ValueObject interfaces
On Mar 22, 2022, at 7:21 PM, Dan Heidinga <heidi...@redhat.com> wrote:
> A couple of comments on the encoding and questions related to descriptors.
>
>> JVM proposal:
>> - Same conceptual framework.
>> - Classes can be ACC_VALUE, ACC_IDENTITY, or neither.
>> - Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are not. Optionally, modern-version concrete classes are also implicitly ACC_IDENTITY.
>
> Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER bits, then any class without one of the bits set (including all the legacy classes) would be an identity class.
>
>> (Trying out this alternative approach to abstract classes: there's no more ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically ACC_IDENTITY, and modern-version abstract classes permit value subclasses unless they opt out with ACC_IDENTITY. It's the bytecode generator's responsibility to set these flags appropriately. Conceptually cleaner, maybe too risky...)
>
> With the "clever" encoding, every class is implicitly identity unless it sets ACC_VALUE or ACC_NEITHER, and bytecode generators have to explicitly flag modern abstract classes. This is kind of growing on me.

A problem is that interfaces are ACC_NEITHER by default, not ACC_IDENTITY. Abstract classes and interfaces have to get two different behaviors based on the same 0 bits.

Here's another, more stable encoding, though, that feels less fiddly to me than what I originally wrote:

- ACC_VALUE means "allows value object instances"
- ACC_IDENTITY means "allows identity object instances"
- If you set *both*, you're a "neither" class/interface. (That is, you allow both kinds of instances.)
- If you set *none*, you get the default/legacy behavior implicitly: classes are ACC_IDENTITY only, interfaces are ACC_IDENTITY & ACC_VALUE.
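The "allows" encoding described above can be decoded as follows. This is only a sketch: the flag values are placeholders chosen for illustration (not taken from the JVMS), and `InstanceKind` and `allowed` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch of the "allows" encoding; flag values are hypothetical placeholders.
final class AllowsEncoding {
    enum InstanceKind { VALUE, IDENTITY }

    static final int ACC_IDENTITY = 0x0020; // placeholder: "allows identity object instances"
    static final int ACC_VALUE    = 0x0040; // placeholder: "allows value object instances"

    static Set<InstanceKind> allowed(int flags, boolean isInterface) {
        boolean v = (flags & ACC_VALUE) != 0;
        boolean i = (flags & ACC_IDENTITY) != 0;
        if (!v && !i) {
            // No bits set: legacy default. Classes allow identity only; interfaces allow both.
            return isInterface ? EnumSet.allOf(InstanceKind.class) : EnumSet.of(InstanceKind.IDENTITY);
        }
        Set<InstanceKind> kinds = EnumSet.noneOf(InstanceKind.class);
        if (v) kinds.add(InstanceKind.VALUE);
        if (i) kinds.add(InstanceKind.IDENTITY);
        return kinds; // both bits set: a "neither" class/interface allowing both kinds
    }

    public static void main(String[] args) {
        System.out.println(allowed(0, false));                       // [IDENTITY]
        System.out.println(allowed(0, true));                        // [VALUE, IDENTITY]
        System.out.println(allowed(ACC_VALUE, false));               // [VALUE]
        System.out.println(allowed(ACC_VALUE | ACC_IDENTITY, false)); // [VALUE, IDENTITY]
    }
}
```

Note how the zero-bits case is the only one that depends on `isInterface`; that asymmetry is the "same 0 bits, two behaviors" issue raised above, confined to the legacy default.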
Re: Alternative to IdentityObject & ValueObject interfaces
On Tue, Mar 22, 2022 at 4:56 PM Dan Smith wrote:

> In response to some encouragement from Remi, John, and others, I've decided to take a closer look at how we might approach the categorization of value and identity classes without relying on the IdentityObject and ValueObject interfaces.
>
> (For background, see the thread "The interfaces IdentityObject and ValueObject must die" in January.)

Could anyone summarize the strongest version of the argument against them? The thread is not too easy to follow.

> - A "value class" is any class whose instances are all value objects. An "identity class" is any class whose instances are all identity objects.

I assume you are contrasting "bucket 1" vs. "buckets 2+3" here. (My own chosen nomenclature would only alter it slightly to say that value classes *also* have instances that are pure values, no object in sight.)

> - Subclass restriction: 'implements IdentityObject' has been replaced with the 'identity' modifier. Complexity cost of special modifiers seems on par with the complexity of special rules for inferring and checking the superinterfaces.

The rules for the modifiers are okay. But here's my observation. The simplest way to explain those rules would be if the `value` keyword is literally shorthand for `extends/implements ValueObject`. I think the rules fall out from that, plus:

- IO and VO are disjoint. (As interfaces can *already* be, like `interface Foo { int x(); }` and `interface Bar { boolean x(); }`, and if it really came down to it, you could literally put an incompatible method into each type and blame their noncohabitation on that :-))
- A class that breaks the value class rules has committed to being an identity class.
- We wouldn't know how to make an *instance* that is "neither", so *instantiating* a "neither" class has to have default behavior, and that has to be to give you what it always has.

In each case I've explained why the rule seems very easy to understand to me.
So from my POV, this *still* pulls me back to the types anyway. I would say that your rules for the modifiers are largely *simulating* those types.

> I think it's a win that we use the 'value' modifier and "value" terminology for all kinds of classes/interfaces, not just concrete classes.

I think I've probably come around to that terminology over the long course of re-editing this email.

> - Variable types: I don't see a good way to get the equivalent of an 'IdentityObject' type. It would involve tracking the 'identity' property through the whole type system, which seems like a huge burden for the occasional "I'm not sure you can lock on that" error message. So we'd probably need to be okay letting that go. Fortunately, I'm not sure it's a great loss: lots of code today seems happy using 'Object' when it means, informally, "object that I've created for the sole purpose of locking".

I'm confused, because it seems like we'd be throwing out an awful lot here. If I pass a value object to `identityHashCode`, we'd rather that didn't compile. Seems like this list goes on a long way.

> - Type variable bounds: this one seems more achievable, by using the 'value' and 'identity' keywords to indicate a new kind of bounds check (''). Again, it's added complexity, but it's more localized. We should think more about the use cases, and decide if it passes the cost/benefit analysis. If not, nothing else depends on this, so it could be dropped. (Or left to a future, more general feature?)

Don't we already need `Foo` though? Adding this too seems *super* confusing to me. Let types do what types already do.

> - Documentation: we've lost the handy javadoc location to put some explanations about identity & value objects in a place that curious programmers can easily stumble on. Anything we want to say needs to go in JLS/JVMS (or perhaps the java.lang.Object javadoc).
Going beyond "mere" documentation: capturing capabilities and constraints is precisely what types are *for*. Isn't being able to determine those behaviors from types the reason people choose a strongly typed language in the first place?

> - Compatibility: pretty clear win here. No interface injection means tools that depend on reflection results won't be broken. (We've found a significant number of these problems in our own code/tests, FWIW.) No new static types means inference results won't change.

Seems less like "breaking compatibility" than just "Hyrum's Law". But I lack understanding of how widespread or hard to fix these problems are. We could maybe do an experiment over here if necessary.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com
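As a sketch of the reflection-dependent code the compatibility point refers to, consider a hypothetical tool that reports a class's superinterfaces and assumes a plain class has none. On current JDK releases that assumption holds; if the JVM injected a marker interface such as IdentityObject into every identity class, the output (and anything comparing against it) would change. `describe` and `hasNoSuperinterfaces` are illustrative names, not an existing tool's API.

```java
import java.util.Arrays;

// Hypothetical tool code whose behavior depends on reflection results.
final class ReflectionAssumption {
    static final class Plain {}

    // Prints the directly implemented interfaces, as reported by Class.getInterfaces().
    static String describe(Class<?> c) {
        return c.getSimpleName() + " implements " + Arrays.toString(c.getInterfaces());
    }

    // A brittle assumption of the kind interface injection would break.
    static boolean hasNoSuperinterfaces(Class<?> c) {
        return c.getInterfaces().length == 0;
    }

    public static void main(String[] args) {
        System.out.println(describe(Plain.class));             // Plain implements []
        System.out.println(hasNoSuperinterfaces(Plain.class)); // true on current JDKs
    }
}
```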
Re: Alternative to IdentityObject & ValueObject interfaces
A couple of comments on the encoding and questions related to descriptors.

> JVM proposal:
>
> - Same conceptual framework.
>
> - Classes can be ACC_VALUE, ACC_IDENTITY, or neither.
>
> - Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are not. Optionally, modern-version concrete classes are also implicitly ACC_IDENTITY.

Maybe this is too clever, but if we added ACC_VALUE and ACC_NEITHER bits, then any class without one of the bits set (including all the legacy classes) would be an identity class.

> (Trying out this alternative approach to abstract classes: there's no more ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically ACC_IDENTITY, and modern-version abstract classes permit value subclasses unless they opt out with ACC_IDENTITY. It's the bytecode generator's responsibility to set these flags appropriately. Conceptually cleaner, maybe too risky...)

With the "clever" encoding, every class is implicitly identity unless it sets ACC_VALUE or ACC_NEITHER, and bytecode generators have to explicitly flag modern abstract classes. This is kind of growing on me.

> - At class load time, we inherit value/identity-ness and check for conflicts. It's okay to have neither flag set but inherit the property from one of your supers. We also enforce constraints on value classes and "neither" abstract classes.
>
> ---
>
> So how does this score as a replacement for the list of features enabled by the interfaces?
>
> - Dynamic detection: 'obj instanceof ValueObject' is quite straightforward; if we can replace that with 'obj.isValueObject()', that feels about equally useful. (I'd be more pessimistic about something like 'Objects.isValueObject(obj)'.)
>
> - Subclass restriction: 'implements IdentityObject' has been replaced with the 'identity' modifier. Complexity cost of special modifiers seems on par with the complexity of special rules for inferring and checking the superinterfaces. I think it's a win that we use the 'value' modifier and "value" terminology for all kinds of classes/interfaces, not just concrete classes.
>
> - Variable types: I don't see a good way to get the equivalent of an 'IdentityObject' type. It would involve tracking the 'identity' property through the whole type system, which seems like a huge burden for the occasional "I'm not sure you can lock on that" error message. So we'd probably need to be okay letting that go. Fortunately, I'm not sure it's a great loss: lots of code today seems happy using 'Object' when it means, informally, "object that I've created for the sole purpose of locking".

Do method parameters also fall under this case? To pick on our favourite example, how would we use the type system to adapt the descriptor for WeakReference:: so it wouldn't accept Values?

Are we willing to give up on putting the constraints in the type system, given that interfaces are checked on use (i.e., invokeinterface) rather than by the verifier?

> - Type variable bounds: this one seems more achievable, by using the 'value' and 'identity' keywords to indicate a new kind of bounds check (' extends Runnable>'). Again, it's added complexity, but it's more localized. We should think more about the use cases, and decide if it passes the cost/benefit analysis. If not, nothing else depends on this, so it could be dropped. (Or left to a future, more general feature?)

Similar to the question above, how would this be reflected in the method descriptor? `` would erase to Runnable? If it doesn't get expressed somewhere that the VM can take action on it, I think it will give developers a false sense of security regarding the method's invariants and encourage them to skip the `!obj.isValueObject()` checks in these methods.

> - Documentation: we've lost the handy javadoc location to put some explanations about identity & value objects in a place that curious programmers can easily stumble on. Anything we want to say needs to go in JLS/JVMS (or perhaps the java.lang.Object javadoc).
>
> - Compatibility: pretty clear win here. No interface injection means tools that depend on reflection results won't be broken. (We've found a significant number of these problems in our own code/tests, FWIW.) No new static types means inference results won't change. There's less risk of incompatibilities when adding/removing the 'identity' and 'value' keywords (although there can still be source, binary, and behavioral incompatibilities).
>
> --Dan
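Without a type-system constraint, a method that intends to lock on its argument has to check at run time, which is exactly the check the message above worries developers would skip. A minimal sketch: since `isValueObject()` is only a proposed method and does not exist in released JDKs, the test is passed in as a `Predicate`, and `requireIdentity`/`runLocked` are hypothetical names.

```java
import java.util.function.Predicate;

// Hypothetical runtime guard; the isValueObject predicate stands in for a
// proposed API that released JDKs do not have.
final class IdentityGuard {
    static <T> T requireIdentity(T obj, Predicate<Object> isValueObject) {
        if (isValueObject.test(obj)) {
            throw new IllegalArgumentException("value objects have no identity: " + obj);
        }
        return obj;
    }

    // A method that locks on its argument, declared with plain Object as today.
    static void runLocked(Object lock, Runnable action, Predicate<Object> isValueObject) {
        synchronized (requireIdentity(lock, isValueObject)) {
            action.run();
        }
    }

    public static void main(String[] args) {
        Predicate<Object> noValuesToday = o -> false; // current JDKs have no value objects
        int[] counter = {0};
        runLocked(new Object(), () -> counter[0]++, noValuesToday);
        System.out.println(counter[0]); // 1
    }
}
```

The guard only fires at run time; nothing in the `Object` parameter type (or its erased descriptor) tells callers or the VM about the requirement, which is the gap being discussed.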