Revisiting default values

2020-07-10 Thread Dan Smith
Brian pointed out that my list of candidate inline classes in the Identity 
Warnings JEP (JDK-8249100) includes a number of classes that, despite being 
"value-based classes" and disavowing their identity, might not end up as inline 
classes. The problem? Default values.

This might be a good time to revisit the open design issues surrounding default 
values and see if we can make some progress.

Background/status quo: every inline class has a default instance, which 
provides the initial value of fields and array components that have the inline 
type (e.g., in 'new Point[10]'). It's also the prototype instance used to 
create all other instances (start with 'vdefault', then apply 'withfield' as 
needed). The default value is, by fiat, the class instance produced by setting 
all fields to *their* default values. Often, but not always, this means 
field/array initialization amounts to setting all the bits to 0. Importantly, 
no user code is involved in creating a default instance.
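To make that concrete, here's a minimal sketch using the current prototype 
syntax (Point is just an illustration; 'Foo.default' denotes the default 
instance):

    inline class Point {
        int x;
        int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Allocation runs no user code; each component is already the default instance.
    Point[] pts = new Point[10];
    assert pts[3] == Point.default;   // (0, 0): every field set to *its* default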

Real code is always useful for grounding design discussions, so let's start 
there. Among the classes I listed as inline class candidates, we can put them 
in three buckets:

Bucket #1: Have a reasonable default, as declared.
- wrapper classes (the primitive zeros)
- Optional & friends (empty)
- From java.time: Instant (start of 1970-01-01), LocalTime (midnight), Duration 
(0s), Period (0d), Year (1 BC, if that's acceptable)

Bucket #2: Could have a reasonable default after re-interpreting fields.
- From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, ZonedDateTime, 
OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, MinguoDate, HijrahDate, 
JapaneseDate, ThaiBuddhistDate (months and days should be nonzero; null 
Strings, ZoneIds, HijrahChronologies, and JapaneseEras require special handling)
- ListN, SetN, MapN (null array interpreted as empty)

Bucket #3: No good default.
- Runtime.Version (need a non-null List)
- ProcessHandleImpl (need a valid process ID)
- List12, Set12, Map1 (need a non-null value)
- All ConstantDesc implementations (need real class & method names, etc.)

There's some subjectivity between the 2nd and 3rd buckets, but the idea behind 
the 2nd is that, with some translation layer between physical fields and 
interpretation of those fields, we can come up with an intuitive default (e.g., 
"0 means January"; "a null String means time zone 'UTC'"). In contrast, in the 
third bucket, any attempt to define a default value is going to be pretty 
unintuitive ("A null method name means 'toString'").
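As a sketch of what that Bucket #2 translation layer might look like in source 
(hypothetical fields and accessors; the real java.time classes differ):

    inline class MonthDay {
        // Physical fields are chosen so that all-zero-bits decodes to January 1:
        private int monthMinus1;   // 0 means January
        private int dayMinus1;     // 0 means the 1st

        public int month() { return monthMinus1 + 1; }
        public int day()   { return dayMinus1 + 1; }
    }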

The question here is how much work the JVM and language are willing to do, or 
how much work we're willing to ask clients to do, in order to support use cases 
that don't fall into Bucket #1.

I don't think totally excluding Buckets #2 and #3 is a very good outcome. It 
means that, in many cases, inline classes need to be built up exclusively from 
primitives or other inline types, because if you use reference types, your 
default value will have a null field. (Sometimes, as in Optional, null fields 
have straightforward interpretations, but most of the time programs are 
designed to prevent them.)

Whether we support Bucket #2 but not Bucket #3 is a harder question. It 
wouldn't be so bad if none of the examples above in Bucket #3 become inline 
classes—for the most part they're handled via interfaces, anyway. 
(Counterpoint: inline class instances that are immediately typed with interface 
types still potentially provide a performance boost.) But I'm also not sure 
this is representative. We've noted before that many use cases, like database 
records or data structure cursors, don't have meaningful defaults (what's a 
default mailing address?). The ConstantDesc classes really illustrate this, 
even though they happen to not be public.

Another observation is that if we support Bucket #3 but not Bucket #2, that's 
probably not a big deal—I'm not sure anybody really *wants* to deal with the 
default instance; it's just the price you pay for being an inline class. If 
there's a way to opt out of that extra weirdness and move from Bucket #2 to 
Bucket #3, great.

With that discussion in mind, here are some summaries of approaches we've 
considered, or that I think we ought to consider, for supporting buckets #2 and 
#3. (This is as best as I recall. If there's something I've missed, add it to 
the list!)

[Weighing in for myself: my current preference is to do one of F, G, or I. I'm 
not that interested in supporting Bucket #2, for reasons given above, although 
Option A works for programmers who really want it.]



=== Solutions to support Bucket #2 ===

Two broad strategies here: re-interpreting fields (A, B), and re-interpreting 
the default instance (C, D).

---

Option A: Encourage programmers to re-interpret fields

Guidance to programmers: when you declare an inline class, identify any fields 
for which the default instance should hold something other than zero/null; 
define a mapping for your implementation...

Re: Revisiting default values

2020-07-10 Thread Dan Smith
> On Jul 10, 2020, at 12:46 PM, Kevin Bourrillion  wrote:
> 
> My reason for complaining here is not just about the java.time types 
> themselves, but to argue that this is an important 4th bucket we should be 
> concerned about. In some ways it is a bigger problem than Bucket #3 "no good 
> default", since it is an actively harmful default.
> 
> For all of these types, there is one really fantastic default value that does 
> everything you would want it to do: null. That is why these types should not 
> become inline types, or certainly not val-default inline types, and why Error 
> Prone will have to ban usage of `.val` if they do.

Appreciate the thoughts, this is definitely relevant.

For the purpose of this discussion, I'd say you're arguing for these classes to 
move to Bucket #3. Because then the question becomes, just like for the other 
classes there: do we use the Bucket #3 strategies to support these as inline 
classes, or do we give up and leave them as identity classes?

Re: Revisiting default values

2020-07-13 Thread Dan Smith
From valhalla-spec-observers:

> On Jul 12, 2020, at 10:45 PM, Zheka Kozlov  wrote:
> 
> Sorry for a probably stupid question but aren't all classes from Bucket #2 
> and #3 ref-default? Which means when we are calling new LocalDate[10], all 
> elements of the array are initialized to null. And since the constructors of 
> these classes are private, the external user will never see the instances in 
> their default state.

True, 'new LocalDate[10]' will continue to allocate an array of nulls. The 
default instance is only relevant when someone does 'new LocalDate.val[10]'.

Regardless of the syntax, if there exists an inline type for instances of an 
inline class ('LocalDate.val' above), there will also be a semantic question of 
how we initialize fields/arrays of that inline type.
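In other words (sketch, using the prototype '.val' syntax):

    LocalDate[]     byRef = new LocalDate[10];      // ref-default: ten nulls, as today
    LocalDate.val[] flat  = new LocalDate.val[10];  // flattened: ten copies of the default instance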

Re: Revisiting default values

2020-07-13 Thread Dan Smith
> On Jul 10, 2020, at 12:23 PM, Dan Smith  wrote:
> 
> Option G: JVM support for default instance guards
> 
> Inline class files can indicate that their default instance is invalid. All 
> attempts to operate on that instance (via field/method accesses, other than 
> 'withfield') result in an exception.
> 
> This tightens up Option F, making it just as impossible to access members of 
> the default instance as it is to access members of 'null'.
> 
> Same issues as Option E regarding adding a "new NPE" to the platform.

There's a variant of this that deserves spelling out:

---

Option J: JVM treats default instance as 'null'

Like Option G, an inline class file can indicate that its default instance is 
invalid—in this case, 'null'. All attempts to operate on that instance result 
in an NPE. Conceptually, the null instance and the null reference are the same, 
and should generally be indistinguishable.

(We explored this awhile back as a tool for migration, before going in a 
different direction.)

Some implications:

- The VM probably wants to normalize its encodings (null reference vs. null 
instance), meaning there's a translation layer on field/array reads, just like 
Option I, and also for field/array writes, just like Option D.

- Casts to Q types for certain classes should also translate from null 
reference to null instance, rather than NPE.

- For these classes, the 'withfield' instruction is uniquely able to operate on 
and produce 'null'.

- In the language, the 'null' literal can be assigned to some inline types. (In 
the VM, the verifier could require using 'defaultvalue' instead, if it wants to 
avoid some class loading.)

- We could revisit the question of whether it's possible to migrate an identity 
class to be an inline-default inline class as long as the default instance is 
'null'. (There are additional issues, like binary compatibility. But we could 
re-open that exploration...)

---

My sense is that Option I dominates Option J by most measures—it achieves the 
same result (default value is invalid), with less work at flattened storage 
barriers, fewer tweaks to the rest of the system, and a more useful programming 
model (no nulls being passed around).



Re: Revisiting default values

2020-07-13 Thread Dan Smith



> On Jul 13, 2020, at 12:19 PM, Dan Smith  wrote:
> 
>> On Jul 10, 2020, at 12:23 PM, Dan Smith  wrote:
>> 
>> Option G: JVM support for default instance guards
>> 
>> Inline class files can indicate that their default instance is invalid. All 
>> attempts to operate on that instance (via field/method accesses, other than 
>> 'withfield') result in an exception.
>> 
>> This tightens up Option F, making it just as impossible to access members of 
>> the default instance as it is to access members of 'null'.
>> 
>> Same issues as Option E regarding adding a "new NPE" to the platform.
> 
> There's a variant of this that deserves spelling out:
> 
> ---
> 
> Option J: JVM treats default instance as 'null'
> 
> Like Option G, an inline class file can indicate that its default instance is 
> invalid—in this case, 'null'. All attempts to operate on that instance result 
> in an NPE. Conceptually, the null instance and the null reference are the 
> same, and should generally be indistinguishable.
> 
> (We explored this awhile back as a tool for migration, before going in a 
> different direction.)
> 
> Some implications:
> 
> - The VM probably wants to normalize its encodings (null reference vs. null 
> instance), meaning there's a translation layer on field/array reads, just 
> like Option I, and also for field/array writes, just like Option D.
> 
> - Casts to Q types for certain classes should also translate from null 
> reference to null instance, rather than NPE.
> 
> - For these classes, the 'withfield' instruction is uniquely able to operate 
> on and produce 'null'.
> 
> - In the language, the 'null' literal can be assigned to some inline types. 
> (In the VM, the verifier could require using 'defaultvalue' instead, if it 
> wants to avoid some class loading.)
> 
> - We could revisit the question of whether it's possible to migrate an 
> identity class to be an inline-default inline class as long as the default 
> instance is 'null'. (There are additional issues, like binary compatibility. 
> But we could re-open that exploration...)
> 
> ---
> 
> My sense is that Option I dominates Option J by most measures—it achieves the 
> same result (default value is invalid), with less work at flattened storage 
> barriers, fewer tweaks to the rest of the system, and a more useful 
> programming model (no nulls being passed around).

And here's another option that has been previously discarded, but might be 
worth picking back up. This one to address Bucket #2:

---

Option K: JVM initializes fields/arrays to a designated default

The VM allows inline classes to designate a logical default instance, and 
during class preparation or array allocation, any fields/components of the 
inline type are initialized to the logical default.

Compare to Option D. Rather than adding barriers to reads/writes that interact 
with the storage, we simply initialize the storage "properly" in the first 
place.

The possibly-fatal downside is that it means every array allocation for that 
inline type has to stamp out a bunch of copies of a particular bit pattern, 
rather than the simpler all-zeros pattern. But that extra cost may be worth it 
in exchange for faster reads/writes to the array. (Same comments for class 
instances, although I don't think it's as much of a concern, given the 
relatively small sizes of class instances.)

Note that some arrays *already* have to stamp out a nonzero bit pattern, if the 
encoding of an inline type uses pointers rather than flattened fields (e.g., 
for an inline class with too many fields).
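In source-level terms, Option K would make allocation behave roughly like this 
(sketch; 'LOGICAL_DEFAULT' is a stand-in for whatever instance the class 
designates):

    MonthDay[] days = new MonthDay[1000];
    // behaves as if the VM had executed:
    //   for (int i = 0; i < days.length; i++)
    //       days[i] = MonthDay.LOGICAL_DEFAULT;   // stamps a nonzero bit pattern per component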

---

If we're enthusiastic about addressing Bucket #2, this seems like a viable 
approach—quite simple, and with comparable performance to most of the other 
approaches.



Re: Revisiting default values

2020-07-14 Thread Peter Levart
What about a variant of G or J where an inline class would designate a 
single field to be used for "isDefault" checks. Instead of comparing all 
fields for "zero" value, a single designated field would be used in 
checks. So a class is free to choose which of the existing fields is 
"never zero/null" in the set of valid class states or can even add a 
special-purpose (boolean) field to be used just for that. Often no such 
special field would need to be added.


WDYT?

Peter






Re: Revisiting default values

2020-07-14 Thread Dan Smith
> On Jul 14, 2020, at 6:39 AM, Peter Levart  wrote:
> 
> What about a variant of G or J where an inline class would designate a single 
> field to be used for "isDefault" checks. Instead of comparing all fields for 
> "zero" value, a single designated field would be used in checks. So a class 
> is free to choose which of the existing fields is "never zero/null" in the 
> set of valid class states or can even add a special-purpose (boolean) field 
> to be used just for that. Often no such special field would need to be added.
> 
> WDYT?

This is probably more fine-grained than I want to get into right now—let's 
choose a direction before drilling down on how we can make it fast—but, yes, in 
previous discussions we have considered using a designated field as the 
'isDefault' signal, rather than doing a full 'val == Foo.default'. I don't know 
whether that's likely to be a worthwhile optimization or not.

Re: Revisiting default values

2020-07-15 Thread Remi Forax
So the default value may be a valid value or an invalid value; if it's invalid, 
it should be the author of the class who says so, because in Java we prefer 
declaration site to use site.

One way is to try to teach the VM how to do the conversions; I want to explore 
another way, where we try to solve that issue at the language level, to avoid 
having a more complex VM.

A default value which is invalid should behave like null, i.e. calling any 
method on the default value should result in an exception.
Doing that at the language level means adding a check before calling any 
instance method and before accessing any instance field.

So there are two parts to solve:
1/ how to specify the check: is it just this == Inline.default, or whatever 
the user wants (or something in the middle, like a field check)?
2/ how to execute that check when accessing a field or a method?

Let's explore the solution that offers the maximum freedom for the author of the 
inline class, i.e. for 1/, the check is user defined.
For that we can introduce a new kind of initializer, like the static block; 
let's call it the invariant block:
  inline class Foo {
    private final Object o;

    invariant {
      if (o == null) {
        throw new InvalidFooException();
      }
    }
  }
This invariant block is translated into a method (that has the name <invariant>, 
see later why) and is called each time a method or a field is accessed.

For 2/, we can either change the spec of the VM so the invariant block is 
called automatically by the VM, or we can use invokedynamic.
invokedynamic has the advantage of not requiring more VM support, at the 
expense of the bootstrap issue.

The main issue with invokedynamic is that it's not a backward compatible change, 
because it requires changing the call sites.
So we can lessen the requirement like this: require the call to 
<invariant> only when invoking an instance method, because 
we suppose that people will not be foolish enough to declare the fields public.
In that case, there is no need for invokedynamic, because a call to the 
invariant method can be inserted by the compiler at the beginning of any 
instance method.
This solution also has the advantage of lowering the cost at runtime compared 
to using invokedynamic.
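Concretely, the compiler-inserted form would look something like this (a 
sketch; '$invariant' stands in for the <invariant> name, which would not be 
expressible in source):

    inline class Foo {
        private final Object o;

        private void $invariant() {        // translation of the invariant block
            if (o == null) {
                throw new InvalidFooException();
            }
        }

        int m() {
            $invariant();                  // inserted by the compiler at method entry
            /* body of m */
            return 0;
        }
    }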

In terms of performance, I believe the language spec should say that the 
invariant block has to be idempotent, because in that case the VM is free to 
elide repeated calls to the <invariant> method once one has been executed on a 
specific instance (like the null-check collapsing the JITs currently do).

To summarize, I believe we should allow more value-based classes to be 
retrofitted as inline classes by adding the concept of an invariant block to 
the Java language spec.
An invariant block is a simple idempotent method called at the beginning of 
every instance method.

Rémi


Re: Revisiting default values

2020-07-20 Thread Brian Goetz

Responding to Kevin's tangent:

> - Of the ones on Dan's list, one could argue that even some of the 
> ones in Bucket 1 are questionable, such as `char` or `Instant`.  The 
> ones that really seem like slam dunks are: numerics (int, long, etc), 
> boolean, and maybe Optional.  That's a small list.


(Another candidate for bucket 1: BigDecimal.)

More generally:

 - The language is schizoid about uninitialized variables.  DA analysis 
requires that we always initialize locals (even when we want to 
initialize `count` to `0`), but doesn't require it for fields.  This is 
because we know that there are windows of unfortunateness where the 
default value is still observable -- inside the ctor, or if `this` 
escapes the ctor.


> Option J: JVM treats default instance as 'null'

Implementation note: when we explored this a while back, we were 
interested in identifying a "pivot field" where the programmer committed 
(or analysis proved) that all properly initialized instances would have 
a non-default value for this field, as would be the case if any field 
had an unconditional `foo = new Foo()` assignment in the constructor.  
This makes detection of the default value much faster, since you only 
have to check the pivot field.


(Peter raises this in his "what about" query later.)

We were initially excited about this approach but later realized it was 
feeding the "optimization dopamine receptor" rather than actually 
solving a problem :)
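For reference, a sketch of the pivot-field idea being described (hypothetical 
class; the guard shown is what the VM or generated code would perform, not 
something users write):

    inline class Name {
        // Assigned unconditionally in the constructor, so every properly
        // constructed Name has a non-null value: 'value' can serve as the pivot field.
        private final String value;

        Name(String value) { this.value = java.util.Objects.requireNonNull(value); }

        int length() {
            // conceptual guard: one field comparison instead of 'this == Name.default'
            if (value == null) throw new NullPointerException();
            return value.length();
        }
    }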


> It sounds like this debate is between `null` and a value which really 
> is the /moral equivalent/ of `null`. You basically would have two 
> kinds of nullability that look different from each other.


John has made an impassioned plea for "no new nulls". Accordingly, we 
did explore a variant of J where a `withfield` that set the pivot field 
to its default value _actually put a null on the stack_.  (We backed off.)



> And here's another option that has been previously discarded, but might be 
> worth picking back up. This one to address Bucket #2:
>
> ---
>
> Option K: JVM initializes fields/arrays to a designated default


John has in the past pushed back on this, in part because of the problem 
identified above (can't close the window 100%, only 99.5%, and that 0.5% 
is where security bugs come from), and in part because of the 
cost/complexity in the JVM.


That said, doing so in the language is potentially more viable.  It 
would mean, for classes that opt into this treatment:


 - Ensuring that `C.default` evaluates to the right thing
 - Preventing `this` from escaping the constructor (which might be a good 
thing to enforce for inline classes anyway)
 - Ensuring all fields are DA (which we do already), and that 
assignments to fields in ctors are not their default value
 - Translating `new Foo[n]` (and reflective equivalent) with something 
that initializes the array elements


The goal is to keep default instances from being observed. If we lock 
down `this` from constructors, the major cost here is instantiating 
arrays of these things, but we already optimize array initialization 
loops like this pretty well.


Overall this doesn't seem terrible.  It means that the cost of this is 
borne by the users of classes that opt into this treatment, and keeps 
the complexity out of the VM.  It does mean that "attackers" can 
generate bytecode to generate bad instances (a problem we have with 
multiple vectors today.)


Call this "L".


I'd suggest, though, we back away from implementation techniques (you've 
got a good menu going already), and focus more on "what language do we 
want to build."  You claim:


> I don't think totally excluding Buckets #2 and #3 is a very good outcome.


Which I think is a reasonable hypothesis, but I suggest we focus the 
discussion on whether we believe this or not, and what we might want to 
do about it (and when), first.



On 7/10/2020 2:46 PM, Kevin Bourrillion wrote:
This response is not to the main topic; not trying to send us down a 
rabbit-hole but this point is very important to me (as will be clear :-)).



On Fri, Jul 10, 2020 at 11:23 AM Dan Smith wrote:


Bucket #1: Have a reasonable default, as declared.
- wrapper classes (the primitive zeros)
- Optional & friends (empty)
- From java.time: Instant (start of 1970-01-01), LocalTime
(midnight), Duration (0s), Period (0d), Year (1 BC, if that's
acceptable)


Duration and Period: sure.

Instant and the others: please, please put these in a separate bucket. 
They can have a /default/, but it is absolutely /not/ a "reasonable" 
default. In fact many tens (hundreds?) of thousands of bug reports in 
the last 50 years of computing have been "why in the world did 
1970-01-01 or 1969-12-31 show up on this screen??"


(Source: my team at Google has invested literally multiple 
person-years in an effort to stamp out bugs with how users use 
java.time, which I kicked off and have stayed peripherally involved 
in. I feel this should make our perspective...)

Re: Revisiting default values

2020-07-21 Thread Dan Smith


> On Jul 20, 2020, at 10:27 AM, Brian Goetz  wrote:
> 
> That said, doing so in the language is potentially more viable.  It would 
> mean, for classes that opt into this treatment:
> 
>  - Ensuring that `C.default` evaluates to the right thing
>  - Preventing `this` from escaping the constructor (which might be a good 
> thing to enforce for inline classes anyway)
>  - Ensuring all fields are DA (which we do already), and that assignments to 
> fields in ctors are not their default value 
>  - Translating `new Foo[n]` (and reflective equivalent) with something that 
> initializes the array elements
> 
> The goal is to keep default instances from being observed.  If we lock down 
> `this` from constructors, the major cost here is instantiating arrays of 
> these things, but we already optimize array initialization loops like this 
> pretty well.  
> 
> Overall this doesn't seem terrible.  It means that the cost of this is borne 
> by the users of classes that opt into this treatment, and keeps the 
> complexity out of the VM.  It does mean that "attackers" can generate 
> bytecode to generate bad instances (a problem we have with multiple vectors 
> today.)  
> 
> Call this "L".  

More letters!

Expanding on ways to support Bucket #3 by ensuring initialization of 
fields/arrays:

---

Option L: Language requires field/array initialization

An inline class may be declared to have no default. Fields and arrays of that 
class's inline type must be provably initialized (via compiler analysis) before 
they are read or published.

Instance fields of the class's inline type must be initialized before a method 
call involving 'this' occurs. (It's already illegal to allow the constructor to 
return before initialization.)

Static fields... seem hopeless, so maybe they must have a reference type 
(perhaps implicitly). Maybe we can do an analysis that permits some very simple 
cases, but once you allow method calls of almost any sort, you've lost. (We'd 
have to prove that no initialization of *other* classes triggered by <clinit> 
refers to the field before it has been initialized.)

Arrays must be initialized at creation time, either with an array initializer 
("Address[] as = { x, y, z };") or via a trusted API ("Address[] as = 
Arrays.of(i -> x);"). We might introduce a language sugar for the trusted API 
("Address[] as = { i -> x };"). We *could* support two-stage initialization via 
things like 'Arrays.fill', but analysis to track uninitialized arrays from 
creation to filling doesn't seem worthwhile.

This is less expressive, obviously. In particular, many comfortable idioms for 
initializing an array won't work. As a case study: what happens in generic code 
like ArrayList? When it wants to allocate its array (we're in a specialized 
world where T has been specialized to 'QAddress;'), what value does it fill the 
array with? Nothing is available, because at this point the list is empty, and 
it's just allocating storage for later. I guess ArrayList (and similar data 
structures) has to have a special back door, and we're left to trust the author 
not to expose the uninitialized payload.

As with all language features, there's also the question of what happens when a 
class file doesn't conform to the language's rules. Option L can't really stand 
alone—it needs to be backed up by some other option when the language's 
guarantees fail.

---

Option M: JVM requires field/array initialization

Inline class files can indicate that their default instance is invalid. Fields 
and arrays of that class's inline type must be provably initialized (via 
verification or related analysis) before they are read or published.

All the compile-time analysis of Option L applies here, because the language 
compiler needs to be sure its generated class files are valid.

We can use some new verification types to track the initialization status of 
'this', the way we do to require 'super' calls today. You don't have a fully 
formed 'Foo', capable of being passed to other methods, etc., until all fields 
are initialized. This would also apply to 'defaultvalue' for an inline class 
with a field of a default-less inline type.

Again, static fields are hopeless; it's an error to use the inline type as a 
static field type.

'anewarray' of the inline type is illegal, except within a trusted API. That 
API promises to initialize every array component before publishing the array. 
(We won't try to guarantee this with an analysis—the API is trusted because it 
has been vetted by humans.) In addition to some standard factory methods, we 
could decide that the inline class itself is always a trusted API.

(A related approach was discussed at our last EG meeting, but with much less 
expressiveness: inline-typed fields are always illegal, and arrays can only be 
allocated by the class author.)

This closes the backdoor of other bytecode not playing by the language's rules. 
The expressiveness problems of Option L remain—e.g., ArrayList's early 
allocation...

RE: Revisiting default values

2020-07-28 Thread Tobi Ajila
> Bucket #3 classes must be reference-default, and fields/arrays of their
inline type are illegal outside of the declaring class. The declaring class
can provide a flat array factory if it wants to. (A new idea from Tobi,
he'll write it up for the thread.)

```
public sealed abstract class LegacyType permits LegacyType.val { // formerly a concrete class, but now it's abstract (or maybe an interface)
    // factory methods
    public static LegacyType makeALegacyType(...);             // in some cases this already exists
    public static LegacyType[] newALegacyTypeArray(int size);  // can be flattened
}

private inline class LegacyType.val extends LegacyType { ... } // this type is hidden, only LegacyType knows about it
```

This approach is based on what Kevin mentioned earlier, "For all of these
types, there is one really fantastic default value that does everything you
would want it to do: null. That is why these types should not become
inline-types, or certainly not val-default inline types ...". Essentially,
by making these types reference-default and by providing an avenue to
restrict the value-projection to the reference-default type, the writer
maintains control of where and when the value-projection is allowed to be
observed thus solving the bad default problem. The writer also has the
ability to supply a flattened array factory with initialized elements.

This approach is appealing for the following reasons: no additional JVM
complexity (ie. no bytecode checks for the bad default value), no javac
boilerplate (ie. guards on member access, guards on method entries, etc.).
On the other hand, there are two big drawbacks: no instance field flattening for
these types, and creating flattened arrays is a bit unnatural since it has
to be done via a factory.
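Usage would look something like this (sketch, reusing the names from the 
example above):

```
LegacyType one = LegacyType.makeALegacyType(...);        // clients only see the reference type
LegacyType[] many = LegacyType.newALegacyTypeArray(10);  // flat storage, pre-initialized by the factory
// 'new LegacyType.val[10]' is impossible outside LegacyType: the val projection is private.
```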

Going back to Brian's comment:

> I'd suggest, though, we back away from implementation techniques (you've
got a good menu going already), and focus more on "what language do we want
to build."  You claim:
> > I don't think totally excluding Buckets #2 and #3 is a very good
outcome.
> Which I think is a reasonable hypothesis, but I suggest we focus the
discussion on whether we believe this or not, and what we might want to do
about it (and when), first.

I think it would help if we had a clear sense as to what proportion of
inline-types we think will have this "bad default" problem. Last year when
we discussed null-default inline types the thinking was that about 75% of
the motivation for null-defaults was migrating VBC, 20% for security, 5%
for "I want null in my value set.". My assumption is that the vast majority
of inline-types will not be migrated types, they will be new types. If this
is correct then it would appear that the default value problem is really a
problem for a minority of inline-types.

All the solutions proposed have some kind of cost associated with them, and
these costs vary (ie. jvm complexity, throughput overhead, JIT compilation
time, etc.). If the default value problem is only for a minority of the
types, I would argue that the costs should be limited to types that want to
opt-in to not expose their default value or un-initialized value. How we
feel about this will determine which direction we choose to take when
exploring the solution space.

So, in short, I want to second Brian's comment: I think it's important to
decide if we want this kind of feature but also what we are willing to give
up to get it.

--Tobi

"valhalla-spec-experts"  wrote
on 2020/07/21 02:41:11 PM:

> From: Dan Smith 
> To: valhalla-spec-experts 
> Cc: Brian Goetz 
> Date: 2020/07/21 02:41 PM
> Subject: [EXTERNAL] Re: Revisiting default values
> Sent by: "valhalla-spec-experts"

>
>
> > On Jul 20, 2020, at 10:27 AM, Brian Goetz 
wrote:
> >
> > That said, doing so in the language is potentially more viable.
> It would mean, for classes that opt into this treatment:
> >
> >  - Ensuring that `C.default` evaluates to the right thing
> >  - Preventing `this` from escaping the constructor (which might be
> a good thing to enforce for inline classes anyway)
> >  - Ensuring all fields are DA (which we do already), and that
> assignments to fields in ctors are not their default value
> >  - Translating `new Foo[n]` (and reflective equivalent) with
> something that initializes the array elements
> >
> > The goal is to keep default instances from being observed.  If we
> lock down `this` from constructors, the major cost here is
> instantiating arrays of these things, but we already optimize array
> initialization loops like this pretty well.
> >
> > Overall this doesn't seem terrible.  It means that the cost of
> this is borne by the users of classes that opt into this treatment,
> and keeps the complexity out of the VM.  It does mean that
> &quo

Re: Revisiting default values

2020-07-28 Thread Brian Goetz


> I think it would help if we had a clear sense as to what proportion of 
> inline-types we think will have this "bad default" problem. Last year 
> when we discussed null-default inline types the thinking was that 
> about 75% of the motivation for null-defaults was migrating VBC, 20% 
> for security, 5% for "I want null in my value set.". My assumption is 
> that the vast majority of inline-types will not be migrated types, 
> they will be new types. If this is correct then it would appear that 
> the default value problem is really a problem for a minority of 
> inline-types. 


Indeed, we've come up with good solutions for migrating VBCs (migrate it 
to a ref-default inline class) and "I want null in my value set" (then 
just use the ref projection.)


For the "migrate from VBC" crowd, we offer the advice: "keep using `Foo` 
(really `Foo.ref`) in your APIs, but feel free to use `Foo.val` inside 
your implementation, where you are confident of no nulls."  And further, 
we offer that advice to both the VBC author and its clients.  So, we can 
expect existing APIs to continue to return Optional<T>, but more fields 
of type `Optional<T>.val`, to get the flattening, and doing null checks 
in the constructor:


    this.foo = requireNonNull(foo);

And this is one of the sources of "zero pollution"; a client may have a 
field of type `Foo.val` and just not initialize it in their constructor, 
and then later someone calls `foo.bar()`.  Unlike with a reference type, 
which would NPE in this situation, we might enter the `bar()` method, 
which might not be defensively coded to check for the (meaningless) 
default, and it will do something dumb.  Where dumb ranges from "Welcome 
to 1970" to "delete all my files."


I think what we need for Bucket 3 (which I think we agree is more 
important than Bucket 2) is to (optionally, only for NGD inline classes) 
restore parity with reference types by ensuring that the receiver of a 
method invocation is never seen to be the default value.  (We already do 
this for reference types; we NPE before the dispatch would succeed.)   
And the strategies we've been kicking around have ranged from "try to 
prevent the default from showing up in the heap" to "detect when the 
default shows at various times."


If the important point in time is method dispatch, then we can probably 
simplify to:


 - Let some classes mark themselves as NGD (no good default)
 - At the point of invocation of an NGD instance method, check the 
receiver against the default, throw NPE if it is
 - Optionally, try to optimize this check by identifying (manually or 
automatically) a pivot field


Note that even an unoptimized check is probably pretty fast already: 
"are all the bits zero."  But we can probably often optimize down to a 
single-word comparison to zero.


Note too that we can implement this check in either generated bytecode 
or in the VM; the semantics are the same, the latter is more secure.
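In source terms, the injected (or VM-performed) guard amounts to this (sketch):

    // For an invocation 'ngd.m()' where NGD is a no-good-default class:
    if (ngd == NGD.default)           // "are all the bits zero" -- or just the pivot field
        throw new NullPointerException();
    // ...dispatch m() only if the check passes, mirroring the null check on references.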





Re: Revisiting default values

2020-07-28 Thread Dan Smith
> On Jul 28, 2020, at 11:33 AM, Tobi Ajila  wrote:
> 
> > Bucket #3 classes must be reference-default, and fields/arrays of their 
> > inline type are illegal outside of the declaring class. The declaring class 
> > can provide a flat array factory if it wants to. (A new idea from Tobi, 
> > he'll write it up for the thread.)

I've since come to see this as a variant of Option L or Option M: we apply some 
restrictions + analysis to guarantee that uninitialized fields/arrays are never 
exposed. In this case, the guarantee is easy to prove because nobody can 
declare fields/arrays at all, except the class author.

> This approach is appealing for the following reasons: no additional JVM 
> complexity (ie. no bytecode checks for the bad default value), no javac 
> boilerplate (ie. guards on member access, guards on method entries, etc.). On 
> the other there are two big drawbacks: no instance field flattening for these 
> types, and creating flattened arrays is a bit unnatural since it has to be 
> done via a factory.

The biggest problem I see with approaches that prevent use of 'anewarray' is 
that they violate our uniform bytecode design, which is crucial to 
specialization. That is: how do I allocate a flat array of T in something like 
ArrayList? I can't be calling arbitrary factory methods depending on T.
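The problem in miniature (sketch; today's erased generics, with the specialized 
difficulty noted in comments):

    class MyList<T> {
        private T[] elements;

        @SuppressWarnings("unchecked")
        MyList(int capacity) {
            // Today: one 'anewarray'; components are all null, which is fine for references.
            elements = (T[]) new Object[capacity];
            // Specialized world, T = Address.val: the same bytecode must produce a flat
            // array, but a no-anewarray rule would require calling some Address-specific
            // factory here -- which uniform generic code has no way to name.
        }
    }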

There's also a problem of exactly what these array factory methods are supposed 
to do. Sure, we can blame the author if they choose to leak garbage data 
through the factory. But... what are they going to put in the array, if not 
garbage data? This is really more of a Bucket #2 solution, where there exists 
some reasonable default to fill the array with.

> I think it would help if we had a clear sense as to what proportion of 
> inline-types we think will have this "bad default" problem. Last year when we 
> discussed null-default inline types the thinking was that about 75% of the 
> motivation for null-defaults was migrating VBC, 20% for security, 5% for "I 
> want null in my value set.". My assumption is that the vast majority of 
> inline-types will not be migrated types, they will be new types. If this is 
> correct then it would appear that the default value problem is really a 
> problem for a minority of inline-types. 

My two cents: this is not about migrated vs. new types. This is about what's 
being modeled. A certain subset of inline classes will model some sort of 
numeric quantity with a natural "zero" value. Many others—I'd predict more than 
50%, though it will depend a lot on how accommodating we are to these use 
cases—will represent non-numeric data without any "zero" analog. These will 
often wrap non-null references (strings, for example).

(Challenge: can we think of any use cases for inline classes that have a 
natural all-zeros default value *other than* a numeric zero, a singleton with 
no fields, or the equivalent of Optional.empty()? Maybe a collection of boolean 
flags? Once you've got references, it's pretty unusual to expect them to be 
null.)

Within the subset that doesn't have a good default, it's often the case that 
the class has limited exposure, and some programmers might happily trade safety 
guarantees for performance, knowing they can trust all clients (or if there's a 
bug, they'll catch it in testing). So maybe they'll be fine with the all-zeros 
default story. But any class that belongs to a public API, or even that has 
significant non-public exposure, is going to want to be confident that it's 
operating on valid data.

> I would argue that the costs should be limited to types that want to opt-in 
> to not expose their default value or un-initialized value.

Yes, agreed. Major demerits for any approach that imposes costs on programs 
that don't make use of no-default inline classes.

> I think it's important to decide if we want this kind of feature but also what 
> we are willing to give up to get it.

The right way to think about it is this: there exist many classes that don't 
need identity and also don't have natural defaults. We're not going to make 
those classes cease to exist. It's not a "yes or no" choice, it's a "what is 
the sanctioned approach?" choice.

The "yes or no" framing leads to attempts to compare performance with or 
without checks. But the "which approach" choice means choosing between 
performance of:
- An identity class
- A class with hand-coded checks in methods
- A class that automatically checks member accesses, like we do with null
- A dynamic requirement that fields/arrays of a certain class type have to be 
initialized before they're read
- Etc.



Re: Revisiting default values

2020-08-02 Thread Peter Levart

Hi,

On 7/28/20 9:06 PM, Brian Goetz wrote:
> If the important point in time is method dispatch, then we can probably 
> simplify to:
>
>  - Let some classes mark themselves as NGD (no good default)
>  - At the point of invocation of an NGD instance method, check the 
> receiver against the default, throw NPE if it is
>  - Optionally, try to optimize this check by identifying (manually or 
> automatically) a pivot field
>
> Note that even an unoptimized check is probably pretty fast already: 
> "are all the bits zero."  But we can probably often optimize down to a 
> single-word comparison to zero.



I can understand that automatic runtime prevention of invoking instance 
methods with default (all zero) object is important for fail-fast 
behavior. It is almost like invoking methods with identity typed 
parameters where null values are not valid parameters. We use 
Objects.requireNonNull() to check for such parameters at the beginning 
of such methods. So NGD classes could be designed such that they 
encapsulate all fields and explicitly check for absence of all-zero 
"this" value at the beginning of methods.
People want to simplify such tedious repetitive coding so they make 
frameworks that turn @NonNull annotations on method parameters into 
non-null checks at the top of the method. I can imagine a javac plugin 
could insert checks in all (non-private only?) instance methods when an 
inline class is marked with @NGD for example. Or this could be baked 
into the Java language. In either case I think it is a matter of the inline 
class bytecode and not the code doing invocation (the call site). So it 
is safe by itself. Or am I missing something?



Regards, Peter




Re: Revisiting default values

2020-08-03 Thread Dan Smith
> On Aug 2, 2020, at 10:08 AM, Peter Levart  wrote:
> 
> In either case I think it is a matter of the inline class bytecode and not 
> the code doing invocation (the call site). So it is safe by itself. Or am I 
> missing something?

You're describing Option F. Yes, we can have javac generate checks in the 
bytecode of inline class method bodies.

Some awkwardness remains whenever default methods or Object methods are 
invoked. It would be difficult and expensive to implement any checks in these 
method bodies; and while bridge methods generated in the inline class's class 
file help, they don't guard against new methods declared after compilation (the 
motivating use case for the default methods feature). So we're left with one of:

- Permit superclass/superinterface code to run, only throwing (or at least only 
guaranteeing a throw) when one of the declared instance methods of the class 
are invoked; or

- Option G: implement the checks in the JVM, where we can see the entire set of 
inherited member methods as the inline class is loaded/linked



Re: Revisiting default values

2020-08-05 Thread Peter Levart



On 8/3/20 10:21 PM, Dan Smith wrote:
> Some awkwardness remains whenever default methods or Object methods are 
> invoked. [...] So we're left with one of:
>
> - Permit superclass/superinterface code to run, only throwing (or at least only 
> guaranteeing a throw) when one of the declared instance methods of the class 
> are invoked; or



That would not be so bad, I think. Why? The Object methods that require 
access to instance state (equals, hashCode) would be implemented by 
inline class and would contain the checks. Other Object methods are 
mostly not allowed for inline classes anyway. So this leaves us with 
default methods of interfaces implemented by inline class. These are of 
two kinds: either they are just functions that don't deal with the 
instance state at all and would be better off as static methods anyway, 
or they deal with instance state in which case they must invoke at least 
one of inline class declared methods and the checks will be triggered. 
So I would say that anything important that accesses instance state is 
guarded by instrumentation of inline class methods.


But what about accessing fields directly? Even if fields are 
encapsulated (private), they can be accessed by code of the inline class 
itself or a nestmate when an instance of the inline class is not "this". 
In that case, I think the same strategy as for null checking of identity 
types is in order: an equivalent of Objects.requireNonNull for inline 
types. Forgetting to issue those checks manually can have a surprising 
effect though.
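Such an equivalent might look like this (sketch; 'isInline' and 'defaultValue' 
are hypothetical reflective methods):

    // Hypothetical mirror of Objects.requireNonNull for no-default inline types:
    public static <T> T requireNonDefault(T val) {
        Class<?> c = val.getClass();
        if (c.isInline() && val == c.defaultValue())   // hypothetical reflective methods
            throw new NullPointerException("default instance of " + c.getName());
        return val;
    }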





- Option G: implement the checks in the JVM, where we can see the entire set of 
inherited member methods as the inline class is loaded/linked


Yeah, this could also work for field accesses then.


A hybrid of call-site (or field-access-site) checks and checks embedded 
in the instance methods is also possible. Javac could emit a check 
before accessing a field from code where embedded check is not performed 
(where instance is not "this") and before invoking default interface 
method (as determined by static type). This would then cover all places 
and still be secure since for security, the checks embedded in the 
instance methods can not be bypassed.



Peter




Re: Revisiting default values

2020-08-05 Thread Peter Levart



On 8/5/20 10:05 AM, Peter Levart wrote:
> Javac could emit a check before accessing a field from code where 
> embedded check is not performed (where instance is not "this") and 
> before invoking default interface method (as determined by static type)


...hm, javac could not do that, right? It does not know if the instance 
is of inline class at that time and of which inline class. This is only 
possible during runtime...


But at least for field accesses this could work.


Peter



Re: Revisiting default values

2020-08-05 Thread Dan Smith


> On Aug 5, 2020, at 2:11 AM, Peter Levart  wrote:
> 
> 
> On 8/5/20 10:05 AM, Peter Levart wrote:
>> Javac could emit a check before accessing a field from code where embedded 
>> check is not performed (where instance is not "this") and before invoking 
>> default interface method (as determined by static type)
> 
> ...hm, javac could not do that, right? It does not know if the instance is of 
> inline class at that time and of which inline class. This is only possible 
> during runtime...
> 
> But at least for field accesses this could work.

Correct. In theory, for method invocations, javac could do something reflective 
for *every* invocation involving Object or an interface. Something like:

Runnable obj = ...;
if (obj.getClass().isInline() && obj.getClass().defaultValue() == obj)
    throw new InvalidDefaultInstanceException();
obj.run();

But of course that's a huge burden on every Java program in the world, without 
regard to whether they expect to encounter Bucket #3 inline classes.

It's a similar story for Option H—compiler checks on array reads from Object[], 
Runnable[], etc.



Re: Revisiting default values

2021-03-15 Thread Brian Goetz

Picking this issue up again. To summarize Dan's buckets:

Bucket 1 -- the zero default is in the domain, and is a sensible default 
value.  Zero for numerics, empty optionals.


Bucket 2 -- there is a sensible default value, but all-zero-bits isn't it.

Bucket 3 -- there simply is no sensible default value.


Ultimately, though, this is not about defaults; it is about 
_uninitialized variables_.  The default only comes into play when the 
user uses an uninitialized variable, which usually means (a) 
uninitialized fields or (b) uninitialized array elements.  It is 
possible that the language could give us seat belts to dramatically 
narrow the chance of uninitialized fields, but uninitialized array 
elements are much harder to stamp out.


It is an attractive distraction to get caught up in designing mechanisms 
for supplying an alternate default ("just let the user declare a no-arg 
constructor"), but this is focusing on the "writing code" part of the 
problem, not the "keeping code safe" part of the problem.


In some sense, it is the existence (and size) of Bucket 1 that causes 
the problem; Bucket 1 is what gives us our sense that it is safe to use 
uninitialized variables.  In the current language, uninitialized 
reference variables are also safe in that if you use them before they 
are initialized, you get an exception before anything bad can happen.  
Uninitialized primitives in today's language are more dangerous, because 
we may interpret the uninitialized value, but this has been a problem 
we've been able to live with because today's primitives are pretty 
limited and zero is usually a good-enough default in most domains.  As 
we extend primitives to look more like objects, with behavior, this gets 
harder.



Both buckets 2 and 3 can be remediated without help from the language or 
VM, perhaps inconveniently, by careful coding on the part of the author 
of the primitive class:


 - don't expose fields to users (a good practice anyway)
 - check for zero on entry to each method

These are options A and E.  The difference between Buckets 2 (A) and 3 
(E) in this model is what do we do when we find a zero; for bucket 2, we 
substitute some pre-baked value and use that, and for bucket 3, we throw 
something (what we throw is a separate discussion.)  The various 
remediation techniques Dan offers represents a menu which allows us to 
trade off reliability/cost/intrusiveness.


I think we should lean on the model currently implemented by reference 
types, where _accessing_ an uninitialized field is OK, but _using_ the 
value in the field is not.  If we have:


    String s;

All of the following are fine:

    String t = s;
    if (s == null) { ... }
    if (s == t) { ... }

The thing that is not fine is s-dot-something.  These are the E/F/G 
options, not the H/I options.


Secondarily, H/I, which attempt to hide the default, create another 
problem down the road: when we get to specialized generics, `T.default` 
would become partial.


Some of the solutions for Bucket 3 generalize well enough to Bucket 2 
that we might consider merging them (though there are still messy 
details).  Option F, for example, injects code at the top of each method 
body:


    int m() {
        if (this == <default>)
            throw new NullPointerException();
        /* body of m */
    }

A corresponding feature for Bucket 2 might inject slightly different code:


    int m() {
        if (this == <default>)
            return <pre-baked default>.m();
        /* body of m */
    }


Another thing that has evolved since we started this discussion is 
recognizing the difference between .val and .ref projections.  Imagine 
you could declare your membership in bucket 3:


    __bucket_3 primitive class NGD { ... }

If, in addition to some way of generating an NPE on dereference (F, G, 
etc), we mucked with the conversion of NGD.val to NGD.ref (which the 
compiler can inject code on), we could actually put a null on top of the 
stack.  Then, code like:


    if (ngd == null) { ... }

would actually work, because to do the comparison, we'd first promote 
ngd to a reference type (null is already a reference), and we'd compare 
two nulls.





Re: Revisiting default values

2021-03-17 Thread Brian Goetz
Let me propose another strategy for Bucket 3.  It could be implemented 
at either the VM or language level, but the latter probably needs some 
help from the VM anyway.  The idea is that the default value is 
_indistinguishable from null_.  Strawman:


 - Classes can be marked as default-hostile (e.g., `primitive class X 
implements NoGoodDefault`);
 - Prior to dereferencing a default-hostile class, a check is made 
against the default value, and an NPE is thrown if it is the default value;
 - When widening to a reference type, a check is made if it is the 
default value, and if so, is converted to null;
 - When narrowing from a reference type, a check is made for null, and 
if so, converted to the default value;
 - It is allowable to compare `x == null`, which is interpreted as 
"widen x to X.ref, and compare";
 - (optional) the interface NoGoodDefault could have a method that 
optimizes the check, such as by using a pivot field, or the language/VM 
could try to automatically pick a pivot field.


Classes which opt for NoGoodDefault will be slower than those that do 
not, due to the check, but they will flatten.  Essentially, this lets 
authors choose between "zero means default" and "zero means null", at 
some cost.
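
A rough lowering of this strawman in pseudo-Java (names hypothetical; 
`X.default` denotes the all-zero instance):

    // widening X.val -> X.ref: the default becomes null
    static X.ref widen(X x) {
        return (x == X.default) ? null : x;
    }

    // narrowing X.ref -> X.val: null becomes the default
    static X narrow(X.ref r) {
        return (r == null) ? X.default : (X) r;
    }

    // check injected before any dereference x.m()
    static void checkDefault(X x) {
        if (x == X.default)
            throw new NullPointerException();
    }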


A risk here is that ignorant users who don't understand the tradeoffs 
will say "oh, great, there's my nullable primitive types", overuse them, 
and then say "primitive types are slow, java sucks."  The goal here 
would be to provide _safety_ for primitive types for which the default 
is dangerous.




Re: Revisiting default values

2021-06-29 Thread Kevin Bourrillion
Sorry for quietness of late.

Some new thoughts.

   - Default behaviors of language features should be based *first* on
   bug-proof-ness; if a user has to opt into safety that means they were not
   safe.
   - `null` and nullable types are a very good thing for safety; NPE
   protects us from more nasty bugs than we can imagine.
   - A world where *all* user-defined primitive classes must be nullable
   (following Brian's outline) is completely *sane*, just not optimized.
   - (We'd like to still be able to fashion a *non-nullable type* when the
   class itself allows nullability, but this is a general need we already have
   for ref types and shouldn't have much bearing here. Still working hard on
   jspecify.org...)
   - It's awkward that `Instant` would have to add a `boolean valid = true`
   field (see the sketch after this list), but it's not inappropriate. It has
   the misfortune that it both can't restrict its range of values *and* has
   no logical zero/default.
   - A type that does have a restricted range of legal values, but where
   that range includes the `.default` value, might do some very ugly tricks to
   avoid adding that boolean field; not sure what to think about this.
   - Among all the use cases for primitive classes, the ones where the
   default value is non-degenerate and expected are the special cases! We use
   `Complex` as a go-to example, but if most of what we did with complex
   numbers was divide them by each other then even this would be dubious. We'd
   be letting an invalid value masquerade as a valid one when we'd rather it
   just manifest as `null` and be subject to NPEs.
   - If we don't do something like Brian describes here, then I suppose
   second-best is that we make a *lot* of these things ref-default
   (beginning with Instant and not stopping there!) and warn about the dangers
   of `.val`.
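
Here is the sketch promised above for the `boolean valid = true` trick 
(hypothetical code; `InstantLike` is a stand-in, not a proposed change 
to the real Instant):

    primitive class InstantLike {
        private long seconds;
        private int nanos;
        private boolean valid;    // false in the all-zero default

        public InstantLike(long seconds, int nanos) {
            this.seconds = seconds;
            this.nanos = nanos;
            this.valid = true;    // true in every constructed instance
        }

        public long epochSecond() {
            if (!valid)
                throw new NullPointerException("uninitialized value");
            return seconds;
        }
    }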

tl;dr nullable by default!

Would be glad to hear what I'm missing or not understanding right.



Re: Revisiting default values

2021-06-29 Thread Dan Smith
> On Jun 29, 2021, at 11:54 AM, Kevin Bourrillion  wrote:
> 
> Sorry for quietness of late.

Glad to have you back!

Unfortunately, there's not much new to report in this area, other than the fact 
that we are aware that more design and prototyping work is needed.

Here's an open task to prototype an initial javac-only strategy:
https://bugs.openjdk.java.net/browse/JDK-8252781

> Some new thoughts.
>   • Default behaviors of language features should be based first on 
> bug-proof-ness; if a user has to opt into safety that means they were not 
> safe.
>   • `null` and nullable types are a very good thing for safety; NPE 
> protects us from more nasty bugs than we can imagine.
>   • A world where all user-defined primitive classes must be nullable 
> (following Brian's outline) is completely sane, just not optimized.

These are good principles, and I'm sympathetic to them (with the implication 
that the right "default default" is null/<default>*; using a different default 
should be opt-in).

(*By <default>, I mean the all-zero-bits instance of a no-good-default class, 
without picking a particular semantics for that value.)

But... these principles potentially conflict with engineering constraints. 
E.g., I can imagine a world in which a no-good-default primitive class is no 
better than an identity class in most use cases, and at that point, we're best 
off simply not supporting the no-good-default feature at all. (With the 
implication that many of the primitive-candidate classes we are imagining 
should continue to be identity classes.) I don't like that world, and I don't 
know how realistic it is, but there's pressure coming from that direction.

To move forward on the "what is the best default default?" question, we're 
going to need more engineering on no-good-default classes, and get a better 
sense of their performance characteristics.

>   • (We'd like to still be able to fashion a non-nullable type when the 
> class itself allows nullability, but this is a general need we already have 
> for ref types and shouldn't have much bearing here. Still working hard on 
> jspecify.org...)

I think we're pretty locked in to:
- Some primitive class types like Complex must be non-nullable (for compactness)
- We won't (at least for now) support non-nullable types in full generality

Always possible that we'd want to step back and revisit this design, but it's 
pretty mature.

Speaking of orthogonality, there *is* an open question about how we interpret 
<default>, and this is orthogonal to the question of whether <default> should 
be the "default default". We've talked about:
- It's interchangeable with null
- It's null-like (i.e., detected on member access), but distinct
- It's a separate concept, and it is an error to ever read it from fields/arrays

All still on the table.

(And within each of these, we still need to further explore the implications of 
JVM vs. language implementation strategies.)

>   • It's awkward that `Instant` would have to add a `boolean valid = 
> true` field, but it's not inappropriate. It has the misfortune that it both 
> can't restrict its range of values and has no logical zero/default.
>   • A type that does have a restricted range of legal values, but where 
> that range includes the `.default` value, might do some very ugly tricks to 
> avoid adding that boolean field; not sure what to think about this.

How we encode <default> is an interesting question that deserves more 
exploration. There's a potential trade-off here between safety and performance, 
and like you I'm inclined to prioritize safety. Maybe there are reasonable ways 
we can get them both...

>   • Among all the use cases for primitive classes, the ones where the 
> default value is non-degenerate and expected are the special cases! We use 
> `Complex` as a go-to example, but if most of what we did with complex numbers 
> was divide them by each other then even this would be dubious. We'd be 
> letting an invalid value masquerade as a valid one when we'd rather it just 
> manifest as `null` and be subject to NPEs.

Complex and friends are special cases, but they're also the *most important* 
cases. I'd really prefer not to have to pick, but if forced to, it may be more 
important for primitive classes to support optimally the 10% "has a good 
default" cases (roughly, those that are number-like) than the 90% "no good 
default" cases (roughly, those that wrap references).

>   • If we don't do something like Brian describes here, then I suppose 
> second-best is that we make a lot of these things ref-default (beginning with 
> Instant and not stopping there!) and warn about the dangers of `.val`

I'm not a big fan of this approach. It gives you the illusion of safety 
(well-written code only sees valid values) but blows up in unpredictable ways 
when a bug or a hostile actor leaks <default> into your program. If we don't 
offer stronger guarantees, and your code isn't willing to check for <default>, 
you really shouldn't be programming with a primitive class.

Re: Revisiting default values

2021-06-29 Thread Kevin Bourrillion
Thanks for giving this your attention!

On Tue, Jun 29, 2021 at 12:56 PM Dan Smith  wrote:

> E.g., I can imagine a world in which a no-good-default primitive class is
> no better than an identity class in most use cases, and at that point,
> we're best off simply not supporting the no-good-default feature at all.


Ah. I don't have that imagination at the moment. Maybe my picture of this
whole feature went straight from overly choleric to overly sanguine!



> (With the implication that many of the primitive-candidate classes we are
> imagining should continue to be identity classes.)


I'm not sure that follows, because to the library owner, this should be
about letting their users have the *choice* between ref/val, and also about
not senselessly exposing identity when identity is senseless. I guess the
main reason I can see resisting the change for some data type (that's
appropriate for primitive-ness) is if "most users will want/have to use the
.ref projection, *and* they'll perform way worse than if ref was their only
choice." Is that a concern? (And of course, in this thread I'm talking
about *removing* one of the reasons for wide use of .ref, nullability.)



> I think we're pretty locked in to:
> - Some primitive class types like Complex must be non-nullable (for
> compactness)
> - We won't (at least for now) support non-nullable types in full generality
>

Good good. I guess I'm submitting #3 for consideration, "We will
deliberately not worry about problems caused by nullability of primitive
types if those problems are just the same ones we already have with
reference types."


> Speaking of orthogonality, there *is* an open question about how we
> interpret <default>, and this is orthogonal to the question of whether
> <default> should be the "default default". We've talked about:
> - It's interchangeable with null
> - It's null-like (i.e., detected on member access), but distinct
> - It's a separate concept, and it is an error to ever read it from
> fields/arrays
>
> All still on the table.
>

Oh. Yeah, if you look at all the work we've all poured into how we manage
null and its attendant risks, and ongoing work (perhaps charitably assume
JSpecify will be successful! :-)), then it's kind of a disaster if
there's suddenly a second kind of nullness. #nonewnulls



> Complex and friends are special cases, but they're also the *most
> important* cases. I'd really prefer not to have to pick, but if forced to,
> it may be more important for primitive classes to support optimally the 10%
> "has a good default" cases (roughly, those that are number-like) than the
> 90% "no good default" cases (roughly, those that wrap references).
>

To clarify, I don't think I meant "special case" as "deprioritize", only as
"that's the case I think I'd have users opt into intentionally".



> >   • If we don't do something like Brian describes here, then I
> suppose second-best is that we make a lot of these things ref-default
> (beginning with Instant and not stopping there!) and warn about the dangers
> of `.val`
>
> I'm not a big fan of this approach. It gives you the illusion of safety
> (well-written code only sees valid values) but blows up in unpredictable
> ways when a bug or a hostile actor leaks  into your program. If we
> don't offer stronger guarantees, and your code isn't willing to check for
> , you really shouldn't be programming with a primitive class.
>

Indeed, calling it my second-best was not meant to imply I don't also hate
it. :-)

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com


Re: Revisiting default values

2021-06-30 Thread John Rose

On Jun 29, 2021, at 2:36 PM, Kevin Bourrillion <kev...@google.com> wrote:

Speaking of orthogonality, there *is* an open question about how we interpret 
<default>, and this is orthogonal to the question of whether <default> should 
be the "default default". We've talked about:
- It's interchangeable with null
- It's null-like (i.e., detected on member access), but distinct
- It's a separate concept, and it is an error to ever read it from fields/arrays

All still on the table.

Oh. Yeah, if you look at all the work we've all poured into how we manage null 
and its attendant risks, and ongoing work (perhaps charitably assume JSpecify 
will be successful! :-)), then it's kind of a disaster if there's suddenly a 
second kind of nullness. #nonewnulls



BTW, the combination of #nonewnulls (a principle
I whole-heartedly favor) and “it is an error to ever
read it” pencils out some capability to define
containers of nullable types that nevertheless
reject nulls in some way.  (Perhaps a subtle
way:  Perhaps the
container starts out null but cannot be read
until a different initial value is written.)

Such containers would resemble, in certain
ways, containers for lazily-computed values.
(I.e., they have a dynamically-distinguished
undefined state, and they cannot return to
that state having left it.)  OTOH, a container
for a lazily-computed *nullable* value would,
in fact, require a second sentinel, not null,
to denote the unbound state; making that
sentinel escape would create a #newnull,
which would be bad.  Not sure how to square
this circle yet.
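
(For reference types today, the usual
dodge is to keep the second sentinel
private to the container, so it can never
escape as a #newnull.  A sketch, which
does not yet answer the question for
flattened primitive storage:)

    final class LazyCell<T> {
        // private sentinel: never handed out, so no new null
        private static final Object UNSET = new Object();
        private Object value = UNSET;

        public synchronized void bind(T t) {   // t may be null
            if (value != UNSET)
                throw new IllegalStateException("already bound");
            value = t;
        }

        @SuppressWarnings("unchecked")
        public synchronized T get() {
            if (value == UNSET)
                throw new IllegalStateException("not yet bound");
            return (T) value;
        }
    }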

Another random side-point, this time about
`withfield`:  It is IMO impractical to perform
default exclusion (null detection) for field
assignments in primitive class constructors,
because the default-ness of `this` is a complicated
dynamic property of all the fields together.
As a constructor executes, it may temporarily
revert `this` to the default value if it zeroes
out a field.  So the `withfield` instruction
should *not* perform default exclusion
on `this`.  (Of course an excluded default
would get checked for on exit from the
constructor.)
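
(A tiny illustration with a made-up class;
the comments mark where `this` is
dynamically indistinguishable from the
default:)

    primitive class Pt {
        int x, y;
        Pt(int x, int y) {
            // 'this' starts out as Pt.default (all fields zero)
            this.x = x;   // withfield; if x == 0, 'this' still
                          // equals Pt.default
            this.y = y;   // likewise if y == 0 as well
            // per-withfield default exclusion would spuriously
            // fire above; the check belongs on constructor exit
        }
    }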

A similar point goes for `getfield`, perhaps,
though less strongly, because the Java
language would not use it.  But the JVMS
should probably attach default exclusion
either to both `withfield` and `getfield`
or neither.  This suggests that only method
invocations would perform default exclusion.

Which reminds me:  I think we should
allow calls to methods on `this` inside a
constructor.  (Do we?)  A clean way to
statically exclude incomplete values of `this`
would be to outlaw such self-calls until all
final fields are definitely assigned.  The
current language (for identity classes)
computes this point (of complete field
assignment) in order to enforce the rule
that the constructor cannot return until
all final fields have been definitely assigned.

For identity classes it would be nice to
*warn* on self-calls (or perhaps other
uses of `this`) before all finals are DA.
For primitives classes we can outlaw
such things, of course.

Basically, for both primitive and
identity class constructors, it is a likely
bug if you use `this` (for something
other than initializing `this`) before
all final fields are DA.  And yet some
constructors perform additional
operations (such as sanity checks)
after that point, when the
object is in DA and “almost finished”
state.  It would be smart, I think, to
make the rules in this area as coherent
as possible for both kinds of classes.

IIRC the status quo is that no uses of
`this` are legitimate in a constructor
body except field writes (of DU fields)
and field reads (of DA fields).  I think
that is too strict, compared to the
laxness of the rules for identity classes,
and given the usefulness of error
checking methods being called from
constructors.



Re: Revisiting default values

2021-07-01 Thread Brian Goetz





Which reminds me:  I think we should
allow calls to methods on `this` inside a
constructor.  (Do we?)  A clean way to
statically exclude incomplete values of `this`
would be to outlaw such self-calls until all
final fields are definitely assigned.  The
current language (for identity classes)
computes this point (of complete field
assignment) in order to enforce the rule
that the constructor cannot return until
all final fields have been definitely assigned.


FYI: A recent paper on the self-use-from-constructor problem: 
https://dl.acm.org/doi/10.1145/3428243





Re: Revisiting default values

2021-07-01 Thread John Rose


> On Jul 1, 2021, at 5:48 AM, Brian Goetz  wrote:
> 
> 
>> 
>> Which reminds me:  I think we should
>> allow calls to methods on `this` inside a
>> constructor.  (Do we?)  A clean way to
>> statically exclude incomplete values of `this`
>> would be to outlaw such self-calls until all
>> final fields are definitely assigned.  The
>> current language (for identity classes)
>> computes this point (of complete field
>> assignment) in order to enforce the rule
>> that the constructor cannot return until
>> all final fields have been definitely assigned.
> 
> FYI: A recent paper on the self-use-from-constructor problem: 
> https://dl.acm.org/doi/10.1145/3428243
> 
> 


Nice; it supports virtual calls in a constructor.
To me that seems a good stretch goal.
A simpler rule would define an “all initialized”
point in the constructor (no DU states, basically)
and open the floodgates there.  A more complicated
set of rules could allow earlier access to partially
DU objects, as a compatible extension.  In terms
of the paper, the initial conservative approach
does not allow (or perhaps warns on) any typestate
that has a remaining DU in it, while an extended
approach would classify accesses according
to which DU’s they might be compatible with.

An example of the difference would be:

primitive class Complex {
    float re, im, abs, arg;
    Complex(float re, float im) {
        this.re = re;
        this.im = im;
        if (CONSERVATIVE_AND_SIMPLE) {
            // we can easily do this today
            this.abs = Complex.computeAbs(re, im);
            this.arg = Complex.computeArg(re, im);
        } else {
            // later, enhanced analysis can allow this.m()
            this.abs = this.computeAbs();
            this.arg = this.computeArg();
        }
    }
}

Other observations:

The paper seems to formalize and extend the
DA/DU rules of Java 1.1 (which I am fond of),
under the term “local reasoning about initialization”.

The distinction between objects that are “hot” (old)
and “warm” (under construction) objects seems to
align with some of our discussions about confinement
of “larval” objects before they promote to “adult”.



Re: [External] : Re: Revisiting default values

2021-06-30 Thread Brian Goetz
Of your points, I think this is the controversial one, and it goes 
straight to "what is the point of primitive classes."


You kind of dismiss optimization, but of course a big part of the point 
is classes that are more optimizable; if we didn't care about 
optimization, we wouldn't bother with primitive classes, we'd just say 
"write classes."


The unfortunate bit is that the reason we're stuck with zeroes as the 
default value comes from the VM's desire to provide both safety _and_ 
performance.  Painting with a roller is cheaper than with a brush, but, 
more importantly, all-zeroes is the only value the JVM can really give 
while promising that some other out-of-thin-air value will never be observed, 
regardless of races and other timing hazards.


Now, let's talk more about null.  Null is a *reference* that refers to 
no object.  For a flattened/direct object (P.val), null cannot be in the 
value set (it's the wrong "kind"), though we can arrange for a primitive 
to behave like null in various ways.  It's not clear whether this helps 
or hurts the mental model, since it is distorting what null is.


Really, what this discussion is about is about whether _uninitialized 
primitive objects_ can be used or not; since T.default is a hidden 
constructor, we want to know whether that value is safe to use or 
whether it is merely a tombstone.


We can arrange that x.m NPEs when x == X.default, but that doesn't 
really mean that x is null.  It means that you've dereferenced something 
that should have been initialized before dereferencing it.


The alternative approaches are worse, since they involve creating a 
second null (see John's "No New Nulls" manifesto), which I think we can 
agree no one will like.


What worries me about this approach is that it muddies what null means, 
since a P.val is not a reference to anything.


I think we are better off treating this as a discussion about 
initialization safety, rather than nullity, until we have a clear story 
of how we want things to behave.






On 6/29/2021 1:54 PM, Kevin Bourrillion wrote:
Among all the use cases for primitive classes, the ones where the 
default value is non-degenerate and expected are the special cases! We 
use `Complex` as a go-to example, but if most of what we did with 
complex numbers was divide them by each other then even this would be 
dubious. We'd be letting an invalid value masquerade as a valid one 
when we'd rather it just manifest as `null` and be subject to NPEs.




Re: [External] : Re: Revisiting default values

2021-06-30 Thread John Rose

> On Jun 30, 2021, at 8:39 AM, Brian Goetz  wrote:
> 
> Now, let's talk more about null.  Null is a *reference* that refers to no 
> object.  For a flattened/direct object (P.val), null cannot be in the value 
> set (it's the wrong "kind"), though we can arrange for a primitive to behave 
> like null in various ways.  It's not clear whether this helps or hurts the 
> mental model, since it is distorting what null is.

This is a good point, if we can hold onto it.

Null is a magic one-off boojum that lives in
the space of reference types but makes
field references and method calls “softly
and suddenly vanish away”.

Having P.val.default.m() throw an NPE (under
default exclusion rules TBD) makes the null
 boojum arise from a non-reference value,
but only just long enough to make the method
call go away.  (Boo—)

Dan’s proposal for default exclusion on
loads from uninitialized variables (such as
fresh array elements) amounts to another
boojum-like behavior, of making loads
go away (unless the variable has been
stored into previously).  Again, it’s not
directly associated with a reference,
but it is null-like, and perhaps NPE
is the right way to signal the fault.

Of course, our familiar null does not show
complete boojum behavior, because you can
read, write, and print null without yourself
vanishing away.  Likewise, even if we do
some sort of default exclusion, perhaps we
will allow defaults to flow in the same
(limited) paths that nulls can flow.  And
in that case, the #nonewnulls crowd would
expect that only the one value null would
appear, whenever such a value were
converted to a reference.

Maybe, in some of these schemes, null
is not a primitive, but boojums and boxes
are the primitives, and null is a (safely)
boxed boojum?

— John

Re: [External] : Re: Revisiting default values

2021-06-30 Thread John Rose
P.S. Wikipedia gives for boojum:  “A fictional
animal species in Lewis Carroll's nonsense
poem The Hunting of the Snark; a particularly
dangerous kind of snark.”

“But if ever I meet with a Boojum, that day,
   In a moment (of this I am sure),
I shall softly and suddenly vanish away—
   And the notion I cannot endure!”

On Jun 30, 2021, at 5:16 PM, John Rose <john.r.r...@oracle.com> wrote:

Maybe, in some of these schemes, null
is not a primitive, but boojums and boxes
are the primitives, and null is a (safely)
boxed boojum?




Re: [External] : Re: Revisiting default values

2021-06-30 Thread Kevin Bourrillion
On Wed, Jun 30, 2021 at 8:40 AM Brian Goetz  wrote:

> Of your points, I think this is the controversial one, and it goes straight
> to "what is the point of primitive classes."
>
> You kind of dismiss optimization, but of course a big part of the point is
> classes that are more optimizable; if we didn't care about optimization, we
> wouldn't bother with primitive classes, we'd just say "write classes."
>

We might be miscommunicating?

Just gonna be candid. This qualitative argument is both of no use,
*and* unnecessary,
because the quantitative argument is the whole point. I mean: if
feature-variation-A actually wipes out 40% of the performance benefit of
the whole endeavor, you'd hardly need to convince anyone it's a
non-starter, not even *me*. Just like if it wiped out 1% of gains (while
preventing many likely bugs) then *you* would hardly need convincing. So, I
can't figure out what your purpose in saying this is.


> The unfortunate bit is that the reason we're stuck with zeroes as the
> default value comes from the VM's desire to provide both safety _and_
> performance.  Painting with a roller is cheaper than with a brush, but,
> more importantly, all-zeroes is the only value the JVM can really give
> while promising that some other OOTA value will never be observed,
> regardless of races and other timing hazards.
>

(I was treating customized-default-bits as being off the table, so I'm not
sure what this is about?)


> Now, let's talk more about null.  Null is a *reference* that refers to no
> object.
>

But also: "a primitive is a predefined irreducible type!"

I sincerely claim that your statement is in the same boat. You're speaking
of what has *happened* to be the case in Java, because reasons.

However, I think the *concept* of null is more basic; it is just "there is
no instance here". Try to do instance stuff, blow up always, that's null.

If you say it's *fundamental* to the notion of a primitive/inline type that
there is *always* a value there, okay, but then what you're talking
about is *bits*. Yep there are always bits there. But is that what matters
to software? What software wants is programs that are first correct and
then (as hot on the heels as you like) performant.

I've said that primitive types for which all-zeroes isn't *valid* would
prefer to surface that value as *null* instead. So I guess my big bold play
here is to claim that the primitive type "reference" is just the first
*example* of this category, that's all.

(I feel very untroubled by seeing "reference" as a primitive type, which is
special only in that Java traverses it for you -- lets you see "dereference
then member access" as if it is just "member access" -- which it does for
obvious and obviously-special reasons.)


... distorting what null is ...
> ... muddies what null means ...
>

One of the things I *very much* like about where this project has gotten to
is that the "ret-cons" it requires *aren't even really ret-cons*.

   - "Primitive" has always meant "inline". It never really meant
   "predefined"; Java just didn't happen to let you define them.
   - "Instance" has never really meant "thing on the heap with identity".
   It always meant "one of a class". Java just didn't happen to have other
   kinds.
   - "Null" as a noun was never well-defined, only "null literal" and "null
   reference". "null reference" has always meant "there is no instance here",
   and if we *do* define "null value", it will mean the same thing!


I think we are better off treating this as a discussion about
> initialization safety, rather than nullity, until we have a clear story of
> how we want things to behave.
>

Sure, I'd be very interested in that discussion too.


On 6/29/2021 1:54 PM, Kevin Bourrillion wrote:
>
> Among all the use cases for primitive classes, the ones where the default
> value is non-degenerate and expected are the special cases! We use
> `Complex` as a go-to example, but if most of what we did with complex
> numbers was divide them by each other then even this would be dubious. We'd
> be letting an invalid value masquerade as a valid one when we'd rather it
> just manifest as `null` and be subject to NPEs

Clarifications:

* "special cases" != "rare" or "exotic"
* "dubious" != "bad" or "wrong"

Also, reminder that X% of my opinions are wrong


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com


Re: [External] : Re: Revisiting default values

2021-07-01 Thread Brian Goetz




I sincerely claim that your statement is in the same boat. You're 
speaking of what has /happened/ to be the case in Java, because reasons.


However, I think the /concept/ of null is more basic; it is just 
"there is no instance here". Try to do instance stuff, blow up always, 
that's null.


If you say it's /fundamental/ to the notion of a primitive/inline type 
that there is /always/ a value there, okay, but then what you're 
talking about is /bits/. Yep there are always bits there. But is that 
what matters to software? What software wants is programs that are 
first correct and then (as hot on the heels as you like) performant.


I'm saying something slightly different (but I agree it's as much "what 
happened" as "what is").  Our formulation of ref/val is based on saying 
"well, the instance *is*, but there are two ways to store it, and 
nullity is part of the 'how you store it' rather than the 'what it 
is'."  I think this is a more helpful way to look at it, but of course 
it's not the only way.  But I want to make sure that we don't select a 
set of mental models that are locally sane but conflict with each other.




I think we are better off treating this as a discussion about
initialization safety, rather than nullity, until we have a clear
story of how we want things to behave.


Sure, I'd be very interested in that discussion too.


By that I mean: framing this not as "this field is null", but instead, 
"this field is uninitialized, so it is an error to dereference it."  One 
could argue (well, it would be a stretch) that the following code is not 
broken:


    String nullIMeanIt = null;
    nullIMeanIt.length();  // Gimme an NPE, dammit!

It is obviously *silly*, but by "not broken", what I mean is that 
getting an NPE when you dereference a null pointer is *what null does*.  
The bug in most NPEs is a null where we didn't expect it, but usually 
one that was put there.  When we get a NPE from x.foo(), it is often not 
the case that we failed to initialize x; we initialized it to something 
that unexpectedly returned null *as a value*, such as `x = getBar()`.  
All the @NonNull stuff is an attempt to get null to go back in the cage, 
after we let it escape. But I think with no-good-default primitives, it 
is about not letting the thing escape in the first place.  The 
difference in practice may be minor, but the difference in the story we 
are able to tell as a result might be bigger.