On tearing

2022-04-27 Thread Brian Goetz
Several people have asked why I am so paranoid about tearing.  This mail is 
about tearing; there’ll be another about user model stacking and performance 
models.  (Please, let’s try to resist the temptation to jump to “the answer”.)

Many people are tempted to say “let it tear.”  The argument for “let it tear” 
is a natural-sounding one; after all, tearing only happens when someone else 
has made a mistake (data race).  It is super-tempting to say “Well, they made a 
mistake, they get the consequences”.

While there are conditions under which this would be a reasonable argument, I 
don’t think those conditions quite hold here, because from both the inside and 
the outside, B3 classes “code like a class.”  Authors will feel free to use 
constructors to enforce invariants, and if the use site just looks like 
“Point”, clients will not be wanting to keep track of “is this one of those 
classes with, or without, integrity?”  Add to this, tearing is already weird, 
and while it is currently allowed for longs and doubles, 99.% of Java 
developers have never actually seen it or had to think about it very carefully, 
because implementations have had atomic loads and stores for decades.

As our poster child, let’s take integer range:

 __B3 record IntRange(int low, int high) {

     public IntRange {
         if (low > high) throw new IllegalArgumentException();
     }
 }

Here, the author has introduced an invariant which is enforced by the 
constructor.  Clients would be surprised to find an IntRange in the wild that 
disobeys the invariant.  Ranges have a reasonable zero value.  This is an 
obvious candidate for B3.

But, I can make this tear.  Imagine a mutable field:

 /* mutable */ IntRange r;

and two threads racing to write to r.  One writes IntRange(5, 10); the other 
writes IntRange(2,4).  If the writes are broken up into two writes, then a 
client could read IntRange(5, 4).  Worse, unlike more traditional races which 
might be eventually consistent, this torn value will be observable forever.
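Java today has no `__B3` classes, so the sketch below models a flattened IntRange slot as a plain object with two `int` fields (`FlatRangeSlot` and `TearingDemo` are hypothetical names, not Valhalla API). To make the result reproducible, it replays on one thread a single interleaving of the two racing writers described above; a real race would produce this outcome only some of the time.

```java
// Hypothetical model of a flattened IntRange: the components live directly in
// the container, so one logical write becomes two separate stores.
final class FlatRangeSlot {
    int low;
    int high;

    void storeLow(int v)  { low = v; }
    void storeHigh(int v) { high = v; }

    boolean invariantHolds() { return low <= high; }
}

final class TearingDemo {
    // Replays, on one thread, one interleaving of two racing writers:
    // writer A stores IntRange(5, 10), writer B stores IntRange(2, 4).
    static FlatRangeSlot replayInterleaving() {
        FlatRangeSlot r = new FlatRangeSlot();
        r.storeLow(2);    // B writes low
        r.storeLow(5);    // A writes low
        r.storeHigh(10);  // A writes high
        r.storeHigh(4);   // B writes high; last writer wins for this field
        return r;         // the slot now holds (5, 4) and keeps it until rewritten
    }

    public static void main(String[] args) {
        FlatRangeSlot r = replayInterleaving();
        System.out.println("(" + r.low + ", " + r.high + ") valid: " + r.invariantHolds());
        // Code that trusts the constructor's check now computes a negative length.
        System.out.println("length = " + (r.high - r.low));
    }
}
```

Running `main` prints `(5, 4) valid: false` and then `length = -1`. Nothing in the model ever "repairs" the slot, which is the "observable forever" point above.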

Why does this seem worse than a routine long tearing (which no one ever sees 
and most users have never heard of)?  Because by reading the code, it surely 
seems like the code is telling me that IntRange(5, 4) is impossible, and having 
one show up would be astonishing.  Worse, a malicious user can create such a 
bad value (statistically) at will, and then inject that bad value into code 
that depends on the invariants holding.

Not all values are at risk of such astonishment, though.  Consider a class like:

__B3 record LongHolder(long x) { }

Given that a LongHolder can contain any long value, users of LongHolder are not 
expecting that the range is carefully controlled.  There are no invariants for 
which breaking them would be astonishing; LongHolder(0x1234567887654321) is 
just as valid a value as LongHolder(3).
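The LongHolder case can be made concrete with a small sketch (the names `LongTearing` and `tear` are hypothetical). It models a torn 64-bit store, which JLS §17.7 permits for non-volatile longs, as taking the high 32 bits from one write and the low 32 bits from another. Mixing `0x12345678_00000000` with `0x00000000_87654321` yields exactly the value cited above, and it is still an ordinary long.

```java
final class LongTearing {
    // Models a torn 64-bit store: high half from one write, low half from another.
    static long tear(long highSource, long lowSource) {
        return (highSource & 0xFFFF_FFFF_0000_0000L) | (lowSource & 0x0000_0000_FFFF_FFFFL);
    }

    // No invariant: every long is an acceptable component.
    record LongHolder(long x) { }

    public static void main(String[] args) {
        long torn = tear(0x1234_5678_0000_0000L, 0x0000_0000_8765_4321L);
        System.out.println(Long.toHexString(torn)); // 1234567887654321
        // The torn result is indistinguishable from a value someone passed in.
        System.out.println(new LongHolder(torn));
    }
}
```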

There are two factors here: invariants and transparency.  The above examples 
span the range of invariants (from none at all, to invariants that constrain 
multiple fields).  But there’s also transparency.  The second example was 
unsurprising because the API allowed us to pass any long in, so we were not 
surprised to see big values coming out.  But if the relationship between the 
representation and the construction API is more complicated, one could imagine 
thinking the constructor has excluded all the surprising values, and then still 
seeing a surprising value.  That longs might tear is less surprising because any 
combination of bits is a valid long, and there’s no way to exclude certain 
values when “constructing” a long.
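To illustrate the transparency point, here is a hypothetical `Fraction` (not from the thread) whose constructor normalizes to lowest terms with a positive denominator. Unlike LongHolder, not every (num, den) pair is a value the constructor can return, so a reader who trusts the constructor would not expect 2/2. Yet a torn read mixing the numerator of 2/3 with the denominator of 1/2 produces exactly that; the sketch models the torn read as simply pairing fields from two instances.

```java
record Fraction(long num, long den) {
    // The constructor promises lowest terms and a positive denominator.
    public Fraction {
        if (den == 0) throw new IllegalArgumentException("zero denominator");
        if (den < 0) { num = -num; den = -den; }
        long g = gcd(Math.abs(num), den);
        num /= g;
        den /= g;
    }

    static long gcd(long a, long b) {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }
}

final class TransparencyDemo {
    // Models a torn read of a flattened Fraction: numerator from one write,
    // denominator from another, never passing through the constructor.
    static boolean tornIsLowestTerms(Fraction numSource, Fraction denSource) {
        return Fraction.gcd(Math.abs(numSource.num()), denSource.den()) == 1;
    }
}
```

For example, `new Fraction(4, 6)` is stored as 2/3 and `new Fraction(1, 2)` as 1/2; both are in lowest terms, but the torn combination 2/2 is not.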

Separately, there are different considerations at the declaration and use site. 
 A user can always avoid tearing by avoiding data races, such as marking the 
field volatile (that’s the usual cure for longs and doubles.)  But what we’re 
missing is something at the declaration site, where the author can say “I have 
integrity concerns” and constrain layout/access accordingly.  We experimented 
with something earlier (“extends NonTearable”) in this area.
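The use-site cure can be checked against today's Java: JLS §17.7 guarantees that reads and writes of `volatile` long and double values are atomic. In the sketch below (the class and method names are hypothetical), two writers alternate between all-zero and all-one bits, and a reader counts any value that mixes halves of the two; with `volatile` the count is zero by specification. Removing `volatile` makes tearing legal, though on typical 64-bit JVMs the count will usually still be zero, which matches the observation that almost no one has ever seen a torn long.

```java
final class VolatileLongDemo {
    // volatile: every read and write of this long is atomic (JLS 17.7).
    static volatile long shared = 0L;

    static long countTornReads(int iterations) {
        Thread w1 = new Thread(() -> { for (int i = 0; i < iterations; i++) shared = 0L; });
        Thread w2 = new Thread(() -> { for (int i = 0; i < iterations; i++) shared = -1L; });
        w1.start();
        w2.start();
        long torn = 0;
        for (int i = 0; i < iterations; i++) {
            long v = shared;
            // A torn read would look like 0x00000000FFFFFFFF or 0xFFFFFFFF00000000.
            if (v != 0L && v != -1L) torn++;
        }
        try {
            w1.join();
            w2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return torn;
    }
}
```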


Coming back to “why do we care so much”.  PLT_Hulk summarized JCiP in one 
sentence:

https://twitter.com/PLT_Hulk/status/509302821091831809

If Java developers have learned one thing about concurrency, it is: “immutable 
objects are always thread-safe.”  While we can equivocate about whether B3.val 
are objects or not, this distinction is more subtle than we can expect people 
to internalize.  (If people internalized “Write immutable classes, they will 
always be thread-safe”, that would be pretty much the same thing.)  We cannot 
deprive them of the most powerful and useful guideline for writing safe code.
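The guideline holds today because immutable objects are reached through references. In the sketch below, `IntRange` is an ordinary identity record (no `__B3`) and the shared field is deliberately not volatile, so the writers and the reader race. The reader may see a stale range, but never a mixed one: JLS §17.7 makes reference reads and writes atomic, and JLS §17.5 guarantees that final fields of a fully constructed object are seen initialized. `ImmutablePublicationDemo` and `countBrokenReads` are hypothetical names.

```java
// An ordinary identity record in today's Java: final fields, reached through a reference.
record IntRange(int low, int high) {
    public IntRange {
        if (low > high) throw new IllegalArgumentException(low + " > " + high);
    }
}

final class ImmutablePublicationDemo {
    // Deliberately NOT volatile: this field is the subject of a data race.
    static IntRange shared = new IntRange(0, 0);

    static long countBrokenReads(int iterations) {
        Thread a = new Thread(() -> { for (int i = 0; i < iterations; i++) shared = new IntRange(5, 10); });
        Thread b = new Thread(() -> { for (int i = 0; i < iterations; i++) shared = new IntRange(2, 4); });
        a.start();
        b.start();
        long broken = 0;
        for (int i = 0; i < iterations; i++) {
            IntRange r = shared;               // reference read is atomic (JLS 17.7)
            if (r.low() > r.high()) broken++;  // final fields seen initialized (JLS 17.5)
        }
        join(a);
        join(b);
        return broken;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A flattened B3 representation gives up the single reference store, which is what reopens the (5, 4) case.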

(To make another analogy: serialization lets objects appear to not obey 
invariants established in the constructor.  We generally don’t like this; we 
should not want to encourage more of this.)

There are options here, but none are a slam dunk:

 - Force all B3 values to be atomic, which will have a performance cost;

Re: On tearing

2022-04-27 Thread Remi Forax
> From: "Brian Goetz" 
> To: "valhalla-spec-experts" 
> Sent: Wednesday, April 27, 2022 3:59:31 PM
> Subject: On tearing


Re: On tearing

2022-04-27 Thread Dan Heidinga
This is circling around the same root issues as the "Foo / Foo.ref
backward default" thread - which is really when should a developer
pick a B3 over a B2.

Kevin's thought experiment in that thread seems to be approaching this
same idea from a different angle:
> (Thought experiment: if we had an annotation meaning "using the .val type is 
> not a great idea for this class and you should get a compile-time warning if 
> you do"  would we really and I mean *really* even need bucket 2 at all?)

In that thread I suggested some rough rules that said B2s were
preferred in API signatures and B3.val are really about storage.
Kevin outlined a different set of rules on when to prefer B3.val for
APIs.

We originally split B2 out from B3 to support no-good-default values
(aka allow null), support atomicity and avoid tearing. Anything
missing in that list?

With B3, we relax some of the constraints required to guarantee the B2
invariants and this allows - but doesn't require! - the JVM to further
optimize the memory density.  A conforming JVM could implement all
B3s using an indirection and have them behave identically to B2s -
and JVMs will likely do so for large B3s or volatile fields.  B3s are
more akin to a hint than a promise.  Increasing the memory density
exposes tearing. The two are coupled from an implementation
perspective.

Many of the properties we want for B2 classes are possible because we
adopted references (L carriers).  If we shift towards guaranteed
atomicity for (some) B3.vals, we're going to need to re-examine the VM
model and look at how we represent these additional constraints so the
VM can enforce them.

The VM can provide some tearing-related guarantees for Qs without
indirection but they are hardware dependent - 64bit for sure on all
64bit hardware, 128bit on some newer Intel hardware, possibly
different constraints on still other platforms - but maybe that's OK?
Declaring a type must not tear makes it harder for the VM to provide
better density.
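As a user-level analogy (not a description of how any JVM flattens values), a range with two `int` components fits in 64 bits, so it can be stored flat and still be written in one atomic step. The sketch packs each hypothetical range into one element of `java.util.concurrent.atomic.AtomicLongArray`, whose `get` and `set` are atomic on every platform. A value with two `long` components would need 128 bits, which is where the hardware dependence described above comes in. `PackedRangeArray` and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical flat storage for (int low, int high) ranges: one 64-bit slot per
// element, so a single atomic store writes both components together.
final class PackedRangeArray {
    private final AtomicLongArray slots;

    PackedRangeArray(int length) { slots = new AtomicLongArray(length); } // all slots start as (0, 0)

    static long pack(int low, int high) { return ((long) low << 32) | (high & 0xFFFF_FFFFL); }
    static int low(long packed)  { return (int) (packed >> 32); }
    static int high(long packed) { return (int) packed; }

    void set(int i, int low, int high) {
        if (low > high) throw new IllegalArgumentException(low + " > " + high);
        slots.set(i, pack(low, high));   // one atomic 64-bit write
    }

    int[] get(int i) {
        long p = slots.get(i);           // one atomic 64-bit read
        return new int[] { low(p), high(p) };
    }
}
```

Unwritten slots read back as (0, 0), which is also a valid range, in line with the "reasonable zero value" requirement for B3.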

John has repeatedly said "Q means go and look".  And the VM already
has to do that before flattening Qs to determine if flattening is
reasonable given the VM's heuristics (i.e., size).  Letting users say
they want tear-free B3.vals fits within the existing VM model for L vs
Q and, while it may limit the benefits of a particular Q type, it seems
like a reasonable thing for users to do.

So aiming for "More declaration-site control over atomicity, so
classes with invariants can ensure their invariants are defended." is
reasonable, fits the existing implementation constraints, but will
cost those users potential density benefits.

The biggest concern I have with this approach is that instead of
having 3 buckets, we're now exposing more of a buffet of options to
users.  Circling back to where I started this email - good defaults
are critical and so is good guidance on when to pick each of the
options or performance cargo cults will undercut the work to split out
the different cases.

--Dan






On Wed, Apr 27, 2022 at 9:59 AM Brian Goetz  wrote:

Re: [External] : Re: On tearing

2022-04-27 Thread Brian Goetz

> Writing immutable objects in Java is hard, there is already a checklist:
>   - be sure that your class is not only unmodifiable but really immutable;
>     storing a mutable class in a field is an issue
>   - have you declared all fields final? otherwise you have a publication
>     issue
>   - your constructors do not leak "this", right!
>
> so adding a fourth item
>   - the class is not a primitive class
> does not seem to be a big leap to me.
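The first three checklist items can be shown in today's Java (the fourth has no current counterpart). `Route` is a hypothetical class: it is final, every field is final, it copies its mutable input, and its constructor does not publish `this`.

```java
import java.util.List;

// A hypothetical class following the first three checklist items.
final class Route {                      // final: no mutable subclass can be substituted
    private final List<String> stops;    // every field is final (safe publication)

    Route(List<String> stops) {
        // Copy the caller's possibly mutable list so later changes cannot leak in;
        // List.copyOf returns an unmodifiable list and rejects null elements.
        this.stops = List.copyOf(stops);
        // Nothing here lets `this` escape (no listener registration, no static field).
    }

    List<String> stops() { return stops; }
}
```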

This whole area seems extremely prone to wishful thinking; we hate the idea of 
making something slower than it could be so much that we convince ourselves 
that “the user can reason about this.”  Whether or not it is “too big a leap”, 
I think it is a bigger leap than you are thinking.

> For me, we should make the model clear: the compiler should insert a 
> non-user-overridable default constructor, but nothing more, because using a 
> primitive class is already an arcane construct.

This might help a little bit, but it is addressing the smaller part of the 
problem (zeroes); we need to address the bigger problem (tearing).

I don’t think we have to go so far as to outlaw tearing, but there have to be 
enough cues, at the use and declaration site, that something interesting is 
happening here.

> There is no point in nannying people here, given that only experts will want 
> to play with it.

This is *definitely* wishful thinking.  People will hear that this is a tool 
for performance; 99% of Java developers will convince themselves they are 
experts because, performance!  Developers pathologically over-rotate towards 
whatever the Stack Overflow crowd says is faster.  (And so will Copilot.)  So, 
definitely no.  This argument is pure wishful thinking.   (I will admit to 
being occasionally tempted by this argument too, but then I snap out of it.)

> But we (the EG) can also fail, and make a primitive class too easy to use; 
> what scares me is people using a primitive class just because it's not nullable.

Yes, this is one of the many pitfalls we have to avoid!

This game is hard.




Re: [External] : Re: On tearing

2022-05-16 Thread John Rose


On 27 Apr 2022, at 9:50, Brian Goetz wrote:

> …This whole area seems extremely prone to wishful thinking; we hate
> the idea of making something slower than it could be so much that we
> convince ourselves that “the user can reason about this.”  Whether or
> not it is “too big a leap”, I think it is a bigger leap than you are
> thinking.
>
>> For me, we should make the model clear: the compiler should insert a
>> non-user-overridable default constructor, but nothing more, because
>> using a primitive class is already an arcane construct.
>
> This might help a little bit, but it is addressing the smaller part of
> the problem (zeroes); we need to address the bigger problem (tearing).


I think I mostly agree with Remi on this point.

A tearable primitive class (call it T-B3, as opposed to A-B3, which is 
atomic) can, as you describe, have its invariants broken by races that 
have the effect of writing arbitrary (or almost arbitrary) values into 
fields at any time.


A regular mutable B1 class has a similar problem, except it can be 
defended by a constructor and/or mutator methods that check per-field 
values being stored.  Let’s look at the simplest case (which is rare 
in practice, since it is scary):  Suppose a class has public fields 
which are mutable.  Call such a class an OM-B1 class, meaning “open 
mutable B1”.
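The contrast can be sketched in today's Java with two hypothetical classes: `OpenRange` is an OM-B1 class whose public mutable fields let any holder store any combination, while `CheckedRange` keeps the same state private and routes every store through a check, the "defended" B1 described above.

```java
// OM-B1: open mutable fields. Any code with a reference can store any
// combination of values, so no invariant can be relied upon.
final class OpenRange {
    public int low;
    public int high;
}

// A defended mutable B1: the same state, but every store is validated.
// (Like any mutable class, it still needs synchronization if shared across threads.)
final class CheckedRange {
    private int low;
    private int high;

    CheckedRange(int low, int high) { set(low, high); }

    void set(int low, int high) {
        if (low > high) throw new IllegalArgumentException(low + " > " + high);
        this.low = low;
        this.high = high;
    }

    int low()  { return low; }
    int high() { return high; }
}
```

Java programmers already know to distrust `OpenRange`; the proposal above is to make T-B3 classes trigger the same reflex.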


I think that we can (and probably should) address this educational issue 
by making T-B3 classes look (somehow) like OM-B1 classes.  Then every 
bit of training which leads users to be watchful in their use of OM-B1 
will apply to T-B3 classes.


How to make T-B3 look like OM-B1?  Well, Remi’s idea of a mandated 
open constructor gets most of the way there.  Mandating that the B3 
fields are public is also helpful.  (Records kinda-sorta do that, but 
through component reader methods.)  I truly think those two steps are 
enough to make it clear to an author of a T-B3 that, if a T-B3 
container is accessible to untrusted parties, then it is free to take on 
any combination of field values at any time.  (And I’m using the word 
“free” here in the rigorous math sense, as in a free product type.)


A further step to nail down the message that the components are 
independently variable would be to provide a reconstructor syntax of 
some sort that amounted to an open invitation to (a) take an instance of 
the T-B3, (b) modify or replace any or all of its field values, and then 
(c) put it back in the container it came from.  By “open” I mean 
“public to all comers”, which means that every baseline Java 
programmer, who knows about public mutable fields (we can’t cure world 
hunger or negligent Java scribblers), will know that, using that syntax, 
anybody can write anything into any T-B3 value stored in an unprotected 
container.  Just like an OM-B1 object.  Nothing new to see, and all the 
old warnings apply!
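No reconstructor syntax exists yet, so the sketch below approximates the (a)/(b)/(c) steps in today's Java with hypothetical `withLow`/`withHigh` methods on an unchecked record, plus an unprotected static container. It shows the "free product" reading: every (low, high) pair is reachable by anyone, one component at a time.

```java
// A hypothetical "free product" value: the constructor checks nothing, and
// each component can be replaced independently of the other.
record FreeRange(int low, int high) {
    FreeRange withLow(int newLow)   { return new FreeRange(newLow, high); }
    FreeRange withHigh(int newHigh) { return new FreeRange(low, newHigh); }
}

final class UnprotectedContainer {
    static FreeRange slot = new FreeRange(0, 0);   // reachable by any code

    // (a) take the value out, (b) replace one component, (c) put it back.
    static void replaceLow(int newLow) { slot = slot.withLow(newLow); }
}
```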


We would have to be careful about our messaging about immutability here, 
to prevent folks from mistakenly confusing a T-B3 with an immutable B1 
(I-B1) or B2 (all of which are truly immutable).


One way to do this, which would be blindingly obvious (and IMO too 
blinding), would be to (a) allow a `non-final` modifier on fields, 
canceling any implicit immutability property, and (b) *require* 
`non-final` modifiers on all fields in a T-B3 class.  I put this forward 
in the service of brainstorming, to show an extreme (too extreme IMO) 
way to forcibly advertise the T- in T-B3 classes.  But as I said, I 
think in practice it will be enough to make T-B3 classes look like OM-B1 
classes, which are clearly not immutable, even without a `non-final` 
modifier.




> I don’t think we have to go so far as to outlaw tearing, but there
> have to be enough cues, at the use and declaration site, that
> something interesting is happening here.


Yes, cues.  And my point above, mainly, is that to the extent such cues 
are available in the world of OM-B1 classes already, we should make use 
of them for T-B3 classes.  And where not, such cues should make it 
really clear that there is an open invitation (public to untrusted 
parties) to make piecemeal edits to the fields of a T-B3 class.




>> There is no point in nannying people here, given that only experts will
>> want to play with it.
>
> This is *definitely* wishful thinking.  People will hear that this is
> a tool for performance; 99% of Java developers will convince
> themselves they are experts because, performance!  Developers
> pathologically over-rotate towards whatever the Stack Overflow crowd
> says is faster.  (And so will Copilot.)  So, definitely no.  This
> argument is pure wishful thinking.   (I will admit to being
> occasionally tempted by this argument too, but then I snap out of it.)


I’m with Brian on this.

>> But we (the EG) can also fail, and make a primitive class too easy to
>> use; what scares me is people using a primitive class just because it's
>> not nullable.
>
> Yes, this is one of the many pitfalls we have to avoid!
>
> This game is hard.


Yep.  Removing null for footprint, by moving fro