EG meeting, 2020-07-29

2020-07-28 Thread Dan Smith
The next EG Zoom meeting is tomorrow, 4pm UTC (9am PDT, 12pm EDT).

The only active topic in the mailing list is "Revisiting default values". We 
discussed it last time, and I'm not sure there's much new to add to the 
discussion right now. (I'm pursuing some internal explorations, not ready to 
report any conclusions yet.) So... I guess we can check in and see if there's 
interest in further discussion. If not, short meeting.



Re: Revisiting default values

2020-07-28 Thread Dan Smith
> On Jul 28, 2020, at 11:33 AM, Tobi Ajila  wrote:
> 
> > Bucket #3 classes must be reference-default, and fields/arrays of their 
> > inline type are illegal outside of the declaring class. The declaring class 
> > can provide a flat array factory if it wants to. (A new idea from Tobi, 
> > he'll write it up for the thread.)

I've since come to see this as a variant of Option L or Option M: we apply some 
restrictions + analysis to guarantee that uninitialized fields/arrays are never 
exposed. In this case, the guarantee is easy to prove because nobody can 
declare fields/arrays at all, except the class author.

> This approach is appealing for the following reasons: no additional JVM 
> complexity (i.e., no bytecode checks for the bad default value), no javac 
> boilerplate (e.g., guards on member access, guards on method entries, etc.). On 
> the other hand, there are two big drawbacks: no instance field flattening for these 
> types, and creating flattened arrays is a bit unnatural since it has to be 
> done via a factory.

The biggest problem I see with approaches that prevent use of 'anewarray' is 
that they violate our uniform bytecode design, which is crucial to 
specialization. That is: how do I allocate a flat array of T in something like 
ArrayList? I can't be calling arbitrary factory methods depending on T.
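To make the uniformity point concrete in today's Java, here is a minimal sketch. It uses `java.lang.reflect.Array.newInstance` as a stand-in for the single, type-independent `anewarray` path (it is not the specialization mechanism itself), and `UniformStore` and `FactoryRegistry` are hypothetical names. The registry shows what a generic container would be reduced to if some classes forbade direct allocation and exposed only their own array factories.

```java
import java.lang.reflect.Array;
import java.util.Map;
import java.util.function.IntFunction;

// A generic container allocates its storage the same way for every T.
// Array.newInstance plays the role of the one uniform 'anewarray' path.
final class UniformStore<T> {
    private final T[] elements;

    @SuppressWarnings("unchecked")
    UniformStore(Class<T> type, int capacity) {
        // One operation, whatever T is.
        this.elements = (T[]) Array.newInstance(type, capacity);
    }

    int capacity() { return elements.length; }
}

// Hypothetical alternative: if some classes forbade direct allocation and
// offered only their own array factories, every generic container would
// need a per-type lookup like this one.
final class FactoryRegistry {
    private static final Map<Class<?>, IntFunction<Object[]>> FACTORIES =
            Map.of(String.class, String[]::new);

    static Object[] allocate(Class<?> type, int n) {
        IntFunction<Object[]> factory = FACTORIES.get(type);
        if (factory == null) {
            // A type the container has never heard of cannot be allocated.
            throw new IllegalStateException("no array factory registered for " + type);
        }
        return factory.apply(n);
    }
}
```

Any type missing from the registry fails at run time, which is the price of giving up a single uniform allocation instruction.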

There's also a problem of exactly what these array factory methods are supposed 
to do. Sure, we can blame the author if they choose to leak garbage data 
through the factory. But... what are they going to put in the array, if not 
garbage data? This is really more of a Bucket #2 solution, where there exists 
some reasonable default to fill the array with.

> I think it would help if we had a clear sense as to what proportion of 
> inline-types we think will have this "bad default" problem. Last year when we 
> discussed null-default inline types the thinking was that about 75% of the 
> motivation for null-defaults was migrating VBC, 20% for security, 5% for "I 
> want null in my value set.". My assumption is that the vast majority of 
> inline-types will not be migrated types, they will be new types. If this is 
> correct then it would appear that the default value problem is really a 
> problem for a minority of inline-types. 

My two cents: this is not about migrated vs. new types. This is about what's 
being modeled. A certain subset of inline classes will model some sort of 
numeric quantity with a natural "zero" value. Many others—I'd predict more than 
50%, though it will depend a lot on how accommodating we are to these use 
cases—will represent non-numeric data without any "zero" analog. These will 
often wrap non-null references (strings, for example).

(Challenge: can we think of any use cases for inline classes that have a 
natural all-zeros default value *other than* a numeric zero, a singleton with 
no fields, or the equivalent of Optional.empty()? Maybe a collection of boolean 
flags? Once you've got references, it's pretty unusual to expect them to be 
null.)

Within the subset that doesn't have a good default, it's often the case that 
the class has limited exposure, and some programmers might happily trade safety 
guarantees for performance, knowing they can trust all clients (or if there's a 
bug, they'll catch it in testing). So maybe they'll be fine with the all-zeros 
default story. But any class that belongs to a public API, or even that has 
significant non-public exposure, is going to want to be confident that it's 
operating on valid data.

> I would argue that the costs should be limited to types that want to opt-in 
> to not expose their default value or un-initialized value.

Yes, agreed. Major demerits for any approach that imposes costs on programs 
that don't make use of no-default inline classes.

> I think it's important to decide if we want this kind of feature but also what 
> we are willing to give up to get it.

The right way to think about it is this: there exist many classes that don't 
need identity and also don't have natural defaults. We're not going to make 
those classes cease to exist. It's not a "yes or no" choice, it's a "what is 
the sanctioned approach?" choice.

The "yes or no" framing leads to attempts to compare performance with or 
without checks. But the "which approach" choice means choosing between 
performance of:
- An identity class
- A class with hand-coded checks in methods
- A class that automatically checks member accesses, like we do with null
- A dynamic requirement that fields/arrays of a certain class type have to be 
initialized before they're read
- Etc.
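As a rough illustration of the second option, "a class with hand-coded checks in methods", here is a sketch in ordinary Java. `UserId` is a hypothetical stand-in for a no-good-default inline class; since plain Java cannot produce an inline class's all-zeros value, the constant `ZERO` simulates it through a private constructor.

```java
import java.util.Objects;

// Hypothetical no-good-default class; ZERO simulates the all-zeros value
// an uninitialized field or array element of the inline type would hold.
final class UserId {
    static final UserId ZERO = new UserId();

    private final String name; // never null in a properly constructed instance

    private UserId() { this.name = null; }

    UserId(String name) { this.name = Objects.requireNonNull(name); }

    // Hand-coded guard: the author must remember to call it in every method.
    private void checkValid() {
        if (name == null) {
            throw new NullPointerException("uninitialized UserId");
        }
    }

    String display() {
        checkValid();
        return "user:" + name;
    }
}
```

Every method has to remember to call `checkValid()`; the third option in the list would instead have the language insert an equivalent check automatically, as it already does for null.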



Re: Revisiting default values

2020-07-28 Thread Brian Goetz

> I think it would help if we had a clear sense as to what proportion of
> inline-types we think will have this "bad default" problem. Last year
> when we discussed null-default inline types the thinking was that
> about 75% of the motivation for null-defaults was migrating VBC, 20%
> for security, 5% for "I want null in my value set.". My assumption is
> that the vast majority of inline-types will not be migrated types,
> they will be new types. If this is correct then it would appear that
> the default value problem is really a problem for a minority of
> inline-types.

Indeed, we've come up with good solutions for migrating VBCs (migrate it 
to a ref-default inline class) and "I want null in my value set" (then 
just use the ref projection).


For the "migrate from VBC" crowd, we offer the advice: "keep using `Foo` 
(really `Foo.ref`) in your APIs, but feel free to use `Foo.val` inside 
your implementation, where you are confident of no nulls."  And further, 
we offer that advice to both the VBC author and its clients.  So, we can 
expect existing APIs to continue to return `Optional`, but more fields 
to be declared as `Optional.val`, to get the flattening, with null checks 
in the constructor:


    this.foo = requireNonNull(foo);

And this is one of the sources of "zero pollution"; a client may have a 
field of type `Foo.val` and just not initialize it in their constructor, 
and then later someone calls `foo.bar()`.  Unlike with a reference type, 
which would NPE in this situation, we might enter the `bar()` method, 
which might not be defensively coded to check for the (meaningless) 
default, and it will do something dumb.  Where dumb ranges from "Welcome 
to 1970" to "delete all my files."
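A minimal sketch of this zero-pollution scenario in ordinary Java, assuming a hypothetical `Timestamp` class that stores seconds since the epoch. Plain Java has no `Timestamp.val`, so `Timestamp.DEFAULT` explicitly simulates the all-zeros value an uninitialized flattened field would hold, and the no-argument `Event` constructor simulates a client that forgets to initialize its field.

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical stand-in for an inline class; DEFAULT simulates the
// all-zeros value of an uninitialized Timestamp.val field.
final class Timestamp {
    static final Timestamp DEFAULT = new Timestamp(0L);

    private final long epochSeconds;

    Timestamp(long epochSeconds) { this.epochSeconds = epochSeconds; }

    // Not defensively coded: happily operates on the meaningless default.
    Instant toInstant() { return Instant.ofEpochSecond(epochSeconds); }
}

final class Event {
    private final Timestamp when;

    // Well-behaved client: null check in the constructor.
    Event(Timestamp when) { this.when = Objects.requireNonNull(when); }

    // Careless client: never initializes the field, so it ends up holding
    // the default (simulated here by assigning DEFAULT explicitly).
    Event() { this.when = Timestamp.DEFAULT; }

    String describe() { return "event at " + when.toInstant(); }
}
```

`new Event().describe()` returns `"event at 1970-01-01T00:00:00Z"`: the unchecked default flows straight into `toInstant()`. Had `when` been an ordinary reference field left uninitialized, the same call would have thrown a `NullPointerException` instead.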


I think what we need for Bucket 3 (which I think we agree is more 
important than Bucket 2) is to (optionally, only for NGD inline classes) 
restore parity with reference types by ensuring that the receiver of a 
method invocation is never seen to be the default value.  (We already do 
this for reference types; we NPE before the dispatch would succeed.)   
And the strategies we've been kicking around have ranged from "try to 
prevent the default from showing up in the heap" to "detect when the 
default shows at various times."


If the important point in time is method dispatch, then we can probably 
simplify to:


 - Let some classes mark themselves as NGD (no good default)
 - At the point of invocation of an NGD instance method, check the 
   receiver against the default, and throw NPE if it is the default
 - Optionally, try to optimize this check by identifying (manually or 
   automatically) a pivot field


Note that even an unoptimized check is probably pretty fast already: 
"are all the bits zero."  But we can probably often optimize down to a 
single-word comparison to zero.
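The steps above can be sketched in ordinary Java. `Money` is a hypothetical no-good-default class whose `currency` field is never null in a valid instance, which makes it a natural pivot field; `cents` cannot serve as a pivot because zero is a legitimate amount. `Money.ZERO` simulates the all-zeros default, and `Dispatch.invokeChecked` stands in for the check that generated bytecode or the VM would perform before dispatching an NGD instance method.

```java
import java.util.Objects;

// Hypothetical NGD class; ZERO simulates its all-zeros default value.
final class Money {
    static final Money ZERO = new Money();

    private final long cents;       // zero is a legitimate amount: not a pivot
    private final String currency;  // never null when valid: the pivot field

    private Money() { this.cents = 0L; this.currency = null; }

    Money(long cents, String currency) {
        this.cents = cents;
        this.currency = Objects.requireNonNull(currency);
    }

    // Unoptimized check: "are all the bits zero?"
    boolean isDefaultFullCheck() { return cents == 0L && currency == null; }

    // Optimized check: a single comparison on the pivot field.
    boolean isDefaultPivotCheck() { return currency == null; }

    String format() { return cents + " " + currency; }
}

final class Dispatch {
    // What an NGD-aware call site would do before invoking format().
    static String invokeChecked(Money receiver) {
        if (receiver.isDefaultPivotCheck()) {
            throw new NullPointerException("receiver is the default Money value");
        }
        return receiver.format();
    }
}
```

`new Money(0L, "EUR")` passes both checks, while `Money.ZERO` is rejected with an NPE before `format()` runs, mirroring what already happens for a null reference receiver.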


Note too that we can implement this check in either generated bytecode 
or in the VM; the semantics are the same, but the latter is more secure, 
since bytecode not generated by javac cannot skip a check done by the VM.





RE: Revisiting default values

2020-07-28 Thread Tobi Ajila
> Bucket #3 classes must be reference-default, and fields/arrays of their
> inline type are illegal outside of the declaring class. The declaring class
> can provide a flat array factory if it wants to. (A new idea from Tobi,
> he'll write it up for the thread.)

```
// Formerly a concrete class, but now it's abstract (or maybe an interface)
public sealed abstract class LegacyType permits LegacyType.val {
    // factory methods
    public static LegacyType makeALegacyType(...) { ... }            // in some cases this already exists
    public static LegacyType[] newALegacyTypeArray(int size) { ... } // can be flattened
}

// this type is hidden; only LegacyType knows about it
private inline class LegacyType.val extends LegacyType { ... }
```

This approach is based on what Kevin mentioned earlier, "For all of these
types, there is one really fantastic default value that does everything you
would want it to do: null. That is why these types should not become
inline-types, or certainly not val-default inline types ...". Essentially,
by making these types reference-default and by providing a way to confine
the value projection to the reference-default type, the writer maintains
control of where and when the value projection can be observed, which
solves the bad default problem. The writer also has the
ability to supply a flattened array factory with initialized elements.
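A rough approximation of this pattern in today's Java (sealed classes, Java 17 or later). The private nested `Impl` stands in for the hidden `LegacyType.val`; ordinary Java cannot flatten it, so the sketch shows only the encapsulation. `of` and `newArray` are hypothetical names, and taking the fill value as a parameter is one assumed answer to the question of what a factory should put in the array.

```java
import java.util.Arrays;
import java.util.Objects;

// Reference-default public type; its only subclass is hidden inside it.
// (Permitted subclasses are inferred from this compilation unit.)
abstract sealed class LegacyType {
    abstract String label();

    // Factory: the only way clients obtain instances.
    static LegacyType of(String label) {
        return new Impl(Objects.requireNonNull(label));
    }

    // Array factory: the author decides what fills the array. Here it is a
    // caller-supplied valid instance, so no uninitialized element escapes.
    static LegacyType[] newArray(int size, LegacyType fill) {
        Objects.requireNonNull(fill);
        LegacyType[] result = new LegacyType[size];
        Arrays.fill(result, fill);
        return result;
    }

    // Hidden implementation; stands in for the private LegacyType.val.
    private static final class Impl extends LegacyType {
        private final String label;

        Impl(String label) { this.label = label; }

        @Override
        String label() { return label; }
    }
}
```

Clients can neither name `Impl` nor declare fields or arrays of it, which is the property the proposal relies on; the cost, as noted below, is that such fields and arrays exist only where the declaring class chooses to create them.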

This approach is appealing for the following reasons: no additional JVM
complexity (i.e., no bytecode checks for the bad default value) and no javac
boilerplate (e.g., guards on member access, guards on method entries, etc.).
On the other hand, there are two big drawbacks: no instance field flattening
for these types, and creating flattened arrays is a bit unnatural since it
has to be done via a factory.

Going back to Brian's comment:

> I'd suggest, though, we back away from implementation techniques (you've
> got a good menu going already), and focus more on "what language do we want
> to build."  You claim:
> > I don't think totally excluding Buckets #2 and #3 is a very good
> > outcome.
> Which I think is a reasonable hypothesis, but I suggest we focus the
> discussion on whether we believe this or not, and what we might want to do
> about it (and when), first.

I think it would help if we had a clear sense as to what proportion of
inline-types we think will have this "bad default" problem. Last year when
we discussed null-default inline types the thinking was that about 75% of
the motivation for null-defaults was migrating VBC, 20% for security, 5%
for "I want null in my value set.". My assumption is that the vast majority
of inline-types will not be migrated types, they will be new types. If this
is correct then it would appear that the default value problem is really a
problem for a minority of inline-types.

All the solutions proposed have some kind of cost associated with them, and
these costs vary (e.g., JVM complexity, throughput overhead, JIT compilation
time, etc.). If the default value problem is only for a minority of the
types, I would argue that the costs should be limited to types that want to
opt-in to not expose their default value or un-initialized value. How we
feel about this will determine which direction we choose to take when
exploring the solution space.

So, in short, I want to second Brian's comment: I think it's important to
decide not only whether we want this kind of feature, but also what we are
willing to give up to get it.

--Tobi

"valhalla-spec-experts"  wrote
on 2020/07/21 02:41:11 PM:

> From: Dan Smith 
> To: valhalla-spec-experts 
> Cc: Brian Goetz 
> Date: 2020/07/21 02:41 PM
> Subject: [EXTERNAL] Re: Revisiting default values
> Sent by: "valhalla-spec-experts"

>
>
> > On Jul 20, 2020, at 10:27 AM, Brian Goetz  wrote:
> >
> > That said, doing so in the language is potentially more viable.
> > It would mean, for classes that opt into this treatment:
> >
> >  - Ensuring that `C.default` evaluates to the right thing
> >  - Preventing `this` from escaping the constructor (which might be
> >    a good thing to enforce for inline classes anyway)
> >  - Ensuring all fields are DA (which we do already), and that
> >    assignments to fields in ctors are not their default value
> >  - Translating `new Foo[n]` (and reflective equivalent) with
> >    something that initializes the array elements
> >
> > The goal is to keep default instances from being observed.  If we
> > lock down `this` from constructors, the major cost here is
> > instantiating arrays of these things, but we already optimize array
> > initialization loops like this pretty well.
> >
> > Overall this doesn't seem terrible.  It means that the cost of
> > this is borne by the users of classes that opt into this treatment,
> > and keeps the complexity out of the VM.  It does mean that
> > "attackers" can generate bytecode to generate bad instances (a
> > problem we have with multiple vectors today.)
> >
> > Call this "L".
>
> More letters!
>
> Expanding on ways to support Bucket #3 by ensuring initialization of
> fields/arrays:
>
> ---
>
>