EG meeting, 2020-07-29
The next EG Zoom meeting is tomorrow, 4pm UTC (9am PDT, 12pm EDT). The only active topic in the mailing list is "Revisiting default values". We discussed it last time, and I'm not sure there's much new to add to the discussion right now. (I'm pursuing some internal explorations, not ready to report any conclusions yet.) So... I guess we can check in and see if there's interest in further discussion. If not, short meeting.
Re: Revisiting default values
> On Jul 28, 2020, at 11:33 AM, Tobi Ajila wrote:
>
> > Bucket #3 classes must be reference-default, and fields/arrays of their
> > inline type are illegal outside of the declaring class. The declaring class
> > can provide a flat array factory if it wants to. (A new idea from Tobi,
> > he'll write it up for the thread.)

I've since come to see this as a variant of Option L or Option M: we apply some restrictions + analysis to guarantee that uninitialized fields/arrays are never exposed. In this case, the guarantee is easy to prove because nobody can declare fields/arrays at all, except the class author.

> This approach is appealing for the following reasons: no additional JVM
> complexity (i.e. no bytecode checks for the bad default value), no javac
> boilerplate (i.e. guards on member access, guards on method entries, etc.). On
> the other hand, there are two big drawbacks: no instance field flattening for
> these types, and creating flattened arrays is a bit unnatural since it has to
> be done via a factory.

The biggest problem I see with approaches that prevent use of 'anewarray' is that they violate our uniform bytecode design, which is crucial to specialization. That is: how do I allocate a flat array of T in something like ArrayList? I can't be calling arbitrary factory methods depending on T.

There's also a problem of exactly what these array factory methods are supposed to do. Sure, we can blame the author if they choose to leak garbage data through the factory. But... what are they going to put in the array, if not garbage data? This is really more of a Bucket #2 solution, where there exists some reasonable default to fill the array with.

> I think it would help if we had a clear sense as to what proportion of
> inline-types we think will have this "bad default" problem. Last year when we
> discussed null-default inline types the thinking was that about 75% of the
> motivation for null-defaults was migrating VBC, 20% for security, 5% for "I
> want null in my value set.". My assumption is that the vast majority of
> inline-types will not be migrated types, they will be new types. If this is
> correct then it would appear that the default value problem is really a
> problem for a minority of inline-types.

My two cents: this is not about migrated vs. new types. This is about what's being modeled.

A certain subset of inline classes will model some sort of numeric quantity with a natural "zero" value.

Many others (I'd predict more than 50%, though it will depend a lot on how accommodating we are to these use cases) will represent non-numeric data without any "zero" analog. These will often wrap non-null references (strings, for example).

(Challenge: can we think of any use cases for inline classes that have a natural all-zeros default value *other than* a numeric zero, a singleton with no fields, or the equivalent of Optional.empty()? Maybe a collection of boolean flags? Once you've got references, it's pretty unusual to expect them to be null.)

Within the subset that doesn't have a good default, it's often the case that the class has limited exposure, and some programmers might happily trade safety guarantees for performance, knowing they can trust all clients (or if there's a bug, they'll catch it in testing). So maybe they'll be fine with the all-zeros default story. But any class that belongs to a public API, or even that has significant non-public exposure, is going to want to be confident that it's operating on valid data.

> I would argue that the costs should be limited to types that want to opt-in
> to not expose their default value or un-initialized value.

Yes, agreed. Major demerits for any approach that imposes costs on programs that don't make use of no-default inline classes.

> I think it's important to decide if we want this kind of feature but also what
> we are willing to give up to get it.

The right way to think about it is this: there exist many classes that don't need identity and also don't have natural defaults. We're not going to make those classes cease to exist. It's not a "yes or no" choice, it's a "what is the sanctioned approach?" choice.

The "yes or no" framing leads to attempts to compare performance with or without checks. But the "which approach" choice means choosing between performance of:

- An identity class
- A class with hand-coded checks in methods
- A class that automatically checks member accesses, like we do with null
- A dynamic requirement that fields/arrays of a certain class type have to be initialized before they're read
- Etc.
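
To make the second option in the list above concrete, here is a minimal sketch of "a class with hand-coded checks in methods". It is an illustration only, written in today's Java: `Name`, `simulatedDefault()`, and `checkInitialized()` are hypothetical names, an ordinary final class stands in for an inline class, and a null field stands in for the all-zeros default that an uninitialized flat field or array element would hold.

```java
import java.util.Objects;

public class HandCodedChecks {
    // Stand-in for an inline class with no good default: it wraps a non-null String.
    static final class Name {
        // null only in the simulated all-zeros default instance
        private final String value;

        private Name(String value) {
            this.value = value;
        }

        // The only public way to build a valid instance.
        static Name of(String value) {
            return new Name(Objects.requireNonNull(value));
        }

        // Assumption for illustration: models what an uninitialized
        // flat field or array element of this type would contain.
        static Name simulatedDefault() {
            return new Name(null);
        }

        // The hand-coded guard that every method has to remember to call.
        private void checkInitialized() {
            if (value == null) {
                throw new IllegalStateException("uninitialized Name");
            }
        }

        int length() {
            checkInitialized();
            return value.length();
        }
    }

    public static void main(String[] args) {
        System.out.println(Name.of("Tobi").length()); // prints 4
        try {
            Name.simulatedDefault().length();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The cost being compared in this option is the guard call at the top of each method, plus the risk that an author forgets one.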
Re: Revisiting default values
> I think it would help if we had a clear sense as to what proportion of
> inline-types we think will have this "bad default" problem. Last year when we
> discussed null-default inline types the thinking was that about 75% of the
> motivation for null-defaults was migrating VBC, 20% for security, 5% for "I
> want null in my value set.". My assumption is that the vast majority of
> inline-types will not be migrated types, they will be new types. If this is
> correct then it would appear that the default value problem is really a
> problem for a minority of inline-types.

Indeed, we've come up with good solutions for migrating VBCs (migrate it to a ref-default inline class) and "I want null in my value set" (then just use the ref projection.)

For the "migrate from VBC" crowd, we offer the advice: "keep using `Foo` (really `Foo.ref`) in your APIs, but feel free to use `Foo.val` inside your implementation, where you are confident of no nulls." And further, we offer that advice to both the VBC author and its clients. So, we can expect existing APIs to continue to return Optional, but more fields of type `Optional.val`, to get the flattening, and doing null checks in the constructor:

    this.foo = requireNonNull(foo)

And this is one of the sources of "zero pollution"; a client may have a field of type `Foo.val` and just not initialize it in their constructor, and then later someone calls `foo.bar()`. Unlike with a reference type, which would NPE in this situation, we might enter the `bar()` method, which might not be defensively coded to check for the (meaningless) default, and it will do something dumb. Where dumb ranges from "Welcome to 1970" to "delete all my files."

I think what we need for Bucket 3 (which I think we agree is more important than Bucket 2) is to (optionally, only for NGD inline classes) restore parity with reference types by ensuring that the receiver of a method invocation is never seen to be the default value. (We already do this for reference types; we NPE before the dispatch would succeed.)

And the strategies we've been kicking around have ranged from "try to prevent the default from showing up in the heap" to "detect when the default shows up at various times." If the important point in time is method dispatch, then we can probably simplify to:

- Let some classes mark themselves as NGD (no good default)
- At the point of invocation of an NGD instance method, check the receiver against the default, throw NPE if it is
- Optionally, try to optimize this check by identifying (manually or automatically) a pivot field

Note that even an unoptimized check is probably pretty fast already: "are all the bits zero." But we can probably often optimize down to a single-word comparison to zero.

Note too that we can implement this check in either generated bytecode or in the VM; the semantics are the same, the latter is more secure.
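
The receiver check described above can be sketched in plain Java, under some loud assumptions: `Account`, `checkReceiver`, `isDefaultFull`, and `isDefaultPivot` are hypothetical names, an ordinary final class stands in for an NGD inline class, and the explicit `checkReceiver(...)` call stands in for a check that javac or the VM would insert before dispatch. It is a rough model of the semantics, not how the feature would actually be implemented.

```java
public class NgdReceiverCheck {
    // Stand-in for an NGD (no good default) inline class.
    static final class Account {
        final long id;
        // Assumed pivot field: non-null in every validly constructed instance.
        final String owner;

        Account(long id, String owner) {
            if (owner == null) {
                throw new NullPointerException("owner");
            }
            this.id = id;
            this.owner = owner;
        }

        // Simulated all-zeros default, as an uninitialized flat field would hold.
        private Account() {
            this.id = 0L;
            this.owner = null;
        }

        static final Account DEFAULT = new Account();

        String describe() {
            return owner + "#" + id;
        }
    }

    // Unoptimized check: "are all the bits zero", field by field.
    static boolean isDefaultFull(Account a) {
        return a.id == 0L && a.owner == null;
    }

    // Optimized check: compare only the pivot field to zero/null.
    static boolean isDefaultPivot(Account a) {
        return a.owner == null;
    }

    // Models the check performed at the point of invocation.
    static Account checkReceiver(Account a) {
        if (isDefaultPivot(a)) {
            throw new NullPointerException("default value of NGD class Account");
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(checkReceiver(new Account(7, "dan")).describe()); // prints dan#7
        try {
            checkReceiver(Account.DEFAULT).describe();
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

The pivot check is only equivalent to the full check because the constructor guarantees `owner` is never null in a valid instance; that invariant is what makes a field usable as a pivot.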
RE: Revisiting default values
> Bucket #3 classes must be reference-default, and fields/arrays of their inline type are illegal outside of the declaring class. The declaring class can provide a flat array factory if it wants to. (A new idea from Tobi, he'll write it up for the thread.)

```
public sealed abstract class LegacyType permits LegacyType.val {
    // Formerly a concrete class, but now it's abstract or maybe an interface

    // factory methods
    public static LegacyType makeALegacyType(...); // in some cases this already exists
    public static LegacyType[] newALegacyTypeArray(int size); // can be flattened
}

private inline class LegacyType.val extends LegacyType {
    ...
}
// this type is hidden, only LegacyType knows about it
```

This approach is based on what Kevin mentioned earlier, "For all of these types, there is one really fantastic default value that does everything you would want it to do: null. That is why these types should not become inline-types, or certainly not val-default inline types ...". Essentially, by making these types reference-default and by providing an avenue to restrict the value-projection to the reference-default type, the writer maintains control of where and when the value-projection is allowed to be observed, thus solving the bad default problem. The writer also has the ability to supply a flattened array factory with initialized elements.

This approach is appealing for the following reasons: no additional JVM complexity (i.e. no bytecode checks for the bad default value), no javac boilerplate (i.e. guards on member access, guards on method entries, etc.). On the other hand, there are two big drawbacks: no instance field flattening for these types, and creating flattened arrays is a bit unnatural since it has to be done via a factory.

Going back to Brian's comment:

> I'd suggest, though, we back away from implementation techniques (you've got a good menu going already), and focus more on "what language do we want to build." You claim:
>
> > I don't think totally excluding Buckets #2 and #3 is a very good outcome.
>
> Which I think is a reasonable hypothesis, but I suggest we focus the discussion on whether we believe this or not, and what we might want to do about it (and when), first.

I think it would help if we had a clear sense as to what proportion of inline-types we think will have this "bad default" problem. Last year when we discussed null-default inline types the thinking was that about 75% of the motivation for null-defaults was migrating VBC, 20% for security, 5% for "I want null in my value set.". My assumption is that the vast majority of inline-types will not be migrated types, they will be new types. If this is correct then it would appear that the default value problem is really a problem for a minority of inline-types.

All the solutions proposed have some kind of cost associated with them, and these costs vary (i.e. JVM complexity, throughput overhead, JIT compilation time, etc.). If the default value problem is only for a minority of the types, I would argue that the costs should be limited to types that want to opt-in to not expose their default value or un-initialized value.

How we feel about this will determine which direction we choose to take when exploring the solution space. So, in short I want to second Brian's comment, I think it's important to decide if we want this kind of feature but also what we are willing to give up to get it.

--Tobi

"valhalla-spec-experts" wrote on 2020/07/21 02:41:11 PM:

> From: Dan Smith
> To: valhalla-spec-experts
> Cc: Brian Goetz
> Date: 2020/07/21 02:41 PM
> Subject: [EXTERNAL] Re: Revisiting default values
> Sent by: "valhalla-spec-experts"
>
> > On Jul 20, 2020, at 10:27 AM, Brian Goetz wrote:
> >
> > That said, doing so in the language is potentially more viable. It would mean, for classes that opt into this treatment:
> >
> > - Ensuring that `C.default` evaluates to the right thing
> > - Preventing `this` from escaping the constructor (which might be a good thing to enforce for inline classes anyway)
> > - Ensuring all fields are DA (which we do already), and that assignments to fields in ctors are not their default value
> > - Translating `new Foo[n]` (and reflective equivalent) with something that initializes the array elements
> >
> > The goal is to keep default instances from being observed. If we lock down `this` from constructors, the major cost here is instantiating arrays of these things, but we already optimize array initialization loops like this pretty well.
> >
> > Overall this doesn't seem terrible. It means that the cost of this is borne by the users of classes that opt into this treatment, and keeps the complexity out of the VM. It does mean that "attackers" can generate bytecode to generate bad instances (a problem we have with multiple vectors today.)
> >
> > Call this "L".
>
> More letters!
>
> Expanding on ways to support Bucket #3 by ensuring initialization of fields/arrays:
>
> ---
>