----- Mail original ----- > De: "Brian Goetz" <brian.go...@oracle.com> > À: "amber-spec-experts" <amber-spec-experts@openjdk.java.net> > Envoyé: Mardi 19 Janvier 2021 20:59:00 > Objet: Pattern features for next iteration
> We're zeroing in on a feature list for the next iteration of pattern > matching, which includes: > > - New kinds of patterns > - Array patterns > - Record deconstruction patterns, including varargs records > - maybe: guard patterns (true(e), false(e)) > - maybe: AND patterns (P & Q), syntax TBD > > - New place for patterns: switch > - Extend switch to support more types (all types?) > - Extend switch on non-legacy types to support patterns as case labels > - New nullity behavior (including case null) > - New scoping + fallthrough > - Refined exhaustiveness analysis > > This document is an overview; the immediate goal is coming to consensus > that this is the set of features we should be targeting next. (Please, > no deep-dives into details of, say, translation until this is locked down.) > For me: extending switch to support any types + type pattern + case null seems a reasonable goal. About retconing switch, you talk about old switch vs new switch, i think there is another possible decomposition, old "case" vs new pattern, i.e. "case constant" can be retrofitted as a pattern. So a switch that mix type patterns and case can be written that way: switch(o) { case "foo" -> // equivalent to Object$$equals("foo") or type String && String$$equals("foo") type String s -> type Integer i -> } I've used "type" as keyword, it can be any other keyword ("instanceof" ?) or no keyword at all. I believe there is more freedom in term of grammar even if there is no keyword and no false dichotomy between old vs new. > > #### Array patterns > > An array pattern has the form > T[] { CommaSeparatedListOfPatternsWithOptionalTrailingDots } > OptionalIdentifier > > The target type of an array pattern is T[], and it matches if: > - the target is non-null > - the target is an instance of T[] > - the target has has an length equal to the arity of the nested > pattern list, or, if the nested pattern list ends with a `...`, length > greater than the arity of the nested pattern list > - for each nested pattern P_i, the i'th element of the list matches P_i > > It binds the union of the bindings of the nested patterns. Like > deconstruction patterns, it may optional bind a name to the target cast > to the right type, if a name is provided: > > case T[] { P, Q } ts: > > For multi-dimensional arrays, we interpret the pattern > > T[][] { P, Q } > > as matching P and Q to the first and second elements of the outermost array: > > case T[][] { T[] a, T[] b } > or > case T[][] { T[] { var a, var b }, T[] { var c, ... } } > > Note that `T[] ts` is an ordinary type pattern, not an array pattern. > > > #### Record patterns > > A record pattern has the form > > TypeName( CommaSeparatedListOfPatternsWithOptionalTrailingDots ) > OptionalIdentifier > > The type name must be a record class. It matches if: > > - the target is non-null > - the target is an instance of the specified record type > - each of the nested patterns matches the corresponding component of > the record, as served by the accessor method > > The binding list is the union of the binding lists of the nested > patterns; if the optional identifier is specified, it is bound to the > target, cast to TypeName. > > The treatment of null components is as per the matching rules for the > nested pattern; for type patterns, if the type is total on (a supertype > of) the type of the component, it is presumed to match, otherwise a > dynamic (instanceof) check is performed. > > > #### Guard patterns > > A guard pattern has the form true(BooleanExpr) or false(BooleanExpr). > It matches if the boolean expression is true or false, respectively. It > matches any target type and binds no bindings. > > > #### AND patterns > > Two patterns P and Q can be combined into an AND pattern denoted as P&Q > (syntax provisional.) P and Q must both be applicable to the target > type for P&Q to be; the binding list is the union of the binding lists. > If P does not match the target, no attempt is made to match Q. > > > #### Translation > > I'll write VARS(P) to describe the binding variables (name and type) of > a pattern P, and TR(target, P) to describe the compiler-tree desugaring > of a pattern match expression. When translating a pattern, its bindings > are hoisted out to a suitable scope. Javac has as "LetExpr" tree type, > `LetExpr{S*} in e`, which means we execute statements S* and evaluate > expression `e`, and its result is the result of evaluating `e`. union_i > is the union over a set indexed by i; product_i is the &&-product over > the set. |S| is the cardinality of a set. > > Type patterns: > VARS(T t) = { T t } > TR(target, T t) = true // for statically total type patterns, > otherwise: > TR(target, T t) = LetExpr { > boolean b = target instanceof T; > if (b) t = (T) target; > } in b > > Record patterns: > VARS(R(P*)) = union_i VARS(P_i) > TR(target, R(P*)) = LetExpr { > boolean b = target instanceof T; > R res = b ? (R) target : null; > } in (b && product_i TR(res.comp_i(), P_i)) > > Array patterns: > VARS(T[] { P* }) = union_i VARS(P_i) > TR(target, T[] { P* }) = LetExpr { > boolean b = target instanceof T; > T[] res = b ? (T[]) target : null; > } in (b && res.length == |P*| && product_i > TR(res[i], P_i)) > TR(target, T[] { P*, ... }) = LetExpr { > boolean b = target instanceof T; > T[] res = b ? (T[]) target : null; > } in (b && res.length >= |P*| && product_i > TR(res[i], P_i)) > > Guard patterns: > VARS(true(e)) = { } > VARS(false(e)) = { } > TR(true(e)) = e > TR(false(e)) = !e > > AND patterns: > VARS(P&Q) = VARS(P) union VARS(Q) > TR(P&Q) = TR(P) && TR(Q) > > The above represents an unrolling of nested patterns, where t~P(Q) is > unrolled into t~P(a) && a~Q (where t~P means "P matches t"). > > #### Binary compatibility > > In the future, it will be possible to explicitly declare deconstruction > patterns in records. In order to make adding an explicit canonical > deconstruction pattern a binary-compatible operation, by the time we > exit preview, some of this logic should be funneled through a > condy-based pattern runtime so that old binaries that ask for the > canonical deconstruction pattern for a record will get a reasonable > default if no explicit one exists, or the one in the class file if one > does. This entails replacing `res.comp_i()` in the translation with > calls to the pattern runtime. > > > ## Patterns in switch > > Introducing patterns into case labels creates another round of > adjustments for switch, and probably some new seams between old and new > functionality. > > #### More types > > Switch is currently limited to integral primitive types and their boxes, > strings, and enums. We'll these types "legacy switch types", and call > switches on legacy switch types, for which all of whose case labels are > "old-style" case labels, "legacy switches". (I'll develop the list of > candidate restrictions on legacy switches as I go; ideally this list > should be empty, but it probably won't be.) > > Pattern matching surely lets us switch over any reference type. It is an > open question whether we should fill out the three remaining primitive > types (boolean, float, double.) On the one hand, it seems oddly > incomplete to be able to switch over all but three types, while on the > other, switching over floats seems pretty niche, and we already have a > statement for switch over booleans (`if`). > > Assuming we decide for completeness, there are two interpretations of > switching on float: whether this is a legacy or pattern switch. > > Note that for nested patterns, we need to have type patterns for all the > primitive types, so we can match against `Foo(boolean b)` or `Bar(float > f)`, even if we don't allow switching on these types. > > #### Patterns as case labels > > The main change here, of course, is allowing patterns as case labels. > The specified pattern must be applicable to the static type of the > switch operand. > > One question we need to answer is how to integrate legacy case labels > into the story. There are basically three options here: > > - Retcon them. In this approach, we define a constant literal to be a > constant pattern. Then, all switches can be interpreted as pattern > switches. > > - Tolerated legacy. With `instanceof`, rather than defining a bare > type to be a pattern, we defined two forms of the `instanceof` > production, a legacy one (`instanceof T`) and a pattern one. We can do > the same with case labels, and possibly then say that legacy case labels > are only allowed in legacy switches. > > - Leave them to rot. In this approach, legacy case labels are allowed > on legacy switches only. > > The difference between the first (retcon them as constant patterns) is > whether literals are valid as patterns in contexts other than `case` > labels (instanceof, nested patterns.) I am steadily converging on the > conclusion that constant patterns are an "obvious but wrong" idea, more > "cute and clever" than sound language design. > > Note that with a sensible guard feature, constant patterns become pure > syntactic shortcut; `Foo(0)` can be expressed as `Foo(var x) & true(x == > 0)` in any pattern context. That's not to say that the benefit of the > shortcut is zero, but there's also a cost; further user confusion as to > whether `id(0)` is an invocation of a method with a constant, or > matching a pattern with a nested constant pattern. (If we decide we > really want constant patterns because the nestability leads to better > readability, I prefer we find another way to denote them, such as `== 0` > or `const(0)` or such.) > > So, let's assume that we're going to be somewhere on the spectrum > between "tolerate" and "rot". There are two questions to answer here: > > - In switches on legacy types, do we allow a mix of constant cases and > pattern cases? (Note that this only makes a difference when we have > more interesting patterns that can apply to these types, which mostly > means waiting for declared patterns.) > > - In non-legacy-switches, do we allow constant cases at all? > > If we say "yes" to either of these, then essentially we treat any > pattern with a mix of these as a pattern switch, and lower `case C` to > `case TargetType t & true(t.equals(C))` or similar. > > #### Nullity > > We've gone through an extensive discussion (actually, many extensive > discussions) on nullity in switch. It took several rounds to get over > the "why can't we just always throw on null", but I hope we're there > now. It seems that the sensible strategy is: > > - At the top level of a switch, total type patterns, and the new `case > null`, accept null, and no others. (Default does not accept null, but > you can fall out of a `case null` into a default.) If a switch has no > null-accepting patterns, then the switch throws NPE preemptively > (simulating current behavior), as if there is an invisible trailing > (possibly dead) `case null: NPE` in all switches. > > This strategy is fully backward compatible; anything that throws today, > will throw tomorrow. Total patterns must come last, so only the last > pattern, or an explicit `case null`, will see a null. > > For nested patterns, we've already defined the semantics of pattern > matching such that pattern matching never NPEs, and never introduces any > failures to match null that don't derive from an underlying dynamic type > test. The problem is solely with expectations having to do with the top > level of switch, based on historical preemptive null checks. > > #### Scoping and fallthrough > > Having decided to backtrack on merging binding variables in instanceof > (`x instanceof Foo(var a) || x instanceof Bar(var a))`), the > corresponding backtracking is to restrict fallthrough such that you > cannot fall into, or out of, a case clause that has bindings. > > This may require some adjustment in the scoping of binding variables, > which is probably not yet fully defined for switches, to ensure that > their scope ends at the next case declaration, so that we don't have a > problem with reusing names: > > case Foo(var x): break; > case Bar(var x): // OK to use `x` here > > It is OK to fall out out of a case without bindings into a case without > bindings, such as falling out of `case null` into `default`. > > #### Exhaustiveness > > There's a definition of a set of patterns being total on a target type, > outlined here: > https://github.com/openjdk/amber-docs/blob/master/site/design-notes/type-patterns-in-switch.md. > > > The main new feature here is allowing pattern sets that are > (optimistically) total on the target to be considered total in a switch > expression, and insert the appropriate throwing catch-call to handle the > remainder. > > Secondarily, we still have a question to confront about how to make > statement switches total like expression switches, so that users can opt > into stricter type checking. > > #### Translation considerations > > Legacy switches on primitives can continue to get translated as they are > now. > > Pattern switches are lowered to densely numbered int switches, and we > use an indy-based classifier to select the candidate case. The > bootstrap takes as its static argument list a list of Class objects (or > null), and as its dynamic arguments, the switch target and the case > number to start at. For example, we translate: > > switch (o) { > case Telegram t: A > case Integer i && true(i > 0): B > case Integer i: C > case Foo(Bar(var x)): D > case "moose": ... > } > > as (pseudocode): > > int index = -1; > LOOP: > while (true) { > index = indy[bootstrap=SwitchBootstrap[Telegram.class, > Integer.class, Integer.class, Foo.class, String.class]](o, index); > switch (idx) { > case 0: A > case 1: > if (!(i > 0)) > continue LOOP; > B; > case 2: C; > case 3: > if (!(fooBinding1 instanceof Bar(var x)) > continue LOOP; > D > case 4: > if (!(o.equals("moose")) > continue LOOP; > } > break LOOP; > } > > The classifier builds a decision tree based on types, but does not > include nested patterns or guards -- these are added by the desugaring. > If there is anything that causes us to enter a case and then later > decide "whoops", we break out of the switch, and re-run the classifier, > starting at the last index we tried. (Earlier experiments suggested this > was the sweet spot; while we can accurately model patterns and guards as > constant bundles of method handles, the complexity is high and the > benefit is not so high. > > For record/array patterns, we use the classifier to select the type, and > then unroll any nested patterns into the switch body. For ANDed > patterns, we classify on the type of the first pattern and unroll the > rest into the switch body. > > For legacy switches on enums and strings, we have the option of > replacing the current desugaring with an indy-based classification as we > do with pattern switches, to move complex desugaring code from the > compiler to simpler runtime code. We can do this now, later, or never. > > > #### Open issues > > We have a few open issues to sync on, though almost all can probably be > kicked down the road past first preview: > > - Boundaries between legacy switches and pattern switches > - allow mixing of constant and pattern cases? > - allow switch on float/double/boolean with constant labels? > - Allow statement switches to opt into totality? How? Rémi