If we are to choose a contextual keyword for this purpose, I think “where” is the most natural for reading out loud, and is shorter (in several ways) than “only-if” (the hyphenation of which strikes me as a bit precious in this context).
—Guy > On Aug 4, 2020, at 2:11 PM, Brian Goetz <[email protected]> wrote: > > One thing this left open was the actual syntax of guards. (We know its > snowing in hell now, because I am actually encouraging a syntax > conversation.) > > Patterns in `instanceof` do not need guards, because `instanceof` is an > expression, and expressions conjoin with `&&`: > > if (x instanceof Foo f && f.bar() == 3) { ... } > > We explored the approach of making the boolean guard expression part of the > pattern -- where the above would parse as `x instanceof P`, where P is `Foo f > && f.bar() == 3`. Since pattern matching is conditional, we could think of > boolean expressions as patterns that are independent of their target. But, > this approach didn't pan out due to various ambiguities. > > The most obvious ambiguity was the obvious interpretation of constant > patterns; that `0` could be the literal zero or a pattern that matches zero. > (I have since proposed we try to avoid constant patterns entirely.) Under > this interpretation, not only was there room for confusion ("Is foo(0) an > invocation of foo, or a pattern?"), but there were puzzlers like: > > switch (b) { > case true && false: ... > case false && true: ... > } > > It would not be clear whether this would be two patterns conjoined together > (neither of which would match anything) or the constant pattern `true && > false`. (There are similar ambiguities with deconstruction patterns with no > bindings.) > > Overall, all of this was enough to sour me on trying to press && into service > here. Which leaves some relatively mundane syntax options: > > case P when e > case P where e > case P if e > case P only-if e > case P where (e) > > and I guess weirder things like > > case P &&& e > case P : e > etc > > I particularly don't like `if` since it makes it even harder to tell where > the case ends and the consequent begins. Also, `e` should not contain a > switch expression, since no one wants to try to parse: > > case Foo f where switch (f.bar()) { > case Bar b -> 3; > } > 3 -> blah(); > > in their heads. (We already excluded switch expressions from candidates for > constant expressions in 12 for similar reasons.) > > I mostly think this is a matter of picking a (contextual) keyword. (I kind > of like that `only-if` could be an actual keyword.) > > > > (Looking ahead, if we ever want to reenable support for merging, we might > have patterns like: > > case Foo(int x), Bar(int x) where x > 0: > > and we'd have to accept that the guard applies to _all_ the patterns.) > > > On 6/24/2020 10:44 AM, Brian Goetz wrote: >> There are a lot of directions we could take next for pattern matching. The >> one that builds most on what we've already done, and offers significant >> incremental expressiveness, is extending the type patterns we already have >> to a new context: switch. (There is still plenty of work to do on >> deconstruction patterns, pattern assignment, etc, but these require more >> design work.) >> >> Here's an overview of where I think we are here. >> >> [JEP 305][jep305] introduced the first phase of [pattern >> matching][patternmatch] >> into the Java language. It was deliberately limited, focusing on only one >> kind >> of pattern (type test patterns) and one linguistic context (`instanceof`). >> Having introduced the concept to Java developers, we can now extend both the >> kinds of patterns and the linguistic context where patterns are used. >> >> ## Patterns in switch >> >> The obvious next context in which to introduce pattern matching is `switch`; >> a >> switch using patterns as `case` labels can replace `if .. else if` chains >> with >> a more direct way of expressing a multi-way conditional. >> >> Unfortunately, `switch` is one of the most complex, irregular constructs we >> have >> in Java, so we must teach it some new tricks while avoiding some existing >> traps. >> Such tricks and traps may include: >> >> **Typing.** Currently, the operand of a `switch` may only be one of the >> integral primitive types, the box type of an integral primitive, `String`, >> or an >> `enum` type. (Further, if the `switch` operand is an `enum` type, the `case` >> labels must be _unqualified_ enum constant names.) Clearly we can relax this >> restriction to allow other types, and constrain the case labels to only be >> patterns that are applicable to that type, but it may leave a seam of >> "legacy" >> vs "pattern" switch, especially if we do not adopt bare constant literals as >> the denotation of constant patterns. (We have confronted this issue before >> with >> expression switch, and concluded that it was better to rehabilitate the >> `switch` >> we have rather than create a new construct, and we will make the same choice >> here, but the cost of this is often a visible seam.) >> >> **Parsing.** The grammar currently specifies that the operand of a `case` >> label >> is a `CaseConstant`, which casts a wide syntactic net, later narrowed with >> post-checks after attribution. This means that, since parsing is done >> before we >> know the type of the operand, we must be watchful for ambiguities between >> patterns and expressions (and possibly refine the production for `case` >> labels.) >> >> **Nullity.** The `switch` construct is currently hostile to `null`, but some >> patterns do match `null`, and it may be desirable if nulls can be handled >> within a suitably crafted `switch`. >> >> **Exhaustiveness.** For switches over the permitted subtypes of sealed >> types, >> we will want to be able to do exhaustiveness analysis -- including for nested >> patterns (i.e., if `Shape` is `Circle` or `Rect`, then `Box(Circle c)` and >> `Box(Rect r)` are exhaustive on `Box<Shape>`.) >> >> **Fallthrough.** Fallthrough is everyone's least favorite feature of >> `switch`, >> but it exists for a reason. (The mistake was making fallthrough the default >> behavior, but that ship has sailed.) In the absence of an OR pattern >> combinator, one might find fallthrough in switch useful in conjunction with >> patterns: >> >> ``` >> case Box(int x): >> case Bag(int x): >> // use x >> ``` >> >> However, it is likely that we will, at least initially, disallow falling out >> of, or into, a case label with binding variables. >> >> #### Translation >> >> Switches on primitives and their wrapper types are translated using the >> `tableswitch` or `lookupswitch` bytecodes; switches on strings and enums are >> lowered in the compiler to switches involving hash codes (for strings) or >> ordinals (for enums.) >> >> For switches on patterns, we would need a new strategy, one likely built on >> `invokedynamic`, where we lower the cases to a densely numbered `int` switch, >> and then invoke a classifier function with the operand which tells us the >> first >> case number it matches. So a switch like: >> >> ``` >> switch (o) { >> case P: A >> case Q: B >> } >> ``` >> >> is lowered to: >> >> ``` >> int target = indy[BSM=PatternSwitch, args=[P,Q]](o) >> switch (target) { >> case 0: A >> case 1: B >> } >> ``` >> >> A symbolic description of the patterns is provided as the bootstrap argument >> list, which builds a decision tree based on analysis of the patterns and >> their >> target types. >> >> #### Guards >> >> No matter how rich our patterns are, it is often the case that we will want >> to provide additional filtering on the results of a pattern: >> >> ``` >> if (shape instanceof Cylinder c && c.color() == RED) { ... } >> ``` >> >> Because we use `instanceof` as part of a boolean expression, it is easy to >> narrow the results by conjoining additional checks with `&&`. But in a >> `case` >> label, we do not necessarily have this opportunity. Worse, the semantics of >> `switch` mean that once a `case` label is selected, there is no way to say >> "oops, forget it, keep trying from the next label". >> >> It is common in languages with pattern matching to support some form of >> "guard" >> expression, which is a boolean expression that conditions whether the case >> matches, such as: >> >> ``` >> case Point(var x, var y) >> __where x == y: ... >> ``` >> >> Bindings from the pattern would have to be available in guard expressions. >> >> Syntactic options (and hazards) for guards abound; users would probably find >> it >> natural to reuse `&&` to attach guards to patterns; C# has chosen `when` for >> introducing guards; we could use `case P if (e)`, etc. Whatever we do here, >> there is a readability risk, as the more complex guards are, the harder it >> is >> to tell where the case label ends and the "body" begins. (And worse if we >> allow >> switch expressions inside guards.) >> >> An alternate to guards is to allow an imperative `continue` statement in >> `switch`, which would mean "keep trying to match from the next label." Given >> the existing semantics of `continue`, this is a natural extension, but since >> `continue` does not currently have meaning for switch, some work would have >> to >> be done to disambiguate continue statements in switches enclosed in loops. >> The >> imperative version is strictly more expressive than most reasonable forms of >> the >> declarative version, but users are likely to prefer the declarative version. >> >> ## Nulls >> >> Almost no language design exercise is complete without some degree of >> wrestling >> with `null`. As we define more complex patterns than simple type patterns, >> and >> extend constructs such as `switch` (which have existing opinions about >> nullity) >> to support patterns, we need to have a clear understanding of which patterns >> are nullable, and separate the nullity behaviors of patterns from the nullity >> behaviors of those constructs which use patterns. >> >> ## Nullity and patterns >> >> This topic has a number of easily-tangled concerns: >> >> - **Construct nullability.** Constructs to which we want to add pattern >> awareness (`instanceof`, `switch`) already have their own opinion about >> nulls. Currently, `instanceof` always says false when presented with a >> `null`, and `switch` always NPEs. We may, or may not, wish to refine >> these >> rules in some cases. >> - **Pattern nullability.** Some patterns clearly would never match `null` >> (such as deconstruction patterns), whereas others (an "any" pattern, and >> surely the `null` constant pattern) might make sense to match null. >> - **Refactoring friendliness.** There are a number of cases that we would >> like >> to freely refactor back and forth, such as certain chains of `if ... else >> if` >> with switches. >> - **Nesting vs top-level.** The "obvious" thing to do at the top level of a >> construct is not always the "obvious" thing to do in a nested construct. >> - **Totality vs partiality.** When a pattern is partial on the operand type >> (e.g., `case String` when the operand of switch is `Object`), it is almost >> never the case we want to match null (except in the case of the `null` >> constant pattern), whereas when a pattern is total on the operand type >> (e.g., >> `case Object` in the same example), it is more justifiable to match null. >> - **Inference.** It would be nice if a `var` pattern were simply inference >> for >> a type pattern, rather than some possibly-non-denotable union. >> >> As a starting example, consider: >> >> ``` >> record Box(Object o) { } >> >> Box box = ... >> switch (box) { >> case Box(Chocolate c): >> case Box(Frog f): >> case Box(var o): >> } >> ``` >> >> It would be highly confusing and error-prone for either of the first two >> patterns to match `Box(null)` -- given that `Chocolate` and `Frog` have no >> type >> relation, it should be perfectly safe to reorder the two. But, because the >> last >> pattern seems so obviously total on boxes, it is quite likely that what the >> author wants is to match all remaining boxes, including those that contain >> null. >> (Further, it would be terrible if there were _no_ way to say "Match any >> `Box`, >> even if it contains `null`. (While one might initially think this could be >> repaired with OR patterns, imagine that `Box` had _n_ components -- we'd >> need to >> OR together _2^n_ patterns, with complex merging, to express all the possible >> combinations of nullity.)) >> >> Scala and C# took the approach of saying that "var" patterns are not just >> type >> inference, they are "any" patterns -- so `Box(Object o)` matches boxes >> containing a non-null payload, where `Box(var o)` matches all boxes. This >> means, unfortunately, that `var` is not mere type inference -- which >> complicates >> the role of `var` in the language considerably. Users should not have to >> choose >> between the semantics they want and being explicit about types; these should >> be >> orthogonal choices. The above `switch` should be equivalent to: >> >> ``` >> Box box = ... >> switch (box) { >> case Box(Chocolate c): >> case Box(Frog f): >> case Box(Object o): >> } >> ``` >> >> and the choice to use `Object` or `var` should be solely one of whether the >> manifest types are deemed to improve or impair readability. >> >> #### Construct and pattern nullability >> >> Currently, `instanceof` always says `false` on `null`, and `switch` always >> throws on `null`. Whatever null opinions a construct has, these are applied >> before we even test any patterns. >> >> We can formalize the intuition outlined above as: type patterns that are >> _total_ >> on their target operand (`var x`, and `T t` on an operand of type `U`, where >> `U >> <: T`) match null, and non-total type patterns do not. (Another way to say >> this is: a `var` pattern is the "any" pattern, and a type pattern that is >> total >> on its operand type is also an "any" pattern.) Additionally, the `null` >> constant pattern matches null. These are the _only_ nullable patterns. >> >> In our `Box` example, this means that the last case (whether written as >> `Box(var >> o)` or `Box(Object o)`) matches all boxes, including those containing null >> (because the nested pattern is total on the nested operand), but the first >> two >> cases do not. >> >> If we retain the current absolute hostility of `switch` to nulls, we can't >> trivially refactor from >> >> ``` >> switch (o) { >> case Box(Chocolate c): >> case Box(Frog f): >> case Box(var o): >> } >> ``` >> to >> >> ``` >> switch (o) { >> case Box(var contents): >> switch (contents) { >> case Chocolate c: >> case Frog f: >> case Object o: >> } >> } >> } >> ``` >> >> because the inner `switch(contents)` would NPE before we tried to match any >> of >> the patterns it contains. Instead, the user would explicitly have to do an >> `if >> (contents == null)` test, and, if the intent was to handle `null` in the same >> way as the `Object o` case, some duplication of code would be needed. We can >> address this sharp corner by slightly relaxing the null-hostility of >> `switch`, >> as described below. >> >> A similar sharp corner is the decomposition of a nested pattern `P(Q)` into >> `P(alpha) & alpha instanceof Q`; while this is intended to be a universally >> valid transformation, if P's 1st component might be null and Q is total, >> this >> transformation would not be valid because of the existing (mild) >> null-hostility >> of `instanceof`. Again, we may be able to address this by adjusting the >> rules >> surrounding `instanceof` slightly. >> >> ## Generalizing switch >> >> The refactoring example above motivates why we might want to relax the >> null-handling behavior of `switch`. On the other hand, the one thing the >> current behavior has going for it is that at least the current behavior is >> easy >> to reason about; it always throws when confronted with a `null`. Any relaxed >> behavior would be more complex; some switches would still have to throw (for >> compatibility with existing semantics), and some (which can't be expressed >> today) would accept nulls. This is a tricky balance to achieve, but I think >> we >> have a found a good one. >> >> A starting point is that we don't want to require readers to do an _O(n)_ >> analysis of each of the `case` labels just to determine whether a given >> switch >> accepts `null` or not; this should be an _O(1)_ analysis. (We do not want to >> introduce a new flavor of `switch`, such as `switch-nullable`; this might >> seem >> to fix the proximate problem but would surely create others. As we've done >> with >> expression switch and patterns, we'd rather rehabilitate `switch` than create >> an almost-but-not-quite-the-same variant.) >> >> Let's start with the null pattern, which we'll spell for sake of exposition >> `case null`. What if you were allowed to say `case null` in a switch, and >> the >> switch would do the obvious thing? >> >> ``` >> switch (o) { >> case null -> System.out.println("Ugh, null"); >> case String s -> System.out.println("Yay, non-null: " + s); >> } >> ``` >> >> Given that the `case null` appears so close to the `switch`, it does not seem >> confusing that this switch would match `null`; the existence of `case null` >> at >> the top of the switch makes it pretty clear that this is intended behavior. >> (We >> could further restrict the null pattern to being the first pattern in a >> switch, >> to make this clearer.) >> >> Now, let's look at the other end of the switch -- the last case. What if the >> last pattern is a total pattern? (Note that if any `case` has a total >> pattern, >> it _must_ be the last one, otherwise the cases after that would be dead, >> which >> would be an error.) Is it also reasonable for that to match null? After >> all, >> we're saying "everything": >> >> ``` >> switch (o) { >> case String s: ... >> case Object o: ... >> } >> ``` >> >> Under this interpretation, the switch-refactoring anomaly above goes away. >> >> The direction we're going here is that if we can localize the >> null-acceptance of >> switches in the first (is it `case null`?) and last (is it total?) cases, >> then >> the incremental complexity of allowing _some_ switches to accept null might >> be >> outweighed by the incremental benefit of treating `null` more uniformly (and >> thus eliminating the refactoring anomalies.) Note also that there is no >> actual >> code compatibility issue; this is all mental-model compatibility. >> >> So far, we're suggesting: >> >> - A switch with a constant `null` case will accept nulls; >> - If present, a constant `null` case must go first; >> - A switch with a total (any) case matches also accepts nulls; >> - If present, a total (any) case must go last. >> >> #### Relocating the problem >> >> It might be more helpful to view these changes as not changing the behavior >> of >> `switch`, but of the `default` case of `switch`. We can equally well >> interpret >> the current behavior as: >> >> - `switch` always accepts `null`, but matching the `default` case of a >> `switch` >> throws `NullPointerException`; >> - any `switch` without a `default` case has an implicit do-nothing `default` >> case. >> >> If we adopt this change of perspective, then `default`, not `switch`, is in >> control of the null rejection behavior -- and we can view these changes as >> adjusting the behavior of `default`. So we can recast the proposed changes >> as: >> >> - Switches accept null; >> - A constant `null` case will match nulls, and must go first; >> - A total switch (a switch with a total `case`) cannot have a `default` >> case; >> - A non-total switch without a `default` case gets an implicit do-nothing >> `default` case; >> - Matching the (implicit or explicit) default case with a `null` operand >> always throws NPE. >> >> The main casualty here is that the `default` case does not mean the same >> thing as `case var x` or `case Object o`. We can't deprecate `default`, but >> for pattern switches, it becomes much less useful. >> >> #### What about method (declared) patterns? >> >> So far, we've declared all patterns, except the `null` constant pattern and >> the >> total (any) pattern, to not match `null`. What about patterns that are >> explicitly declared in code? It turns out we can rule out these matching >> `null` fairly easily. >> >> We can divide declared patterns into three kinds: deconstruction patterns >> (dual >> to constructors), static patterns (dual to static methods), and instance >> patterns (dual to instance methods.) For both deconstruction and instance >> patterns, the match target becomes the receiver; method bodies are never >> expected to deal with the case where `this == null`. >> >> For static patterns, it is conceivable that they could match `null`, but this >> would put a fairly serious burden on writers of static patterns to check for >> `null` -- which they would invariably forget, and many more NPEs would ensue. >> (Think about writing the pattern for `Optional.of(T t)` -- it would be >> overwhelmingly likely we'd forget to check the target for nullity.) SO there >> is a strong argument to simply say "declared patterns never match null", to >> not put writers of such patterns in this situation. >> >> So, only the top and bottom patterns in a switch could match null; if the top >> pattern is not `case null`, and the bottom pattern is not total, then the >> switch >> throws NPE on null, otherwise it accepts null. >> >> #### Adjusting instanceof >> >> The remaining anomaly we had was about unrolling nested patterns when the >> inner >> pattern is total. We can plug this by simply outlawing total patterns in >> `instanceof`. >> >> This may seem like a cheap trick, but it makes sense on its own. If the >> following statement was allowed: >> >> ``` >> if (e instanceof var x) { X } >> ``` >> >> it would simply be confusing; on the one hand, it looks like it should always >> match, but on the other, `instanceof` is historically null-hostile. And, if >> the >> pattern always matches, then the `if` statement is silly; it should be >> replaced >> with: >> >> ``` >> var x = e; >> X >> ``` >> >> since there's nothing conditional about it. So by banning "any" patterns on >> the >> RHS of `instanceof`, we both avoid a confusion about what is going to happen, >> and we prevent the unrolling anomaly. >> >> For reasons of compatibility, we will have to continue to allow >> >> ``` >> if (e instanceof Object) { ... } >> ``` >> >> which succeeds on all non-null operands. >> >> We've been a little sloppy with the terminology of "any" vs "total"; note >> that >> in >> >> ``` >> Point p; >> if (p instanceof Point(var x, var y)) { } >> ``` >> >> the pattern `Point(var x, var y)` is total on `Point`, but not an "any" >> pattern >> -- it still doesn't match on p == null. >> >> On the theory that an "any" pattern in `instanceof` is silly, we may also >> want >> to ban other "silly" patterns in `instanceof`, such as constant patterns, >> since >> all of the following have simpler forms: >> >> ``` >> if (x instanceof null) { ... } >> if (x instanceof "") { ... } >> if (i instanceof 3) { ... } >> ``` >> >> In the first round (type patterns in `instanceof`), we mostly didn't confront >> this issue, saying that `instanceof T t` matched in all the cases where >> `instanceof T` would match. But given that the solution for `switch` relies >> on "any" patterns matching null, we may wish to adjust the behavior of >> `instanceof` before it exits preview. >> >> >> [jep305]: https://openjdk.java.net/jeps/305 >> <https://openjdk.java.net/jeps/305> >> [patternmatch]: pattern-match.html >> >
