Re: Next up for patterns: type patterns in switch

Guy Steele Tue, 04 Aug 2020 11:29:11 -0700

If we are to choose a contextual keyword for this purpose, I think “where” is 
the most natural for reading out loud, and is shorter (in several ways) than 
“only-if” (the hyphenation of which strikes me as a bit precious in this 
context).


—Guy

> On Aug 4, 2020, at 2:11 PM, Brian Goetz wrote:
> 
> One thing this left open was the actual syntax of guards.  (We know it's 
> snowing in hell now, because I am actually encouraging a syntax 
> conversation.)  
> 
> Patterns in `instanceof` do not need guards, because `instanceof` is an 
> expression, and expressions conjoin with `&&`:
> 
>     if (x instanceof Foo f && f.bar() == 3) { ... }
> 
> We explored the approach of making the boolean guard expression part of the 
> pattern -- where the above would parse as `x instanceof P`, where P is `Foo f 
> && f.bar() == 3`.  Since pattern matching is conditional, we could think of 
> boolean expressions as patterns that are independent of their target.  But, 
> this approach didn't pan out due to various ambiguities.  
> 
> The most obvious ambiguity arose from the natural interpretation of constant 
> patterns: that `0` could be the literal zero or a pattern that matches zero.  
> (I have since proposed we try to avoid constant patterns entirely.)  Under 
> this interpretation, not only was there room for confusion ("Is foo(0) an 
> invocation of foo, or a pattern?"), but there were puzzlers like:
> 
>     switch (b) {
>         case true && false: ...
>         case false && true: ...
>     }
> 
> It would not be clear whether this would be two patterns conjoined together 
> (neither of which would match anything) or the constant pattern `true && 
> false`.  (There are similar ambiguities with deconstruction patterns with no 
> bindings.)  
> 
> Overall, all of this was enough to sour me on trying to press && into service 
> here.  Which leaves some relatively mundane syntax options:
> 
>     case P when e
>     case P where e
>     case P if e
>     case P only-if e
>     case P where (e)
> 
> and I guess weirder things like
> 
>     case P &&& e
>     case P : e
>     etc
> 
> I particularly don't like `if` since it makes it even harder to tell where 
> the case ends and the consequent begins.  Also, `e` should not contain a 
> switch expression, since no one wants to try to parse:
> 
>     case Foo f where switch (f.bar()) {
>         case Bar b -> 3;
>     } > 3 -> blah();
> 
> in their heads.  (We already excluded switch expressions from candidates for 
> constant expressions in Java 12 for similar reasons.)  
> 
> I mostly think this is a matter of picking a (contextual) keyword.  (I kind 
> of like that `only-if` could be an actual keyword.)
> 
> 
> 
> (Looking ahead, if we ever want to reenable support for merging, we might 
> have patterns like:
> 
>     case Foo(int x), Bar(int x) where x > 0:
> 
> and we'd have to accept that the guard applies to _all_ the patterns.)
> 
> 
> On 6/24/2020 10:44 AM, Brian Goetz wrote:
>> There are a lot of directions we could take next for pattern matching.  The 
>> one that builds most on what we've already done, and offers significant 
>> incremental expressiveness, is extending the type patterns we already have 
>> to a new context: switch.  (There is still plenty of work to do on 
>> deconstruction patterns, pattern assignment, etc, but these require more 
>> design work.)  
>> 
>> Here's an overview of where I think we are here.  
>> 
>> [JEP 305][jep305] introduced the first phase of
>> [pattern matching][patternmatch] into the Java language.  It was
>> deliberately limited, focusing on only one kind of pattern (type test
>> patterns) and one linguistic context (`instanceof`).  Having introduced the
>> concept to Java developers, we can now extend both the kinds of patterns and
>> the linguistic context where patterns are used.
>> 
>> ## Patterns in switch
>> 
>> The obvious next context in which to introduce pattern matching is `switch`;
>> a switch using patterns as `case` labels can replace `if .. else if` chains
>> with a more direct way of expressing a multi-way conditional.
>> 
>> Unfortunately, `switch` is one of the most complex, irregular constructs we
>> have in Java, so we must teach it some new tricks while avoiding some
>> existing traps.  Such tricks and traps may include:
>> 
>> **Typing.**  Currently, the operand of a `switch` may only be one of the
>> integral primitive types, the box type of an integral primitive, `String`, 
>> or an
>> `enum` type.  (Further, if the `switch` operand is an `enum` type, the `case`
>> labels must be _unqualified_ enum constant names.)  Clearly we can relax this
>> restriction to allow other types, and constrain the case labels to only be
>> patterns that are applicable to that type, but it may leave a seam of 
>> "legacy"
>> vs "pattern" switch, especially if we do not adopt bare constant literals as
>> the denotation of constant patterns.  (We have confronted this issue before 
>> with
>> expression switch, and concluded that it was better to rehabilitate the 
>> `switch`
>> we have rather than create a new construct, and we will make the same choice
>> here, but the cost of this is often a visible seam.)
>> 
>> **Parsing.**  The grammar currently specifies that the operand of a `case` 
>> label
>> is a `CaseConstant`, which casts a wide syntactic net, later narrowed with
>> post-checks after attribution.  This means that, since parsing is done 
>> before we
>> know the type of the operand, we must be watchful for ambiguities between
>> patterns and expressions (and possibly refine the production for `case` 
>> labels.)
>> 
>> **Nullity.**  The `switch` construct is currently hostile to `null`, but some
>> patterns do match `null`, and it may be desirable if nulls can be handled
>> within a suitably crafted `switch`.  
>> 
>> **Exhaustiveness.**  For switches over the permitted subtypes of sealed
>> types, we will want to be able to do exhaustiveness analysis -- including
>> for nested patterns (e.g., if `Shape` is `Circle` or `Rect`, then
>> `Box(Circle c)` and `Box(Rect r)` are exhaustive on `Box<Shape>`).
>> 
>> **Fallthrough.**  Fallthrough is everyone's least favorite feature of 
>> `switch`,
>> but it exists for a reason.  (The mistake was making fallthrough the default
>> behavior, but that ship has sailed.)  In the absence of an OR pattern
>> combinator, one might find fallthrough in switch useful in conjunction with
>> patterns:
>> 
>> ```
>> case Box(int x):
>> case Bag(int x):
>>     // use x
>> ```
>> 
>> However, it is likely that we will, at least initially, disallow falling out
>> of, or into, a case label with binding variables.
>> 
>> #### Translation
>> 
>> Switches on primitives and their wrapper types are translated using the
>> `tableswitch` or `lookupswitch` bytecodes; switches on strings and enums are
>> lowered in the compiler to switches involving hash codes (for strings) or
>> ordinals (for enums.)
>> 
>> For switches on patterns, we would need a new strategy, one likely built on
>> `invokedynamic`, where we lower the cases to a densely numbered `int` switch,
>> and then invoke a classifier function with the operand which tells us the
>> first case number it matches.  So a switch like:
>> 
>> ```
>> switch (o) {
>>     case P: A
>>     case Q: B
>> }
>> ```
>> 
>> is lowered to:
>> 
>> ```
>> int target = indy[BSM=PatternSwitch, args=[P,Q]](o)
>> switch (target) {
>>     case 0: A
>>     case 1: B
>> }
>> ```
>> 
>> A symbolic description of the patterns is provided as the bootstrap argument
>> list; the bootstrap method builds a decision tree based on analysis of the
>> patterns and their target types.
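
A minimal sketch of the lowering described above, written by hand rather than
with `invokedynamic`. The `classify` method is a hypothetical stand-in for the
bootstrapped classifier: it does a plain linear scan over type patterns only,
not the decision tree a real bootstrap would presumably build. The class and
method names are assumptions made for illustration.

```java
public class PatternSwitchSketch {
    // Hypothetical stand-in for the indy call site: returns the index of the
    // first type pattern that matches the operand, or -1 if none matches.
    static int classify(Object operand, Class<?>... caseTypes) {
        for (int i = 0; i < caseTypes.length; i++) {
            if (caseTypes[i].isInstance(operand)) {
                return i;
            }
        }
        return -1;
    }

    // Hand-lowered form of:
    //   switch (o) { case String s: A   case Integer i: B }
    static String describe(Object o) {
        int target = classify(o, String.class, Integer.class);
        switch (target) {
            case 0:
                String s = (String) o;      // binding for "case String s"
                return "string of length " + s.length();
            case 1:
                Integer i = (Integer) o;    // binding for "case Integer i"
                return "integer " + i;
            default:
                return "no match";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));
        System.out.println(describe(42));
        System.out.println(describe(3.5));
    }
}
```

The point of the split is that the `int` switch stays a dense `tableswitch`,
while all pattern knowledge lives behind the classifier.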
>> 
>> #### Guards
>> 
>> No matter how rich our patterns are, it is often the case that we will want
>> to provide additional filtering on the results of a pattern:
>> 
>> ```
>> if (shape instanceof Cylinder c && c.color() == RED) { ... }
>> ```
>> 
>> Because we use `instanceof` as part of a boolean expression, it is easy to
>> narrow the results by conjoining additional checks with `&&`.  But in a 
>> `case`
>> label, we do not necessarily have this opportunity.  Worse, the semantics of
>> `switch` mean that once a `case` label is selected, there is no way to say
>> "oops, forget it, keep trying from the next label".  
>> 
>> It is common in languages with pattern matching to support some form of 
>> "guard"
>> expression, which is a boolean expression that conditions whether the case
>> matches, such as:
>> 
>> ```
>> case Point(var x, var y)
>>     __where x == y: ...
>> ```
>> 
>> Bindings from the pattern would have to be available in guard expressions.
>> 
>> Syntactic options (and hazards) for guards abound; users would probably find
>> it natural to reuse `&&` to attach guards to patterns; C# has chosen `when`
>> for introducing guards; we could use `case P if (e)`, etc.  Whatever we do
>> here, there is a readability risk, as the more complex guards are, the
>> harder it is to tell where the case label ends and the "body" begins.  (And
>> worse if we allow switch expressions inside guards.)
>> 
>> An alternative to guards is to allow an imperative `continue` statement in
>> `switch`, which would mean "keep trying to match from the next label."  Given
>> the existing semantics of `continue`, this is a natural extension, but since
>> `continue` does not currently have meaning for switch, some work would have
>> to be done to disambiguate continue statements in switches enclosed in
>> loops.  The imperative version is strictly more expressive than most
>> reasonable forms of the declarative version, but users are likely to prefer
>> the declarative version.
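
To make the intended guard semantics concrete, here is a rough sketch that
emulates a guarded case with the `instanceof` type patterns from JEP 305: when
the guard fails, matching simply moves on to the next test, which is exactly
what a committed `case` label cannot do today. The `Cylinder` class and its
string-valued `color` field are hypothetical, introduced only for this example.

```java
public class GuardSketch {
    // Hypothetical shape type, introduced only for this illustration.
    static final class Cylinder {
        final String color;
        Cylinder(String color) { this.color = color; }
    }

    static String describe(Object shape) {
        // Plays the role of a guarded case ("Cylinder c, provided it is red"):
        // if the guard is false we fall through to the next test.
        if (shape instanceof Cylinder c && c.color.equals("RED")) {
            return "red cylinder";
        }
        // Plays the role of the unguarded "case Cylinder c": reached by every
        // cylinder that failed the guard above.
        if (shape instanceof Cylinder plain) {
            return "cylinder of color " + plain.color;
        }
        return "something else";
    }

    public static void main(String[] args) {
        System.out.println(describe(new Cylinder("RED")));
        System.out.println(describe(new Cylinder("BLUE")));
        System.out.println(describe("not a shape"));
    }
}
```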
>> 
>> ## Nulls
>> 
>> Almost no language design exercise is complete without some degree of 
>> wrestling
>> with `null`.  As we define more complex patterns than simple type patterns, 
>> and
>> extend constructs such as `switch` (which have existing opinions about 
>> nullity)
>> to support patterns, we need to have a clear understanding of which patterns
>> are nullable, and separate the nullity behaviors of patterns from the nullity
>> behaviors of those constructs which use patterns.  
>> 
>> ## Nullity and patterns
>> 
>> This topic has a number of easily-tangled concerns:
>> 
>>  - **Construct nullability.**  Constructs to which we want to add pattern
>>    awareness (`instanceof`, `switch`) already have their own opinion about
>>    nulls.  Currently, `instanceof` always says false when presented with a
>>    `null`, and `switch` always NPEs.  We may, or may not, wish to refine 
>> these
>>    rules in some cases.
>>  - **Pattern nullability.**  Some patterns clearly would never match `null`
>>    (such as deconstruction patterns), whereas others (an "any" pattern, and
>>    surely the `null` constant pattern) might make sense to match null.
>>  - **Refactoring friendliness.**  There are a number of cases that we would 
>> like
>>    to freely refactor back and forth, such as certain chains of `if ... else 
>> if`
>>    with switches.
>>  - **Nesting vs top-level.**  The "obvious" thing to do at the top level of a
>>    construct is not always the "obvious" thing to do in a nested construct.
>>  - **Totality vs partiality.**  When a pattern is partial on the operand type
>>    (e.g., `case String` when the operand of switch is `Object`), it is almost
>>    never the case we want to match null (except in the case of the `null`
>>    constant pattern), whereas when a pattern is total on the operand type 
>> (e.g.,
>>    `case Object` in the same example), it is more justifiable to match null.
>>  - **Inference.**  It would be nice if a `var` pattern were simply inference 
>> for
>>    a type pattern, rather than some possibly-non-denotable union.
>> 
>> As a starting example, consider:
>> 
>> ```
>> record Box(Object o) { }
>> 
>> Box box = ...
>> switch (box) {
>>     case Box(Chocolate c):
>>     case Box(Frog f):
>>     case Box(var o):
>> }
>> ```
>> 
>> It would be highly confusing and error-prone for either of the first two
>> patterns to match `Box(null)` -- given that `Chocolate` and `Frog` have no
>> type relation, it should be perfectly safe to reorder the two.  But, because
>> the last pattern seems so obviously total on boxes, it is quite likely that
>> what the author wants is to match all remaining boxes, including those that
>> contain null.  (Further, it would be terrible if there were _no_ way to say
>> "Match any `Box`, even if it contains `null`."  (While one might initially
>> think this could be repaired with OR patterns, imagine that `Box` had _n_
>> components -- we'd need to OR together _2^n_ patterns, with complex merging,
>> to express all the possible combinations of nullity.))
>> 
>> Scala and C# took the approach of saying that "var" patterns are not just
>> type inference, they are "any" patterns -- so `Box(Object o)` matches boxes
>> containing a non-null payload, whereas `Box(var o)` matches all boxes.  This
>> means, unfortunately, that `var` is not mere type inference -- which
>> complicates the role of `var` in the language considerably.  Users should
>> not have to choose between the semantics they want and being explicit about
>> types; these should be orthogonal choices.  The above `switch` should be
>> equivalent to:
>> 
>> ```
>> Box box = ...
>> switch (box) {
>>     case Box(Chocolate c):
>>     case Box(Frog f):
>>     case Box(Object o):
>> }
>> ```
>> 
>> and the choice to use `Object` or `var` should be solely one of whether the
>> manifest types are deemed to improve or impair readability.
>> 
>> #### Construct and pattern nullability
>> 
>> Currently, `instanceof` always says `false` on `null`, and `switch` always
>> throws on `null`.  Whatever null opinions a construct has, these are applied
>> before we even test any patterns.  
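
For reference, the current construct behavior can be checked directly. This is
a small sketch using only existing language features; the class and method
names are made up for the example.

```java
public class ConstructNullSketch {
    // `instanceof` answers false for null, whatever the type being tested.
    static boolean isObject(Object e) {
        return e instanceof Object;
    }

    // A switch on a null reference throws before any case label is examined.
    static boolean switchThrowsOnNull() {
        String s = null;
        try {
            switch (s) {
                case "a":
                    break;
                default:
                    break;
            }
            return false;   // not reached: the switch itself throws
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isObject(null));
        System.out.println(isObject("x"));
        System.out.println(switchThrowsOnNull());
    }
}
```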
>> 
>> We can formalize the intuition outlined above as: type patterns that are
>> _total_ on their target operand (`var x`, and `T t` on an operand of type
>> `U`, where `U <: T`) match null, and non-total type patterns do not.
>> (Another way to say this is: a `var` pattern is the "any" pattern, and a
>> type pattern that is total on its operand type is also an "any" pattern.)
>> Additionally, the `null` constant pattern matches null.  These are the
>> _only_ nullable patterns.
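
The rule above can be modeled, very roughly, with reflection. This sketch
assumes that `Class.isAssignableFrom` is an adequate stand-in for the subtype
test `U <: T` (it ignores generics) and that `value` really is of the operand
type; `typePatternMatches` is a hypothetical helper, not a proposed API.

```java
public class NullabilitySketch {
    // Models "does the type pattern `T t` match `value`?" where the static
    // type of the operand is U (operandType) and T is patternType.
    static boolean typePatternMatches(Class<?> operandType,
                                      Class<?> patternType,
                                      Object value) {
        // Total pattern (U <: T): an "any" pattern, matches even null.
        if (patternType.isAssignableFrom(operandType)) {
            return true;
        }
        // Partial pattern: an ordinary type test, which never matches null.
        return patternType.isInstance(value);
    }

    public static void main(String[] args) {
        // `Object o` on an Object operand is total, so null matches.
        System.out.println(typePatternMatches(Object.class, Object.class, null));
        // `String s` on an Object operand is partial, so null does not match.
        System.out.println(typePatternMatches(Object.class, String.class, null));
        System.out.println(typePatternMatches(Object.class, String.class, "hi"));
    }
}
```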
>> 
>> In our `Box` example, this means that the last case (whether written as
>> `Box(var o)` or `Box(Object o)`) matches all boxes, including those
>> containing null (because the nested pattern is total on the nested operand),
>> but the first two cases do not.
>> 
>> If we retain the current absolute hostility of `switch` to nulls, we can't
>> trivially refactor from
>> 
>> ```
>> switch (box) {
>>     case Box(Chocolate c):
>>     case Box(Frog f):
>>     case Box(var o):
>> }
>> ```
>> to
>> 
>> ```
>> switch (box) {
>>     case Box(var contents):
>>         switch (contents) {
>>             case Chocolate c:
>>             case Frog f:
>>             case Object o:
>>         }
>> }
>> ```
>> 
>> because the inner `switch(contents)` would NPE before we tried to match any
>> of the patterns it contains.  Instead, the user would explicitly have to do
>> an `if (contents == null)` test, and, if the intent was to handle `null` in
>> the same way as the `Object o` case, some duplication of code would be
>> needed.  We can address this sharp corner by slightly relaxing the
>> null-hostility of `switch`, as described below.
>> 
>> A similar sharp corner is the decomposition of a nested pattern `P(Q)` into
>> `P(alpha) & alpha instanceof Q`; while this is intended to be a universally
>> valid transformation, if P's first component might be null and Q is total,
>> this transformation would not be valid because of the existing (mild)
>> null-hostility of `instanceof`.  Again, we may be able to address this by
>> adjusting the rules surrounding `instanceof` slightly.
>> 
>> ## Generalizing switch
>> 
>> The refactoring example above motivates why we might want to relax the
>> null-handling behavior of `switch`.  On the other hand, the one thing the
>> current behavior has going for it is that it is easy to reason about; it
>> always throws when confronted with a `null`.  Any relaxed behavior would be
>> more complex; some switches would still have to throw (for compatibility
>> with existing semantics), and some (which can't be expressed today) would
>> accept nulls.  This is a tricky balance to achieve, but I think we have
>> found a good one.
>> 
>> A starting point is that we don't want to require readers to do an _O(n)_
>> analysis of each of the `case` labels just to determine whether a given 
>> switch
>> accepts `null` or not; this should be an _O(1)_ analysis.  (We do not want to
>> introduce a new flavor of `switch`, such as `switch-nullable`; this might 
>> seem
>> to fix the proximate problem but would surely create others.  As we've done 
>> with
>> expression switch and patterns, we'd rather rehabilitate `switch` than create
>> an almost-but-not-quite-the-same variant.)
>> 
>> Let's start with the null pattern, which we'll spell for sake of exposition
>> `case null`.  What if you were allowed to say `case null` in a switch, and 
>> the
>> switch would do the obvious thing?
>> 
>> ```
>> switch (o) {
>>     case null -> System.out.println("Ugh, null");
>>     case String s -> System.out.println("Yay, non-null: " + s);
>> }
>> ```
>> 
>> Given that the `case null` appears so close to the `switch`, it does not seem
>> confusing that this switch would match `null`; the existence of `case null` 
>> at
>> the top of the switch makes it pretty clear that this is intended behavior.  
>> (We
>> could further restrict the null pattern to being the first pattern in a 
>> switch,
>> to make this clearer.)
>> 
>> Now, let's look at the other end of the switch -- the last case.  What if the
>> last pattern is a total pattern?  (Note that if any `case` has a total 
>> pattern,
>> it _must_ be the last one, otherwise the cases after that would be dead, 
>> which
>> would be an error.)  Is it also reasonable for that to match null?  After 
>> all,
>> we're saying "everything":
>> 
>> ```
>> switch (o) {
>>     case String s: ...
>>     case Object o: ...
>> }
>> ```
>> 
>> Under this interpretation, the switch-refactoring anomaly above goes away.
>> 
>> The direction we're going here is that if we can localize the 
>> null-acceptance of
>> switches in the first (is it `case null`?) and last (is it total?) cases, 
>> then
>> the incremental complexity of allowing _some_ switches to accept null might 
>> be
>> outweighed by the incremental benefit of treating `null` more uniformly (and
>> thus eliminating the refactoring anomalies.)  Note also that there is no 
>> actual
>> code compatibility issue; this is all mental-model compatibility.
>> 
>> So far, we're suggesting:
>> 
>>  - A switch with a constant `null` case will accept nulls;
>>  - If present, a constant `null` case must go first;
>>  - A switch with a total (any) case also accepts nulls;
>>  - If present, a total (any) case must go last.
>> 
>> #### Relocating the problem
>> 
>> It might be more helpful to view these changes as not changing the behavior 
>> of
>> `switch`, but of the `default` case of `switch`.  We can equally well 
>> interpret
>> the current behavior as:
>> 
>>  - `switch` always accepts `null`, but matching the `default` case of a 
>> `switch`
>>    throws `NullPointerException`;
>>  - any `switch` without a `default` case has an implicit do-nothing `default`
>>    case.
>> 
>> If we adopt this change of perspective, then `default`, not `switch`, is in
>> control of the null rejection behavior -- and we can view these changes as
>> adjusting the behavior of `default`.  So we can recast the proposed changes 
>> as:
>> 
>>   - Switches accept null;
>>   - A constant `null` case will match nulls, and must go first;
>>   - A total switch (a switch with a total `case`) cannot have a `default` 
>> case;
>>   - A non-total switch without a `default` case gets an implicit do-nothing
>>     `default` case;
>>   - Matching the (implicit or explicit) default case with a `null` operand
>>     always throws NPE.
>> 
>> The main casualty here is that the `default` case does not mean the same
>> thing as `case var x` or `case Object o`.  We can't deprecate `default`, but
>> for pattern switches, it becomes much less useful.  
>> 
>> #### What about method (declared) patterns?
>> 
>> So far, we've declared all patterns, except the `null` constant pattern and 
>> the
>> total (any) pattern, to not match `null`.  What about patterns that are
>> explicitly declared in code?  It turns out we can rule out these matching
>> `null` fairly easily.  
>> 
>> We can divide declared patterns into three kinds: deconstruction patterns 
>> (dual
>> to constructors), static patterns (dual to static methods), and instance
>> patterns (dual to instance methods.)  For both deconstruction and instance
>> patterns, the match target becomes the receiver; method bodies are never
>> expected to deal with the case where `this == null`.  
>> 
>> For static patterns, it is conceivable that they could match `null`, but this
>> would put a fairly serious burden on writers of static patterns to check for
>> `null` -- which they would invariably forget, and many more NPEs would ensue.
>> (Think about writing the pattern for `Optional.of(T t)` -- it would be
>> overwhelmingly likely we'd forget to check the target for nullity.)  So there
>> is a strong argument to simply say "declared patterns never match null", to
>> not put writers of such patterns in this situation.
>> 
>> So, only the top and bottom patterns in a switch could match null; if the top
>> pattern is not `case null`, and the bottom pattern is not total, then the 
>> switch
>> throws NPE on null, otherwise it accepts null.
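
Put as code, the reader's check looks only at the two ends of the switch. This
is a toy model, assuming each case label has already been classified; the
`CaseKind` enum and `acceptsNull` are hypothetical names used for illustration.

```java
public class SwitchNullSketch {
    // Hypothetical classification of a case label.
    enum CaseKind { NULL_CONSTANT, PARTIAL, TOTAL }

    // Inspects only the first and last labels, so the check does not depend
    // on how many cases the switch has.
    static boolean acceptsNull(CaseKind... cases) {
        if (cases.length == 0) {
            return false;
        }
        CaseKind first = cases[0];
        CaseKind last = cases[cases.length - 1];
        return first == CaseKind.NULL_CONSTANT || last == CaseKind.TOTAL;
    }

    public static void main(String[] args) {
        // case null; case String s
        System.out.println(acceptsNull(CaseKind.NULL_CONSTANT, CaseKind.PARTIAL));
        // case String s; case Object o
        System.out.println(acceptsNull(CaseKind.PARTIAL, CaseKind.TOTAL));
        // case String s; case Integer i
        System.out.println(acceptsNull(CaseKind.PARTIAL, CaseKind.PARTIAL));
    }
}
```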
>> 
>> #### Adjusting instanceof
>> 
>> The remaining anomaly we had was about unrolling nested patterns when the 
>> inner
>> pattern is total.  We can plug this by simply outlawing total patterns in
>> `instanceof`.
>> 
>> This may seem like a cheap trick, but it makes sense on its own.  If the
>> following statement were allowed:
>> 
>> ```
>> if (e instanceof var x) { X }
>> ```
>> 
>> it would simply be confusing; on the one hand, it looks like it should always
>> match, but on the other, `instanceof` is historically null-hostile.  And, if 
>> the
>> pattern always matches, then the `if` statement is silly; it should be 
>> replaced
>> with:
>> 
>> ```
>> var x = e;
>> X
>> ```
>> 
>> since there's nothing conditional about it.  So by banning "any" patterns on 
>> the
>> RHS of `instanceof`, we both avoid a confusion about what is going to happen,
>> and we prevent the unrolling anomaly.
>> 
>> For reasons of compatibility, we will have to continue to allow
>> 
>> ```
>> if (e instanceof Object) { ... }   
>> ```
>> 
>> which succeeds on all non-null operands.  
>> 
>> We've been a little sloppy with the terminology of "any" vs "total"; note 
>> that
>> in
>> 
>> ```
>> Point p;
>> if (p instanceof Point(var x, var y)) { }
>> ```
>> 
>> the pattern `Point(var x, var y)` is total on `Point`, but not an "any"
>> pattern -- it still doesn't match when `p == null`.
>> 
>> On the theory that an "any" pattern in `instanceof` is silly, we may also 
>> want
>> to ban other "silly" patterns in `instanceof`, such as constant patterns, 
>> since
>> all of the following have simpler forms:
>> 
>> ```
>> if (x instanceof null) { ... }
>> if (x instanceof "") { ... }
>> if (i instanceof 3) { ... }
>> ```
>> 
>> In the first round (type patterns in `instanceof`), we mostly didn't confront
>> this issue, saying that `instanceof T t` matched in all the cases where
>> `instanceof T` would match.  But given that the solution for `switch` relies
>> on "any" patterns matching null, we may wish to adjust the behavior of
>> `instanceof` before it exits preview.
>> 
>> 
>> [jep305]: https://openjdk.java.net/jeps/305
>> [patternmatch]: pattern-match.html
>> 
>
