GADTs -- sealed families whose permitted subtypes specialize the type variables of the base class -- pose some interesting challenges for pattern matching.

(Remi: this is a big, complex area.  Off-the-cuff "this is wrong" or "you should X instead" replies are not helpful.  If in doubt, ask questions.  One comprehensive reply is more useful than many small replies.  Probably best to think about the whole thing for some time before responding.)

Here is an example of a GADT hierarchy:

sealed interface Node<T> { }

record IntNode(int i) implements Node<Integer> { }
record BoolNode(boolean b) implements Node<Boolean> { }
record PlusNode(Node<Integer> a, Node<Integer> b) implements Node<Integer> { }
record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements Node<T> { }

Nodes can be parameterized, but some nodes are sharply typed, and some intermediate nodes (plus, or, if) have constraints on their components.  This is enough to model expressions like:

    let
       a = true, b = false, x = 1, y = 2
       in
           if (a || b) then x + y else y;

Note that `if` nodes can work on both Node<Integer> and Node<Boolean>, and yield a node of the right type.
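Concretely, the sample `let` expression corresponds to a small tree of these records.  A minimal sketch (assuming Java 17+ for sealed interfaces and records; the class name NodeDemo is mine, and the let-bound variables are simply inlined):

```java
public class NodeDemo {
    sealed interface Node<T> { }
    record IntNode(int i) implements Node<Integer> { }
    record BoolNode(boolean b) implements Node<Boolean> { }
    record PlusNode(Node<Integer> a, Node<Integer> b) implements Node<Integer> { }
    record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
    record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements Node<T> { }

    public static void main(String[] args) {
        // if (a || b) then x + y else y, with a = true, b = false, x = 1, y = 2
        Node<Integer> expr =
            new IfNode<>(
                new OrNode(new BoolNode(true), new BoolNode(false)),
                new PlusNode(new IntNode(1), new IntNode(2)),
                new IntNode(2));
        System.out.println(expr.getClass().getSimpleName());
    }
}
```

The IfNode is well-typed as a Node<Integer> only because both of its branches are Node<Integer>; its condition is independently pinned to Node<Boolean>.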

## The Flow Problem

As mentioned earlier, pattern matching can recover constraints on type variables, but currently we do not act on them.  For example, we might want to write eval() like this:

static <T> T eval(Node<T> n) {
    return switch (n) {
        case IntNode(var i) -> i;
        case BoolNode(var b) -> b;
        case PlusNode(var a, var b) -> eval(a) + eval(b);
        case OrNode(var a, var b) -> eval(a) || eval(b);
        case IfNode<T>(var c, var a, var b) -> eval(c) ? eval(a) : eval(b);
    };
}

But this doesn't work.  The eval() method returns a T.  In the first case, we've matched Node<T> to IntNode, so the compiler knows `i : int`.  Humans know that T can only be Integer, but the compiler doesn't know that yet.  As a result, the choice to return `i` will cause a type error; the compiler has no reason to believe that `i` is a `T`.  The only choice the user has is an unchecked cast to `T`.  This isn't great.
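Today, the only way to get this past the compiler is the unchecked cast.  A sketch of that workaround (assuming Java 21 for record patterns in switch; EvalDemo is a made-up harness, and the casts through the box types are exactly the assertions the compiler cannot verify):

```java
public class EvalDemo {
    sealed interface Node<T> { }
    record IntNode(int i) implements Node<Integer> { }
    record BoolNode(boolean b) implements Node<Boolean> { }
    record PlusNode(Node<Integer> a, Node<Integer> b) implements Node<Integer> { }
    record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
    record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements Node<T> { }

    @SuppressWarnings("unchecked")
    static <T> T eval(Node<T> n) {
        return switch (n) {
            case IntNode(int i)         -> (T) (Integer) i;                     // unchecked: humans know T=Integer
            case BoolNode(boolean b)    -> (T) (Boolean) b;                     // unchecked: humans know T=Boolean
            case PlusNode(var a, var b) -> (T) (Integer) (eval(a) + eval(b));   // unchecked
            case OrNode(var a, var b)   -> (T) (Boolean) (eval(a) || eval(b));  // unchecked
            case IfNode<T>(var c, var a, var b) -> eval(c) ? eval(a) : eval(b); // type-checks as is
        };
    }

    public static void main(String[] args) {
        // if (true || false) then 1 + 2 else 2
        Node<Integer> expr =
            new IfNode<>(
                new OrNode(new BoolNode(true), new BoolNode(false)),
                new PlusNode(new IntNode(1), new IntNode(2)),
                new IntNode(2));
        System.out.println(eval(expr));
    }
}
```

Note that only the IfNode case needs no cast, because its components are already Node<T>; every sharply typed case needs the unchecked `(T)` step.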

We've discussed, as a possible solution, flow typing for type variables; matching IntNode to Node<T> can generate a constraint T=Integer in the scope where the pattern matches. Pattern matching is already an explicitly conditional construct; whether a pattern matches already flows into scoping and control flow analysis.  Refining type constraints on type variables is a reasonable thing to consider, and offers a greater type-safety payoff than ordinary flow typing (since most flow typing can be replaced with pattern matching.)

We have the same problem with the PlusNode and OrNode cases too; if we match PlusNode, then T can only be Integer, but the RHS will be int and assigning an int to a T will cause a problem.  Only the last case will type check without gathering extra T constraints.

## The Exhaustiveness Problem

Now suppose we have a Node<Integer>.  Then it can only be an IntNode, a PlusNode, or an IfNode.  So the following switch should be exhaustive:

static int eval(Node<Integer> n) {
    return switch (n) {
        case IntNode(var i) -> i;
        case PlusNode(var a, var b) -> eval(a) + eval(b);
        case IfNode<Integer>(var c, var a, var b) -> eval(c) ? eval(a) : eval(b);
    };
}

We need to be able to eliminate BoolNode and OrNode from the list of types that have to be covered by the switch.

We're proposing changes in the current round (also covered in my Coverage doc) that refine the "you cover a sealed type if you cover all the permitted subtypes" rule to exclude those whose parameterizations are impossible.
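For the record, that refinement is enough to make the Node<Integer> switch check out.  A sketch of how it reads (Java 21's pattern matching for switch already prunes permitted subtypes whose parameterizations are impossible for the selector type; the evalBool helper and the class name are mine):

```java
public class IntEvalDemo {
    sealed interface Node<T> { }
    record IntNode(int i) implements Node<Integer> { }
    record BoolNode(boolean b) implements Node<Boolean> { }
    record PlusNode(Node<Integer> a, Node<Integer> b) implements Node<Integer> { }
    record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
    record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements Node<T> { }

    // Exhaustive without a default: BoolNode and OrNode cannot be Node<Integer>.
    static int eval(Node<Integer> n) {
        return switch (n) {
            case IntNode(var i) -> i;
            case PlusNode(var a, var b) -> eval(a) + eval(b);
            case IfNode<Integer>(var c, var a, var b) -> evalBool(c) ? eval(a) : eval(b);
        };
    }

    // Likewise exhaustive: IntNode and PlusNode cannot be Node<Boolean>.
    static boolean evalBool(Node<Boolean> n) {
        return switch (n) {
            case BoolNode(var b) -> b;
            case OrNode(var a, var b) -> evalBool(a) || evalBool(b);
            case IfNode<Boolean>(var c, var a, var b) -> evalBool(c) ? evalBool(a) : evalBool(b);
        };
    }

    public static void main(String[] args) {
        Node<Integer> expr = new IfNode<>(
            new OrNode(new BoolNode(true), new BoolNode(false)),
            new PlusNode(new IntNode(1), new IntNode(2)),
            new IntNode(2));
        System.out.println(eval(expr));
    }
}
```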

## The Typing Problem

Even without worrying about the RHS, we have problems with cases like this:

static <T> T eval(Node<T> n) {
    return switch (n) {
        ...
        case IfNode<Integer>(var c, IntNode a, IntNode b) -> eval(c) ? a.i() : b.i(); // optimization
    };
}

We know that an IfNode must have the same node parameterization on both a and b.  We don't encourage raw IfNode here; there should be something in the <brackets>.  The rule is that if a type / record pattern is generic, the parameterization must be statically consistent with the target type; there has to be a cast conversion without unchecked conversion.  (This can get refined if we get sufficiently useful constraints from somewhere else, but not relaxed.)  But without some inference, we can't yet conclude that Integer is a valid (i.e., won't require unchecked conversion) parameterization for Node<T>.  But clearly, Integer is the only possibility here.  So we can't even write this -- we'd have to use a raw or wildcard case, which is not very good.  We need more inference here, so we have enough type information for better well-formedness checks.

## Putting it all together

Here's a related example from the "Lower your Guards" paper which ties it all together.  In Haskell:

data T a b where
  T1 :: T Int Bool
  T2 :: T Char Bool

g1 :: T Int b -> b -> Int
g1 T1 False = 0
g1 T1 True = 1

Translating to Java:

sealed interface T<A,B> { }
record T1() implements T<Integer, Boolean> { }
record T2() implements T<Character, Boolean> { }

record G1<B>(T<Integer, B> t, B b) { }

int bb = switch (g) {
    case G1<Boolean>(T1 t, false) -> 0;
    case G1<Boolean>(T1 t, true) -> 1;
};


The above switch is exhaustive on G1<B>, but a lot has to happen to type-check it:

 - We need to gather the constraint B=Boolean in both cases, from the nested type pattern `T1 t`.
 - We need to flow B=Boolean into the rest of the pattern and into the body, so that matching the B component against the boolean constants type-checks.
 - We need to type-check that the generics in the cases are compatible with the target.
 - (bonus) infer B=Boolean if not specified explicitly.
 - We need to conclude that these two cases are exhaustive on G1<B>, which involves observing that T2 is not allowable here, so T1 covers T, and therefore (T1, false) and (T1, true) cover T x Boolean.


I hope these examples illustrate that these cases are not silly, made-up cases; the hierarchy above is a sensible way to model expressions.  We are not going to solve all these problems immediately, but we need a story for getting there; otherwise we leave users with some bad choices, most of which involve raw types or unchecked casts.

## A shopping list

So, we have quite a shopping list for type checking patterns. Some show up immediately as soon as we have record patterns; others will come later as we add explicit dtors.

#### Well formedness checking

Records can be generic, and we can nest patterns in record patterns.  Record patterns need a well-formedness check to ensure that any nested patterns are sound.  In the absence of inference, this is mostly a straightforward recursive application of the cast conversion test.

We currently interpret a type pattern:

    case ArrayList a:

as a raw ArrayList; I think this may be a mistake for type patterns, but it surely will be a mistake for record patterns.  If we interpret:

    case Box(...)

as a raw box, we lose useful type information with which to gauge exhaustiveness.  In any case, if we have generic type parameters:

    case Box<String>(...)

we can use the type parameters to refine the component types, which affects the recursive well-formedness check.
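To make that concrete, here is a minimal sketch (assuming Java 21; the names Box, len, and BoxDemo are hypothetical): the explicit <String> argument refines the component type, so the nested pattern can be checked as unconditional rather than merely plausible:

```java
public class BoxDemo {
    record Box<T>(T contents) { }

    static int len(Box<String> box) {
        return switch (box) {
            // The <String> argument refines the component type to String,
            // so the nested `String s` pattern is known to be unconditional,
            // and the switch is exhaustive without a default.
            case Box<String>(String s) -> s.length();
        };
    }

    public static void main(String[] args) {
        System.out.println(len(new Box<>("hello")));
    }
}
```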

#### Inference

I think we should be using inference routinely on record/dtor patterns, and we should also consider doing the same on type patterns.  So this means interpreting

    case Box(...)

as "implicit diamond".  (We can talk about implicit vs explicit diamond, but that's mostly a syntax distraction, and you know the rules: no syntax discussions until there is consensus on semantics.)
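This is in fact how record patterns already behave in Java 21, where omitted type arguments are inferred from the selector type.  A minimal sketch (the names Box, len, and DiamondDemo are again hypothetical):

```java
public class DiamondDemo {
    record Box<T>(T contents) { }

    static int len(Box<String> box) {
        return switch (box) {
            // No type arguments written: the compiler infers Box<String>,
            // so s gets type String and s.length() type-checks.
            case Box(var s) -> s.length();
        };
    }

    public static void main(String[] args) {
        System.out.println(len(new Box<>("diamond")));
    }
}
```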

Inference in this case is different from what we do for methods (like everything else here: well-formedness, overload selection) because we're doing everything in reverse.  I'll sketch out an inference algorithm later, but a key point is that we have to solve nested generic patterns together with the outer pattern (as we do with methods.)  This then feeds more type information into the well-formedness check, which is how we'd type-check the IfNode<Integer> case or the G1<Boolean> case.

Constraints inferred for type variables that flow in from the switch target can then be flowed into the case bodies.

#### Exhaustiveness

I think what we've identified for exhaustiveness is enough.

#### Overload selection

When we add explicit dtor patterns, now we need to do overload selection.  This is like method overload selection, but again key parts run in reverse.

When we add in primitive patterns with conversions, or varargs patterns, overload selection will need to reflect the same multi-phase applicability checking that we have for method overload selection.  Details to come at the appropriate time.


