GADTs -- sealed families whose permitted subtypes specialize the type variables of the base class -- pose some interesting challenges for pattern matching.

(Remi: this is a big, complex area.  Off-the-cuff "this is wrong" or "you should X instead" replies are not helpful.  If in doubt, ask questions.  One comprehensive reply is more useful than many small replies.  Probably best to think about the whole thing for some time before responding.)

Here is an example of a GADT hierarchy:

sealed interface Node<T> { }

record IntNode(int i) implements Node<Integer> { }
record BoolNode(boolean b) implements Node<Boolean> { }
record PlusNode(Node<Integer> a, Node<Integer> b) implements Node<Integer> { }
record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements Node<T> { }

Nodes can be parameterized, but some nodes are sharply typed, and some intermediate nodes (plus, or, if) have constraints on their components.  This is enough to model expressions like:

    let
       a = true, b = false, x = 1, y = 2
       in
           if (a || b) then x + y else y;

Note that `if` nodes can work on both Node<Integer> and Node<Boolean>, and yield a node of the right type.
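Concretely, the sample `let` expression corresponds to a small tree of these records.  A minimal sketch (assuming Java 17+ for sealed interfaces and records; the class name NodeDemo is mine, and the let-bound variables are simply inlined):

```java
public class NodeDemo {
    sealed interface Node<T> { }
    record IntNode(int i) implements Node<Integer> { }
    record BoolNode(boolean b) implements Node<Boolean> { }
    record PlusNode(Node<Integer> a, Node<Integer> b) implements Node<Integer> { }
    record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
    record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements Node<T> { }

    public static void main(String[] args) {
        // if (a || b) then x + y else y, with a = true, b = false, x = 1, y = 2
        Node<Integer> expr =
            new IfNode<>(
                new OrNode(new BoolNode(true), new BoolNode(false)),
                new PlusNode(new IntNode(1), new IntNode(2)),
                new IntNode(2));
        System.out.println(expr.getClass().getSimpleName());
    }
}
```

The IfNode is well-typed as a Node<Integer> only because both of its branches are Node<Integer>; its condition is independently pinned to Node<Boolean>.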

## The Flow Problem

As mentioned earlier, pattern matching can recover constraints on type variables, but currently we do not act on them.  For example, we might want to write eval() like this:

static <T> T eval(Node<T> n) {
    return switch (n) {
        case IntNode(var i) -> i;
        case BoolNode(var b) -> b;
        case PlusNode(var a, var b) -> eval(a) + eval(b);
        case OrNode(var a, var b) -> eval(a) || eval(b);
        case IfNode<T>(var c, var a, var b) -> eval(c) ? eval(a) : eval(b);
    };
}

But this doesn't work.  The eval() method returns a T.  In the first case, we've matched Node<T> to IntNode, so the compiler knows `i : int`.  Humans know that T can only be Integer, but the compiler doesn't know that yet.  As a result, the choice to return `i` will cause a type error; the compiler has no reason to believe that `i` is a `T`.  The only choice the user has is an unchecked cast to `T`.  This isn't great.
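Today, the only way to get this past the compiler is the unchecked cast.  A sketch of that workaround (assuming Java 21 for record patterns in switch; EvalDemo is a made-up harness, and the casts through the box types are exactly the assertions the compiler cannot verify):

```java
public class EvalDemo {
    sealed interface Node<T> { }
    record IntNode(int i) implements Node<Integer> { }
    record BoolNode(boolean b) implements Node<Boolean> { }
    record PlusNode(Node<Integer> a, Node<Integer> b) implements Node<Integer> { }
    record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
    record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements Node<T> { }

    @SuppressWarnings("unchecked")
    static <T> T eval(Node<T> n) {
        return switch (n) {
            case IntNode(int i)         -> (T) (Integer) i;                     // unchecked: humans know T=Integer
            case BoolNode(boolean b)    -> (T) (Boolean) b;                     // unchecked: humans know T=Boolean
            case PlusNode(var a, var b) -> (T) (Integer) (eval(a) + eval(b));   // unchecked
            case OrNode(var a, var b)   -> (T) (Boolean) (eval(a) || eval(b));  // unchecked
            case IfNode<T>(var c, var a, var b) -> eval(c) ? eval(a) : eval(b); // type-checks as is
        };
    }

    public static void main(String[] args) {
        // if (true || false) then 1 + 2 else 2
        Node<Integer> expr =
            new IfNode<>(
                new OrNode(new BoolNode(true), new BoolNode(false)),
                new PlusNode(new IntNode(1), new IntNode(2)),
                new IntNode(2));
        System.out.println(eval(expr));
    }
}
```

Note that only the IfNode case needs no cast, because its components are already Node<T>; every sharply typed case needs the unchecked `(T)` step.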

We've discussed, as a possible solution, flow typing for type variables; matching IntNode to Node<T> can generate a constraint T=Integer in the scope where the pattern matches. Pattern matching is already an explicitly conditional construct; whether a pattern matches already flows into scoping and control flow analysis.  Refining type constraints on type variables is a reasonable thing to consider, and offers a greater type-safety payoff than ordinary flow typing (since most flow typing can be replaced with pattern matching.)

We have the same problem with the PlusNode and OrNode cases too; if we match PlusNode, then T can only be Integer, but the RHS will be int and assigning an int to a T will cause a problem.  Only the last case will type check without gathering extra T constraints.

## The Exhaustiveness Problem

Now suppose we have a Node<Integer>.  Then it can only be an IntNode, a PlusNode, or an IfNode.  So the following switch should be exhaustive:

static int eval(Node<Integer> n) {
    return switch (n) {
        case IntNode(var i) -> i;
        case PlusNode(var a, var b) -> eval(a) + eval(b);
        case IfNode<Integer>(var c, var a, var b) -> eval(c) ? eval(a) : eval(b);
    };
}

We need to be able to eliminate BoolNode and OrNode from the list of types that have to be covered by the switch.

We're proposing changes in the current round (also covered in my Coverage doc) that refine the "you cover a sealed type if you cover all the permitted subtypes" rule to exclude those whose parameterizations are impossible.
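For the record, that refinement is enough to make the Node<Integer> switch check out.  A sketch of how it reads (Java 21's pattern matching for switch already prunes permitted subtypes whose parameterizations are impossible for the selector type; the evalBool helper and the class name are mine):

```java
public class IntEvalDemo {
    sealed interface Node<T> { }
    record IntNode(int i) implements Node<Integer> { }
    record BoolNode(boolean b) implements Node<Boolean> { }
    record PlusNode(Node<Integer> a, Node<Integer> b) implements Node<Integer> { }
    record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
    record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements Node<T> { }

    // Exhaustive without a default: BoolNode and OrNode cannot be Node<Integer>.
    static int eval(Node<Integer> n) {
        return switch (n) {
            case IntNode(var i) -> i;
            case PlusNode(var a, var b) -> eval(a) + eval(b);
            case IfNode<Integer>(var c, var a, var b) -> evalBool(c) ? eval(a) : eval(b);
        };
    }

    // Likewise exhaustive: IntNode and PlusNode cannot be Node<Boolean>.
    static boolean evalBool(Node<Boolean> n) {
        return switch (n) {
            case BoolNode(var b) -> b;
            case OrNode(var a, var b) -> evalBool(a) || evalBool(b);
            case IfNode<Boolean>(var c, var a, var b) -> evalBool(c) ? evalBool(a) : evalBool(b);
        };
    }

    public static void main(String[] args) {
        Node<Integer> expr = new IfNode<>(
            new OrNode(new BoolNode(true), new BoolNode(false)),
            new PlusNode(new IntNode(1), new IntNode(2)),
            new IntNode(2));
        System.out.println(eval(expr));
    }
}
```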

## The Typing Problem

Even without worrying about the RHS, we have problems with cases like this:

static <T> T eval(Node<T> n) {
    return switch (n) {
        ...
        case IfNode<Integer>(var c, IntNode a, IntNode b) -> eval(c) ? a.i() : b.i(); // optimization
    };
}

We know that an IfNode must have the same node parameterization on both a and b.  We don't encourage raw IfNode here; there should be something in the <brackets>.  The rule is that if a type / record pattern is generic, the parameterization must be statically consistent with the target type; there has to be a cast conversion without unchecked conversion.  (This can get refined if we get sufficiently useful constraints from somewhere else, but not relaxed.)  But without some inference, we can't yet conclude that Integer is a valid (i.e., won't require unchecked conversion) parameterization for Node<T>.  But clearly, Integer is the only possibility here.  So we can't even write this -- we'd have to use a raw or wildcard case, which is not very good.  We need more inference here, so we have enough type information for better well-formedness checks.

## Putting it all together

Here's a related example from the "Lower your Guards" paper which ties it all together.  In Haskell:

data T a b where
  T1 :: T Int Bool
  T2 :: T Char Bool

g1 :: T Int b -> b -> Int
g1 T1 False = 0
g1 T1 True = 1

Translating to Java:

sealed interface T<A,B> { }
record T1() implements T<Integer, Boolean> { }
record T2() implements T<Character, Boolean> { }

record G1<B>(T<Integer, B> t, B b) { }

int bb = switch (g) {
    case G1<Boolean>(T1 t, false) -> 0;
    case G1<Boolean>(T1 t, true) -> 1;
};


The above switch is exhaustive on G1<B>, but a lot has to happen to type-check it:

 - We need to gather the constraint B=Boolean in both cases, from the nested type pattern `T1 t`.
 - We need to flow B=Boolean into the rest of the pattern and into the body, so that matching the B component against the boolean constants type-checks.
 - We need to type-check that the generics in the cases are compatible with the target.
 - (bonus) infer B=Boolean if not specified explicitly.
 - We need to conclude that these two cases are exhaustive on G1<B>, which involves observing that T2 is not allowable here, so T1 covers T, and therefore (T1, false) and (T1, true) cover T x Boolean.


I hope these examples illustrate that these cases are not silly, made-up cases; the hierarchy above is a sensible way to model expressions.  We are not going to solve all these problems immediately, but we need a story for getting there; otherwise we leave users with some bad choices, most of which involve raw types or unchecked casts.

## A shopping list

So, we have quite a shopping list for type checking patterns. Some show up immediately as soon as we have record patterns; others will come later as we add explicit dtors.

#### Well formedness checking

Records can be generic, and we can nest patterns in record patterns.  Record patterns need a well-formedness check to ensure that any nested patterns are sound.  In the absence of inference, this is mostly a straightforward recursive application of the cast conversion test.

We currently interpret a type pattern:

    case ArrayList a:

as a raw ArrayList; I think this may be a mistake for type patterns, but it surely will be a mistake for record patterns.  If we interpret:

    case Box(...)

as a raw box, we lose useful type information with which to gauge exhaustiveness.  In any case, if we have generic type parameters:

    case Box<String>(...)

we can use the type parameters to refine the component types, which affects the recursive well-formedness check.
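To make that concrete, here is a minimal sketch (assuming Java 21; the names Box, len, and BoxDemo are hypothetical): the explicit <String> argument refines the component type, so the nested pattern can be checked as unconditional rather than merely plausible:

```java
public class BoxDemo {
    record Box<T>(T contents) { }

    static int len(Box<String> box) {
        return switch (box) {
            // The <String> argument refines the component type to String,
            // so the nested `String s` pattern is known to be unconditional,
            // and the switch is exhaustive without a default.
            case Box<String>(String s) -> s.length();
        };
    }

    public static void main(String[] args) {
        System.out.println(len(new Box<>("hello")));
    }
}
```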

#### Inference

I think we should be using inference routinely on record/dtor patterns, and we should also consider doing the same on type patterns.  So this means interpreting

    case Box(...)

as "implicit diamond".  (We can talk about implicit vs explicit diamond, but that's mostly a syntax distraction, and you know the rules: no syntax discussions until there is consensus on semantics.)
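This is in fact how record patterns already behave in Java 21, where omitted type arguments are inferred from the selector type.  A minimal sketch (the names Box, len, and DiamondDemo are again hypothetical):

```java
public class DiamondDemo {
    record Box<T>(T contents) { }

    static int len(Box<String> box) {
        return switch (box) {
            // No type arguments written: the compiler infers Box<String>,
            // so s gets type String and s.length() type-checks.
            case Box(var s) -> s.length();
        };
    }

    public static void main(String[] args) {
        System.out.println(len(new Box<>("diamond")));
    }
}
```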

Inference in this case is different from what we do for methods (like everything else here: well-formedness, overload selection) because we're doing everything in reverse.  I'll sketch out an inference algorithm later, but a key point is that we have to solve nested generic patterns together with the outer pattern (as we do with methods.)  This then feeds more type information into the well-formedness check, which is how we'd type-check the IfNode<Integer> case or the G1<Boolean> case.

Constraints inferred for type variables that flow in from the switch target can then be flowed into the case bodies.

#### Exhaustiveness

I think what we've identified for exhaustiveness is enough.

#### Overload selection

When we add explicit dtor patterns, now we need to do overload selection.  This is like method overload selection, but again key parts run in reverse.

When we add in primitive patterns with conversions, or varargs patterns, overload selection will need to reflect the same multi-phase applicability checking that we have for method overload selection.  Details to come at the appropriate time.


