Hi Everyone,

I feel there are still quite a few misconceptions around concerning PEP 622 and the new pattern matching feature it proposes.  Please allow me therefore to take another attempt at explaining the ideas behind PEP 622 with a different approach.  Bear in mind that I naturally cannot cover everything, though, and that some parts presented here are still slightly simplified.


Let's start with perhaps the most crucial part:

PEP 622 does **NOT** propose to introduce a `switch`-statement!

Indeed, pattern matching is much more closely related to concepts like regular expressions, sequence unpacking, visitor patterns, or function overloading.  In particular, the patterns themselves share more similarities with formal parameters than with expressions.

So, to start with I would like to invite you to think of PEP 622 not so much as proposing a new control structure in the sense of `if`-`elif`-`else`, `try`-`except`-`finally`, or even `switch`, but much rather as addressing the question: what would function overloading look like in Python?  Of course, this does not fully capture pattern matching or do it justice, either, but it might offer a better basis to start with.



1. Function Overloading
-----------------------

In a statically typed language, you might define a function `product` that either accepts two numbers or an iterable something like the following (I am sticking to Python syntax for simplicity):
```
def product(a: float, b: float) -> float:
    return a * b

def product(it: Iterable[float]) -> float:
    result = 1.0
    for x in it: result *= x
    return result
```
In Python, however, this needs to be done differently and the dispatch logic has to be put inside the function itself:
```
def product(*args):
    if len(args) == 2 and isinstance(args[0], float) and isinstance(args[1], float):
        return args[0] * args[1]
    elif len(args) == 1 and isinstance(args[0], Iterable):
        result = 1.0
        for x in args[0]: result *= x
        return result
```
It is this use case to begin with that pattern matching is addressing by introducing a more declarative way.  Each `case` represents one possibility of what the parameter list might look like:
```
def product(*args):
    match args:
        case (float(a), float(b)):
            return a * b
        case (Iterable(it),):
            result = 1.0
            for x in it: result *= x
            return result
```
And if you squint a little, you might even see that these parameter lists could almost be written in C: `(float a, float b)`.

In the context of more functional languages, you might also have seen some wilder stuff, where function overloading allows you to include literals.  One of the most prevalent examples is the factorial function, usually defined a bit like this:
```
def fact(0):
    return 1
def fact(n):
    return n * fact(n-1)
```
Again, when doing the same thing in Python, we put the dispatch logic inside the function:
```
def fact(n):
    if n == 0:
        return 1
    else:
        return n * fact(n - 1)
```
It is only natural to also allow pattern matching to express this use case:
```
def fact(arg):
    match arg:
        case 0:
            return 1
        case n:
            return n * fact(n - 1)
```
And this here is where you probably start to see good old `switch` coming in.  Indeed, pattern matching is powerful and versatile enough to act as a `switch` replacement in many cases.  But given how we arrived here, you might start to understand that this is a happy accident and not by design of the structure itself.

There is one big elephant in the room, of course: why do we need the `match`-line in the first place?  If all we want is to somehow express function overloading, couldn't we just put the cases directly into the function body?

In principle, yes, we could do that.  But!  As it turns out, there is something to be gained by a clear separation.  When actually using pattern matching, you might discover that you do not always want to define a function.  Sometimes, it is quite convenient being able to have a pattern like this, for instance:
```
def foo(*args):
    for arg in args:
        match arg:
            case ...
```

In all of these examples, it looks as if pattern matching is replacing an `if`-`elif`-`else` chain.  However, this is not entirely accurate.  What we really want to express is the function overloading in the first place.  The `if`s are only needed to express the same idea in Python for the time being.  In other words: the individual cases here express independent implementations and we leave it up to the Python interpreter to choose the right one.



2. Visitor Pattern and Dispatch
-------------------------------

If you wanted to implement something like function overloading based on the type of the argument, you might use the _visitor pattern_ for that purpose.  Thanks to Python's reflection capabilities, this is quite easy and simple to do.  Take, e.g., this example of a function `draw()` that accepts a range of different geometric shapes.
```
def draw(geoShape):
    class Matcher:
        def case_Point(self, x, y):
            canvas.moveTo(x, y)
            canvas.dot(1)

        def case_Line(self, x1, y1, x2, y2):
            canvas.moveTo(x1, y1)
            canvas.lineTo(x2, y2)

        def case_Circle(self, cX, cY, radius):
            canvas.ellipse(cX-radius, cY-radius, cX+radius, cY+radius)

        def match(self, subject):
            n = 'case_' + subject.__class__.__name__
            method = getattr(self, n)
            args = get_attribute_values(subject)
            method(*args)

    m = Matcher()
    m.match(geoShape)
```
Now let us look at the same thing written as pattern matching according to PEP 622:
```
def draw(geoShape):
    match geoShape:
        case Point(x, y):
            canvas.moveTo(x, y)
            canvas.dot(1)

        case Line(x1, y1, x2, y2):
            canvas.moveTo(x1, y1)
            canvas.lineTo(x2, y2)

        case Circle(cX, cY, radius):
            canvas.ellipse(cX-radius, cY-radius, cX+radius, cY+radius)    
```
You might see that the design of pattern matching is not something entirely new and alien in Python, but well founded in ideas and principles already present.  Moreover, observe, once again, how the variables in the patterns literally correspond to parameters in the visitor pattern above.  In this instance, the `case` statements are quite close to function definitions in their entire syntactic structure (is it really that alien that `Point(x, y)` here does _not_ represent a function call?).

This example also demonstrates something else I have brought up before: pattern matching does not strictly have the same semantics as an `if`-`elif`-`else` chain.  It is thus not a simple control flow construct, as the emphasis is less on the control flow itself and more on the shape of the objects and data.

There are, however, some clear limitations of the visitor pattern as implement in this way.  If we go and define a new subclass of one of these geometric shapes---perhaps something like `HorizontalLine` as a specialisation of `Line`---, the visitor pattern will break down because there is no method `case_HorizontalLine`.  Pattern matching, on the other hand, will still work as expected, because we are actually working in the space of classes with their full inheritance structure, and not only on the names of types.

Mind, pattern matching is not a replacement for the visitor pattern in general, though.  Even with pattern matching in place, there are valid use cases where you would certainly prefer the visitor pattern over pattern matching.  This is only one example, after all, meant to illustrate the idea behind pattern matching and why it might be a good idea.



3. Shape and Structure
----------------------

Data is organised and comes in different shapes, structures, and representations.  It is not always as easy as in the cases above, where the class of an object already contains all the information you need.  As a first example, the visitor pattern as illustrated above could not easily differentiate between tuples of different lengths, say: you would have to encode the length of the tuple somehow in the method names, and then consider what happens if some objects does not have a length, say, and thus requires a different encoding scheme.

In fact, the classic example of where pattern matching really starts to shine is a tree structure that we want to simplify (or otherwise process) according to specific rules.  This might be an abstract syntax tree (AST) or a mathematical expression, but the principle is always the same: the decision which rule to apply depends on a number of different attributes, which are often scattered among nodes of different depths in your tree.  The following illustrates this with a tree that represents mathematical expressions:
```
def simplify(node):
    match node:
        case BinOp(Num(left), '+', Num(right)):
            return Num(left + right)
        case BinOp(left, ('+'|'-'), Num(0)):
            return left
        case UnaryOp('-', UnaryOp('-', x)):
            return x
        case UnaryOp('-', Num(x)):
            return Num(-x)
        case anythingElse:
            return anythingElse
```
If you draw an actual diagram of the structure that each case imposes on the node, you will quickly find that pattern matching allows you to express a larger variety of structure than you could encode in method names.  Not to mention if you had to write everything with `if`s and calls to `isinstance`, etc.  It is possible, of course, but pattern matching introduces a very succinct way of expressing the structure declaratively and then let the compiler and interpreter figure out how to best compare the subject `node` against the different possibilities and extract the data you are actually interested in.

By the way: in this case, the compiler might end up creating a structure of nested `if`s, perhaps something like:
```
def simplify(node):
    TOS = node
    try:
        if isinstance(TOS, BinOp) and isinstance(TOS.right, Num):
            if TOS.right.n == 0 and TOS.op in ('+','-'):
                return TOS.left
            elif TOS.op == '+' and isinstance(TOS.left, Num):
                return Num(TOS.left.n + TOS.right.n)
        elif isinstance(TOS, UnaryOp) and TOS.op == '-':
            value = TOS.operand
            if isinstance(value, UnaryOp) and value.op == '-':
                return value.operand
            elif isinstance(value, Num):
                return value.n
        return TOS:
    finally:
        del TOS
```
This illustrates, once again, that pattern matching is not so much specifying the control flow itself, but rather offering a set of alternatives.  But there is neither the strict sequential structure of `if`-`elif`-`else`, not the jump tables of `switch`.  And in case you wondered: the entire `match` block is actually a mini-scope, in which the subject `node` is defined.


Kind regards,
Tobias

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZAMFDQKKEJONDEQRGC5XKOUAFUTL6OG7/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to