[Python-Dev] Another take on PEP 622

Tobias Kohn Thu, 16 Jul 2020 19:00:24 -0700

Hi Everyone,

I feel there are still quite a few misconceptions concerning PEP 622 and the new pattern matching feature it proposes. Please allow me, therefore, to make another attempt at explaining the ideas behind PEP 622 with a different approach. Bear in mind, though, that I naturally cannot cover everything, and that some parts presented here are still slightly simplified.



Let's start with perhaps the most crucial part:

PEP 622 does **NOT** propose to introduce a `switch`-statement!

Indeed, pattern matching is much more closely related to concepts like regular expressions, sequence unpacking, visitor patterns, or function overloading. In particular, the patterns themselves share more similarities with formal parameters than with expressions.

So, to start with, I would like to invite you to think of PEP 622 not so much as proposing a new control structure in the sense of `if`-`elif`-`else`, `try`-`except`-`finally`, or even `switch`, but rather as addressing the question: what would function overloading look like in Python? Of course, this does not fully capture pattern matching or do it justice, either, but it might offer a better basis to start with.




1. Function Overloading
-----------------------

In a statically typed language, you might define a function `product` that accepts either two numbers or an iterable, something like the following (I am sticking to Python syntax for simplicity):

```
def product(a: float, b: float) -> float:
    return a * b

def product(it: Iterable[float]) -> float:
    result = 1.0
    for x in it: result *= x
    return result
```

In Python, however, this needs to be done differently, and the dispatch logic has to be put inside the function itself:

```
from collections.abc import Iterable

def product(*args):
    # Dispatch by hand: inspect the argument list to pick an "overload".
    if len(args) == 2 and isinstance(args[0], float) and isinstance(args[1], float):
        return args[0] * args[1]
    elif len(args) == 1 and isinstance(args[0], Iterable):
        result = 1.0
        for x in args[0]: result *= x
        return result
```

It is this use case, to begin with, that pattern matching addresses by introducing a more declarative way. Each `case` represents one possibility of what the parameter list might look like:

```
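from collections.abc import Iterable

def product(*args):
    match args:
        case (float(a), float(b)):
            return a * b
        # Iterable defines no __match_args__, so bind the whole element with `as`.
        case (Iterable() as it,):
            result = 1.0
            for x in it: result *= x
            return result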
```

And if you squint a little, you might even see that these parameter lists could almost be written in C: `(float a, float b)`.

In the context of more functional languages, you might also have seen some wilder stuff, where function overloading allows you to include literals. One of the most prevalent examples is the factorial function, usually defined a bit like this:

```
def fact(0):
    return 1
def fact(n):
    return n * fact(n-1)
```

Again, when doing the same thing in Python, we put the dispatch logic inside the function:

```
def fact(n):
    if n == 0:
        return 1
    else:
        return n * fact(n - 1)
```

It is only natural to also allow pattern matching to express this use case:

```
def fact(arg):
    match arg:
        case 0:
            return 1
        case n:
            return n * fact(n - 1)
```

And this is where you probably start to see good old `switch` coming in. Indeed, pattern matching is powerful and versatile enough to act as a `switch` replacement in many cases. But given how we arrived here, you might start to understand that this is a happy accident rather than the purpose of the structure itself.

There is a big elephant in the room, of course: why do we need the `match` line in the first place? If all we want is to somehow express function overloading, couldn't we just put the cases directly into the function body?

In principle, yes, we could do that. But! As it turns out, there is something to be gained by a clear separation. When actually using pattern matching, you might discover that you do not always want to define a function. Sometimes, it is quite convenient to be able to have a pattern like this, for instance:

```
def foo(*args):
    for arg in args:
        match arg:
            case ...
```

In all of these examples, it looks as if pattern matching is replacing an `if`-`elif`-`else` chain. However, this is not entirely accurate. What we really want to express is function overloading in the first place. The `if`s are only needed to express the same idea in Python for the time being. In other words: the individual cases here express independent implementations, and we leave it up to the Python interpreter to choose the right one.




2. Visitor Pattern and Dispatch
-------------------------------

If you wanted to implement something like function overloading based on the type of the argument, you might use the _visitor pattern_ for that purpose. Thanks to Python's reflection capabilities, this is quite easy to do. Take, e.g., this example of a function `draw()` that accepts a range of different geometric shapes:

```
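# `canvas` and `get_attribute_values()` are assumed to exist elsewhere; the
# latter is taken to return the subject's attribute values in field order.
def draw(geoShape):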
    class Matcher:
        def case_Point(self, x, y):
            canvas.moveTo(x, y)
            canvas.dot(1)

        def case_Line(self, x1, y1, x2, y2):
            canvas.moveTo(x1, y1)
            canvas.lineTo(x2, y2)

        def case_Circle(self, cX, cY, radius):
            canvas.ellipse(cX-radius, cY-radius, cX+radius, cY+radius)

        def match(self, subject):
            n = 'case_' + subject.__class__.__name__
            method = getattr(self, n)
            args = get_attribute_values(subject)
            method(*args)

    m = Matcher()
    m.match(geoShape)
```

Now let us look at the same thing written as pattern matching according to PEP 622:

```
def draw(geoShape):
    match geoShape:
        case Point(x, y):
            canvas.moveTo(x, y)
            canvas.dot(1)

        case Line(x1, y1, x2, y2):
            canvas.moveTo(x1, y1)
            canvas.lineTo(x2, y2)

        case Circle(cX, cY, radius):
            canvas.ellipse(cX-radius, cY-radius, cX+radius, cY+radius)
```

You may notice that the design of pattern matching is not something entirely new and alien in Python, but is well founded in ideas and principles already present. Moreover, observe, once again, how the variables in the patterns literally correspond to the parameters in the visitor pattern above. In this instance, the `case` statements are quite close to function definitions in their entire syntactic structure (is it really that alien that `Point(x, y)` here does _not_ represent a function call?).

This example also demonstrates something else I have brought up before: pattern matching does not strictly have the same semantics as an `if`-`elif`-`else` chain. It is thus not a simple control flow construct, as the emphasis is less on the control flow itself and more on the shape of the objects and data.

There are, however, some clear limitations of the visitor pattern as implemented in this way. If we go and define a new subclass of one of these geometric shapes (perhaps something like `HorizontalLine` as a specialisation of `Line`), the visitor pattern will break down: the `getattr` lookup fails because there is no method `case_HorizontalLine`. Pattern matching, on the other hand, will still work as expected, because we are actually working in the space of classes with their full inheritance structure, and not only on the names of types.

Mind you, pattern matching is not a replacement for the visitor pattern in general, though. Even with pattern matching in place, there are valid use cases where you would certainly prefer the visitor pattern over pattern matching. This is only one example, after all, meant to illustrate the idea behind pattern matching and why it might be a good idea.




3. Shape and Structure
----------------------

Data is organised and comes in different shapes, structures, and representations. It is not always as easy as in the cases above, where the class of an object already contains all the information you need. As a first example, the visitor pattern as illustrated above could not easily differentiate between tuples of different lengths: you would have to encode the length of the tuple somehow in the method names, and then consider what happens if some object does not have a length and thus requires a different encoding scheme.

In fact, the classic example of where pattern matching really starts to shine is a tree structure that we want to simplify (or otherwise process) according to specific rules. This might be an abstract syntax tree (AST) or a mathematical expression, but the principle is always the same: the decision of which rule to apply depends on a number of different attributes, which are often scattered among nodes at different depths in your tree. The following illustrates this with a tree that represents mathematical expressions:

```
def simplify(node):
    match node:
        case BinOp(Num(left), '+', Num(right)):
            return Num(left + right)
        case BinOp(left, ('+'|'-'), Num(0)):
            return left
        case UnaryOp('-', UnaryOp('-', x)):
            return x
        case UnaryOp('-', Num(x)):
            return Num(-x)
        case anythingElse:
            return anythingElse
```

If you draw an actual diagram of the structure that each case imposes on the node, you will quickly find that pattern matching allows you to express a larger variety of structure than you could encode in method names. Not to mention having to write everything with `if`s and calls to `isinstance`, etc. It is possible, of course, but pattern matching introduces a very succinct way of expressing the structure declaratively and then letting the compiler and interpreter figure out how best to compare the subject `node` against the different possibilities and extract the data you are actually interested in.

By the way: in this case, the compiler might end up creating a structure of nested `if`s, perhaps something like:

```
def simplify(node):
    TOS = node
    try:
        if isinstance(TOS, BinOp) and isinstance(TOS.right, Num):
            if TOS.right.n == 0 and TOS.op in ('+','-'):
                return TOS.left
            elif TOS.op == '+' and isinstance(TOS.left, Num):
                return Num(TOS.left.n + TOS.right.n)
        elif isinstance(TOS, UnaryOp) and TOS.op == '-':
            value = TOS.operand
            if isinstance(value, UnaryOp) and value.op == '-':
                return value.operand
            elif isinstance(value, Num):
                return Num(-value.n)
        return TOS
    finally:
        del TOS
```

This illustrates, once again, that pattern matching does not so much specify the control flow itself as offer a set of alternatives. There is neither the strict sequential structure of `if`-`elif`-`else` nor the jump tables of `switch`. And in case you wondered: the entire `match` block is actually a mini-scope, in which the subject `node` is defined.



Kind regards,
Tobias

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZAMFDQKKEJONDEQRGC5XKOUAFUTL6OG7/
Code of Conduct: http://python.org/psf/codeofconduct/
