Hi Everyone,
I feel there are still quite a few misconceptions around concerning
PEP 622 and the new pattern matching feature it proposes. Please
allow me therefore to take another attempt at explaining the ideas
behind PEP 622 with a different approach. Bear in mind that I
naturally cannot cover everything, though, and that some parts
presented here are still slightly simplified.
Let's start with perhaps the most crucial part:
PEP 622 does **NOT** propose to introduce a `switch`-statement!
Indeed, pattern matching is much more closely related to concepts like
regular expressions, sequence unpacking, visitor patterns, or function
overloading. In particular, the patterns themselves share more
similarities with formal parameters than with expressions.
So, to start with I would like to invite you to think of PEP 622 not
so much as proposing a new control structure in the sense of
`if`-`elif`-`else`, `try`-`except`-`finally`, or even `switch`, but
much rather as addressing the question: what would function
overloading look like in Python? Of course, this does not fully
capture pattern matching or do it justice, either, but it might offer
a better basis to start with.
1. Function Overloading
-----------------------
In a statically typed language, you might define a function `product`
that either accepts two numbers or an iterable something like the
following (I am sticking to Python syntax for simplicity):
```
def product(a: float, b: float) -> float:
return a * b
def product(it: Iterable[float]) -> float:
result = 1.0
for x in it: result *= x
return result
```
In Python, however, this needs to be done differently and the dispatch
logic has to be put inside the function itself:
```
def product(*args):
if len(args) == 2 and isinstance(args[0], float) and
isinstance(args[1], float):
return args[0] * args[1]
elif len(args) == 1 and isinstance(args[0], Iterable):
result = 1.0
for x in args[0]: result *= x
return result
```
It is this use case to begin with that pattern matching is addressing
by introducing a more declarative way. Each `case` represents one
possibility of what the parameter list might look like:
```
def product(*args):
match args:
case (float(a), float(b)):
return a * b
case (Iterable(it),):
result = 1.0
for x in it: result *= x
return result
```
And if you squint a little, you might even see that these parameter
lists could almost be written in C: `(float a, float b)`.
In the context of more functional languages, you might also have seen
some wilder stuff, where function overloading allows you to include
literals. One of the most prevalent examples is the factorial
function, usually defined a bit like this:
```
def fact(0):
return 1
def fact(n):
return n * fact(n-1)
```
Again, when doing the same thing in Python, we put the dispatch logic
inside the function:
```
def fact(n):
if n == 0:
return 1
else:
return n * fact(n - 1)
```
It is only natural to also allow pattern matching to express this use case:
```
def fact(arg):
match arg:
case 0:
return 1
case n:
return n * fact(n - 1)
```
And this here is where you probably start to see good old `switch`
coming in. Indeed, pattern matching is powerful and versatile enough
to act as a `switch` replacement in many cases. But given how we
arrived here, you might start to understand that this is a happy
accident and not by design of the structure itself.
There is one big elephant in the room, of course: why do we need the
`match`-line in the first place? If all we want is to somehow express
function overloading, couldn't we just put the cases directly into the
function body?
In principle, yes, we could do that. But! As it turns out, there is
something to be gained by a clear separation. When actually using
pattern matching, you might discover that you do not always want to
define a function. Sometimes, it is quite convenient being able to
have a pattern like this, for instance:
```
def foo(*args):
for arg in args:
match arg:
case ...
```
In all of these examples, it looks as if pattern matching is replacing
an `if`-`elif`-`else` chain. However, this is not entirely accurate.
What we really want to express is the function overloading in the
first place. The `if`s are only needed to express the same idea in
Python for the time being. In other words: the individual cases here
express independent implementations and we leave it up to the Python
interpreter to choose the right one.
2. Visitor Pattern and Dispatch
-------------------------------
If you wanted to implement something like function overloading based
on the type of the argument, you might use the _visitor pattern_ for
that purpose. Thanks to Python's reflection capabilities, this is
quite easy and simple to do. Take, e.g., this example of a function
`draw()` that accepts a range of different geometric shapes.
```
def draw(geoShape):
class Matcher:
def case_Point(self, x, y):
canvas.moveTo(x, y)
canvas.dot(1)
def case_Line(self, x1, y1, x2, y2):
canvas.moveTo(x1, y1)
canvas.lineTo(x2, y2)
def case_Circle(self, cX, cY, radius):
canvas.ellipse(cX-radius, cY-radius, cX+radius, cY+radius)
def match(self, subject):
n = 'case_' + subject.__class__.__name__
method = getattr(self, n)
args = get_attribute_values(subject)
method(*args)
m = Matcher()
m.match(geoShape)
```
Now let us look at the same thing written as pattern matching
according to PEP 622:
```
def draw(geoShape):
match geoShape:
case Point(x, y):
canvas.moveTo(x, y)
canvas.dot(1)
case Line(x1, y1, x2, y2):
canvas.moveTo(x1, y1)
canvas.lineTo(x2, y2)
case Circle(cX, cY, radius):
canvas.ellipse(cX-radius, cY-radius, cX+radius, cY+radius)
```
You might see that the design of pattern matching is not something
entirely new and alien in Python, but well founded in ideas and
principles already present. Moreover, observe, once again, how the
variables in the patterns literally correspond to parameters in the
visitor pattern above. In this instance, the `case` statements are
quite close to function definitions in their entire syntactic
structure (is it really that alien that `Point(x, y)` here does _not_
represent a function call?).
This example also demonstrates something else I have brought up
before: pattern matching does not strictly have the same semantics as
an `if`-`elif`-`else` chain. It is thus not a simple control flow
construct, as the emphasis is less on the control flow itself and more
on the shape of the objects and data.
There are, however, some clear limitations of the visitor pattern as
implement in this way. If we go and define a new subclass of one of
these geometric shapes---perhaps something like `HorizontalLine` as a
specialisation of `Line`---, the visitor pattern will break down
because there is no method `case_HorizontalLine`. Pattern matching,
on the other hand, will still work as expected, because we are
actually working in the space of classes with their full inheritance
structure, and not only on the names of types.
Mind, pattern matching is not a replacement for the visitor pattern in
general, though. Even with pattern matching in place, there are valid
use cases where you would certainly prefer the visitor pattern over
pattern matching. This is only one example, after all, meant to
illustrate the idea behind pattern matching and why it might be a good
idea.
3. Shape and Structure
----------------------
Data is organised and comes in different shapes, structures, and
representations. It is not always as easy as in the cases above,
where the class of an object already contains all the information you
need. As a first example, the visitor pattern as illustrated above
could not easily differentiate between tuples of different lengths,
say: you would have to encode the length of the tuple somehow in the
method names, and then consider what happens if some objects does not
have a length, say, and thus requires a different encoding scheme.
In fact, the classic example of where pattern matching really starts
to shine is a tree structure that we want to simplify (or otherwise
process) according to specific rules. This might be an abstract
syntax tree (AST) or a mathematical expression, but the principle is
always the same: the decision which rule to apply depends on a number
of different attributes, which are often scattered among nodes of
different depths in your tree. The following illustrates this with a
tree that represents mathematical expressions:
```
def simplify(node):
match node:
case BinOp(Num(left), '+', Num(right)):
return Num(left + right)
case BinOp(left, ('+'|'-'), Num(0)):
return left
case UnaryOp('-', UnaryOp('-', x)):
return x
case UnaryOp('-', Num(x)):
return Num(-x)
case anythingElse:
return anythingElse
```
If you draw an actual diagram of the structure that each case imposes
on the node, you will quickly find that pattern matching allows you to
express a larger variety of structure than you could encode in method
names. Not to mention if you had to write everything with `if`s and
calls to `isinstance`, etc. It is possible, of course, but pattern
matching introduces a very succinct way of expressing the structure
declaratively and then let the compiler and interpreter figure out how
to best compare the subject `node` against the different possibilities
and extract the data you are actually interested in.
By the way: in this case, the compiler might end up creating a
structure of nested `if`s, perhaps something like:
```
def simplify(node):
TOS = node
try:
if isinstance(TOS, BinOp) and isinstance(TOS.right, Num):
if TOS.right.n == 0 and TOS.op in ('+','-'):
return TOS.left
elif TOS.op == '+' and isinstance(TOS.left, Num):
return Num(TOS.left.n + TOS.right.n)
elif isinstance(TOS, UnaryOp) and TOS.op == '-':
value = TOS.operand
if isinstance(value, UnaryOp) and value.op == '-':
return value.operand
elif isinstance(value, Num):
return value.n
return TOS:
finally:
del TOS
```
This illustrates, once again, that pattern matching is not so much
specifying the control flow itself, but rather offering a set of
alternatives. But there is neither the strict sequential structure of
`if`-`elif`-`else`, not the jump tables of `switch`. And in case you
wondered: the entire `match` block is actually a mini-scope, in which
the subject `node` is defined.
Kind regards,
Tobias
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/ZAMFDQKKEJONDEQRGC5XKOUAFUTL6OG7/
Code of Conduct: http://python.org/psf/codeofconduct/