Hello,

I have been working on an idea that would introduce pattern matching syntax
to Python. I now have this syntax implemented in CPython, and feel this is
the right time to gather further input. The repository and branch can be
found at https://github.com/natelust/cpython/tree/match_syntax. The new
syntax would improve readability, ease things like visitor-pattern-style
programming, localize matching behavior to a class, and support better
signaling, among other things. That is the tl;dr; I go into a longer
discussion below, but first I want to introduce the syntax and how it works
with the following simple example.

result = some_function_call()

try match result:  # try match some_function_call(): is also supported
    as Dog:
        print("is a dog")
    as Cat(lives):
        print(f"is a cat with {lives} lives")
    as tuple(result1, result2):
        print(f"got two results {result1} and {result2}")
    else:
        print("unknown result")

The statement begins with a new compound keyword, "try match". This is
treated as one logical unit; the word match is not being turned into a
keyword. There are no backwards compatibility issues, as previously no
symbols were allowed between try and the colon. The try match compound
keyword was chosen to make it clearer to users that this is distinct from a
try block, provide a hint on what the block is doing, and follow the Python
tradition of being sensible when spoken out loud to an English speaker. This
keyword is followed by an expression that is to be matched, called the match
target.

What follows is one or more match blocks. A match block is started with the
keyword ‘as’ followed by a type, and optionally parameters.

Matching begins by calling a __match__ (class)method on the type, with the
match target as a parameter. The match method must return an object that
can be evaluated as a bool. If the return value is True, the code block in
this match branch is executed, and execution is passed to whatever comes
after the match syntax. If __match__ returns False, execution is passed to
the next match branch for testing.

If a match branch contains a group of parameters, they are used in the
matching process as well. If __match__ returns True, then the match target
is tested for the presence of an __unpack__ method. If there is such a
method, it is called and is expected to return a sequence. If there is no
such method, the match target itself is tried as a sequence. If neither
yields a sequence, the branch fails and execution moves on to the next
match branch. Otherwise, the length of the sequence (either the result of
__unpack__, or the match target itself) is compared to the number of
supplied parameters. If they are equal, the sequence is unpacked into
variables named by the parameters, the match is considered a success, and
the body is executed. If the length of the sequence does not equal the
number of parameters, the match branch fails and execution continues with
the next branch. This is useful for differentiating tuples of different
lengths, or objects that unpack differently depending on state.

If all the match blocks are tested and fail, the try match statement will
check for the presence of an else clause. If this is present the body is
executed. This serves as a default execution block.
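The flow described above can be approximated in today's Python. The
following is only a sketch of the described protocol, not the code on the
branch: try_branch and describe are hypothetical helper names, the Dog and
Cat classes are stand-ins, and plain isinstance is assumed as the fallback
when a type defines no __match__.

```python
from collections.abc import Sequence


def try_branch(cls, target, n_params=None):
    """Emulate one 'as Type(params)' branch (hypothetical helper).

    Returns a tuple of values to bind on success, or None on failure.
    Note that a successful bare branch returns (), so callers must
    compare against None rather than test truthiness.
    """
    # Step 1: ask the type whether the target matches.
    custom = getattr(cls, "__match__", None)
    matched = custom(target) if custom is not None else isinstance(target, cls)
    if not matched:
        return None
    if n_params is None:
        return ()  # bare 'as Type:' branch, nothing to unpack
    # Step 2: prefer __unpack__, otherwise try the target as a sequence.
    unpack = getattr(target, "__unpack__", None)
    if unpack is not None:
        values = unpack()
    elif isinstance(target, Sequence):
        values = target
    else:
        return None
    # Step 3: the sequence length must equal the number of parameters.
    if len(values) != n_params:
        return None
    return tuple(values)


class Dog:
    pass


class Cat:
    def __init__(self, lives):
        self.lives = lives

    def __unpack__(self):
        return (self.lives,)


def describe(result):
    """Emulate the example statement, branch by branch, first match wins."""
    if try_branch(Dog, result) is not None:
        return "is a dog"
    bound = try_branch(Cat, result, 1)
    if bound is not None:
        (lives,) = bound
        return f"is a cat with {lives} lives"
    bound = try_branch(tuple, result, 2)
    if bound is not None:
        result1, result2 = bound
        return f"got two results {result1} and {result2}"
    return "unknown result"  # plays the role of the else clause
```

A three-element tuple passes the tuple type check but fails the length
comparison, so describe((1, 2, 3)) falls through to the default.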

What is the __match__ method and how does it determine if a match target is
a match? This change introduces __match__ as a new default method on
‘object’. The default implementation first checks the match target with the
‘is’ operator against the object containing the __match__ method. If that
is false, then it checks the match target using isinstance. Objects are
free to implement whatever __match__ method they want provided it matches
the interface.
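As an illustration, the default behavior described here could be written in
current Python roughly as follows. This is a sketch under assumptions:
default_match and matches are hypothetical free functions standing in for
the proposed object.__match__, and EvenInt is an invented example of a class
supplying its own rule.

```python
def default_match(cls, target):
    """Stand-in for the proposed default: identity first, then isinstance."""
    return target is cls or isinstance(target, cls)


class Dog:
    pass


class Puppy(Dog):
    pass


class EvenInt:
    """Example class that overrides the protocol with its own rule."""

    @classmethod
    def __match__(cls, target):
        return isinstance(target, int) and target % 2 == 0


def matches(cls, target):
    # Use the class's own __match__ if it defines one, else the default.
    custom = getattr(cls, "__match__", None)
    if custom is not None:
        return bool(custom(target))
    return default_match(cls, target)
```

With these definitions, matches(Dog, Dog) succeeds through the identity
check, matches(Dog, Puppy()) succeeds through isinstance, and
matches(EvenInt, 3) fails because of the custom rule.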

This proposal also introduces __unpack__ as a new interface, but does not
define any default implementation. This method should return a sequence.
There is no specific form outside this definition, a class author is free
to implement whatever representation they would like to use in a match
statement. One use case for this method could be a class that stores the
parameters passed to __init__ (or some other parameters) so someone could
construct a new object such as Animal(*animal_instance.__unpack__())
[possibly with a builtin for calling unpack]. Another use, in keeping with
the match syntax, is something like Stateful Enums.
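A minimal sketch of the reconstruction use case, assuming an Animal class
whose __unpack__ returns its constructor arguments in positional order (the
attribute names are invented for illustration):

```python
class Animal:
    def __init__(self, name, legs):
        self.name = name
        self.legs = legs

    def __unpack__(self):
        # Return the constructor arguments in positional order.
        return (self.name, self.legs)


original = Animal("rex", 4)
# Build a new, equivalent object from the unpacked representation.
copy = Animal(*original.__unpack__())
```

Nothing requires __unpack__ to mirror __init__; a class could equally return
some other representation that is more useful inside a match branch.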

The behavior covered by try match can be emulated with some combination of
compound and/or nested if statements alongside type checking and parameter
unpacking, so why introduce the new syntax? The benefits I see are:

* It is much easier to read and follow than complicated if branching logic.

* The matching logic is now defined alongside the class in the __match__
method. This is in contrast to if statements, where the logic is duplicated
in each place it is needed. If it is factored out into a function, the
function may be unknown or unused in a cross-package setting. Refactoring
distributed logic to live next to an object is similar to the introduction
of the format method on strings.

* It introduces a well-defined interface for programmers to depend on, in
contrast to new interfaces being built on a package-by-package basis. For
instance, __unpack__ and __match__ behavior may be implemented on objects
today under any variety of names, making discoverability difficult and
making if blocks more difficult for a user to parse.

* A pattern often found in Python is to use exceptions to signal
conditions during normal execution, which muddles the distinction between
handling exceptional behavior and normal code execution. The try match
syntax, along with object unpacking, provides a standardized way to signal
information back to callers and a standardized syntax for handling those
signals.
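To make the signaling point concrete, here is a sketch in current Python of
a function that returns signal objects instead of raising. Found, Missing,
lookup, and report are all hypothetical names invented for this example; the
isinstance chain stands in for what a try match statement would express.

```python
class Found:
    def __init__(self, value):
        self.value = value

    def __unpack__(self):
        return (self.value,)


class Missing:
    def __init__(self, key):
        self.key = key

    def __unpack__(self):
        return (self.key,)


def lookup(table, key):
    """Return a signal object instead of raising KeyError."""
    if key in table:
        return Found(table[key])
    return Missing(key)


def report(table, key):
    result = lookup(table, key)
    # Under the proposal this would read roughly:
    #   try match result:
    #       as Found(value): ...
    #       as Missing(missing_key): ...
    if isinstance(result, Found):
        (value,) = result.__unpack__()
        return f"found {value}"
    if isinstance(result, Missing):
        (missing_key,) = result.__unpack__()
        return f"no entry for {missing_key}"
    return "unknown result"
```

The caller handles an absent key as an ordinary outcome, leaving exceptions
for genuinely exceptional conditions.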

The try match syntax fits in well with the ethos of including more ways to
use typing information within Python. For instance, something like
typing.Union could have a corresponding __match__ method such that all of
those types would be covariant under a single match block. In the opposite
sense the try match syntax is also useful for taking a parameter defined as
a union and dispatching to individual functions that transform the variable
into a standardized type.
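One way to picture the union idea in current Python is an object whose
__match__ accepts any of several types. This is only a sketch: AnyOf is a
hypothetical helper, not part of typing or of the branch, and to_float is an
invented example of dispatching a union-typed parameter to a standardized
type.

```python
class AnyOf:
    """Hypothetical helper, not part of typing or of the proposal."""

    def __init__(self, *types):
        self.types = types

    def __match__(self, target):
        # Accept the target if it is an instance of any listed type.
        return isinstance(target, self.types)


Number = AnyOf(int, float)
Text = AnyOf(str)


def to_float(value):
    """Normalize a value declared as int, float, or str to a float."""
    if Number.__match__(value):
        return float(value)
    if Text.__match__(value):
        return float(value.strip())
    raise TypeError(f"cannot convert {type(value).__name__}")
```

Under the proposal, each if test would become an as branch, with the final
raise living in the else clause.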

Points I am unsure of:

I am not sure about using “(parameters)” as part of the match branch. In
some ways it is very familiar to look at, but it also makes it look like
these must be the parameters used to construct the object, not the values
returned from __unpack__ (they may be the same, but need not be). I have
toyed with the idea of something like “-> (a,b)” or using braces “{a,b}”.
These serve the purpose of being different, but that also makes them look
unfamiliar, so I don't really have a strong opinion formed on this part of
the syntax.

The implementation on my branch does work, but I am by no means an expert
on all of Python, and there may be much better ways to do what I did. In
particular, I implemented some of the logic in the compiler, but it may be
better served as an opcode. The exact boundary of where best to put some
logic is unclear.

This implementation currently stops at the first matching block it comes
to, not the best match out of all blocks. This is meant to make it easier
to understand the “flow” of the statement, but it might be preferable to
execute the block associated with the best match, though this would
complicate the implementation a good deal.
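The difference between the two strategies can be sketched in current Python.
This is an illustration only: first_match and best_match are hypothetical
helpers, and "best" is assumed here to mean the branch type closest to the
target's own type in its method resolution order, which is just one possible
definition.

```python
class Animal:
    pass


class Dog(Animal):
    pass


def first_match(target, branches):
    """Return the first branch type that accepts the target, in order."""
    for cls in branches:
        if isinstance(target, cls):
            return cls
    return None


def best_match(target, branches):
    """Return the branch type nearest the target's type in its MRO."""
    mro = type(target).__mro__
    candidates = [cls for cls in branches if cls in mro]
    if not candidates:
        return None
    # A smaller MRO index means a more specific type.
    return min(candidates, key=mro.index)


branches = [Animal, Dog]
```

For a Dog instance, first_match picks Animal because it is listed first,
while best_match picks Dog. The MRO test ignores custom __match__ methods and
virtual subclasses, which hints at how much a best-match rule would
complicate matters.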


I am sure there is more that I have not considered with this proposal and I
appreciate any feedback you choose to provide. Thank you for your time in
reading this.

-- 
Nate Lust, PhD.
Astrophysics Dept.
Princeton University
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/G7CXCVJGR54QG5DES54R3P5GQX7COSUI/
Code of Conduct: http://python.org/psf/codeofconduct/
