Hello,

I have been working on an idea that would introduce pattern matching syntax to Python. I now have this syntax implemented in CPython, and feel this is the right time to gather further input. The repository and branch can be found at https://github.com/natelust/cpython/tree/match_syntax. The new syntax would improve readability, ease things like visitor-pattern-style programming, localize matching behavior to a class, and support better signaling, among other things. This is the tl;dr; I will get into a longer discussion below, but first I want to introduce the syntax and how it works with the following simple example.

    result = some_function_call()
    try match result:  # try match some_function_call(): is also supported
        as Dog:
            print("is a dog")
        as Cat(lives):
            print(f"is a cat with {lives} lives")
        as tuple(result1, result2):
            print(f"got two results {result1} and {result2}")
        else:
            print("unknown result")

The statement begins with a new compound keyword "try match". This is treated as one logical block; the word match is not being turned into a keyword. There are no backwards compatibility issues, as previously no symbols were allowed between try and :. The try match compound keyword was chosen to make it clearer to users that this is distinct from a try block, to provide a hint on what the block is doing, and to follow the Python tradition of being sensible when spoken out loud to an English speaker. This keyword is followed by an expression that is to be matched, called the match target.

What follows is one or more match blocks. A match block is started with the keyword ‘as’ followed by a type, and optionally parameters.

Matching begins by calling a __match__ (class)method on the type, with the match target as a parameter. The match method must return an object that can be evaluated as a bool. If the return value is True, the code block in this match branch is executed, and execution is passed to whatever comes after the match syntax. If __match__ returns False, execution is passed to the next match branch for testing.

If a match branch contains a group of parameters, they are used in the matching process as well. If __match__ returns True, then the match target will be tested for the presence of a __unpack__ method. If there is no such method, the match target is tried as a sequence. If both of these fail, execution moves on to the next match branch. If there is a __unpack__ method, it is called and is expected to return a sequence. The length of the sequence (either the result of __unpack__, or the match target itself) is compared to the number of supplied arguments.
If they match, the sequence is unpacked into variables defined by the arguments, the match is considered a success, and the body is executed. If the length of the sequence does not match the number of arguments, the match branch fails and execution continues with the next branch. This is useful for differentiating tuples of different lengths, or objects that unpack differently depending on state.

If all the match blocks are tested and fail, the try match statement will check for the presence of an else clause. If one is present, its body is executed. This serves as a default execution block.

What is the __match__ method and how does it determine if a match target is a match? This change introduces __match__ as a new default method on ‘object’. The default implementation first checks the match target with the ‘is’ operator against the object containing the __match__ method. If that is false, it then checks the match target using isinstance. Objects are free to implement whatever __match__ method they want, provided it matches the interface.

This proposal also introduces __unpack__ as a new interface, but does not define any default implementation. This method should return a sequence. There is no specific form outside this definition; a class author is free to implement whatever representation they would like to use in a match statement. One use case for this method could be a class that stores the parameters passed to __init__ (or some other parameters) so someone could construct a new object such as Animal(*animal_instance.__unpack__()) [possibly with a builtin for calling unpack]. Another use, in keeping with the match syntax, is something like Stateful Enums.

The behavior covered by try match can be emulated with some combination of compound and/or nested if statements alongside type checking and parameter unpacking, so why introduce the new syntax?
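
As a rough sketch of that emulation, the matching procedure described above might be written in today's Python along the following lines. The helper names default_match and try_branch, the describe function, and the Dog and Cat classes are hypothetical and only for illustration; they are not taken from the branch, and this is an approximation of the described behavior rather than the actual implementation.

```python
from collections.abc import Sequence


def default_match(cls, target):
    """Assumed default __match__: an identity check, then isinstance."""
    return target is cls or isinstance(target, cls)


def try_branch(cls, target, n_params=None):
    """Emulate a single `as cls(...)` branch (hypothetical helper).

    Returns a tuple of unpacked values on success (empty when the branch
    takes no parameters), or None when the branch fails.
    """
    matcher = getattr(cls, "__match__", None)
    matched = matcher(target) if matcher is not None else default_match(cls, target)
    if not matched:
        return None
    if n_params is None:
        return ()  # branch without parameters: the type match is enough
    unpack = getattr(target, "__unpack__", None)
    if unpack is not None:
        values = unpack()
    elif isinstance(target, Sequence):
        values = target  # no __unpack__: try the target itself as a sequence
    else:
        return None
    if len(values) != n_params:
        return None  # length mismatch: this branch fails
    return tuple(values)


class Dog:
    pass


class Cat:
    def __init__(self, lives):
        self.lives = lives

    def __unpack__(self):
        return (self.lives,)


def describe(result):
    """Hand-written equivalent of the introductory try match example."""
    if try_branch(Dog, result) is not None:
        return "is a dog"
    values = try_branch(Cat, result, 1)
    if values is not None:
        (lives,) = values
        return f"is a cat with {lives} lives"
    values = try_branch(tuple, result, 2)
    if values is not None:
        result1, result2 = values
        return f"got two results {result1} and {result2}"
    return "unknown result"  # plays the role of the else clause
```

Calling describe(Cat(9)) takes the Cat branch, while describe((1, 2, 3)) falls through to the default because a three-element tuple does not match the two supplied parameters.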
The benefits I see are:

* It is much easier to read and follow compared to complicated if branching logic.
* The matching logic is now defined alongside the class in the __match__ method. This is in contrast to if statements, where the logic is duplicated in each place. If it is factored out into a function, the function may be unknown or unused in a cross-package setting. Refactoring distributed logic to live next to an object is similar to the introduction of the format method on strings.
* It introduces a well-defined interface for programmers to depend on, in contrast to new interfaces being built on a package-by-package basis. For instance, __unpack__ and __match__ behavior may be implemented on objects today with any variety of names, making discoverability difficult, and making if blocks more difficult for a user to parse.
* A pattern often found in Python is to use Exceptions for signaling conditions inside of execution, which muddles the distinction between handling exceptional behavior and normal code execution. The try match syntax, along with object unpacking, provides a standardized way to signal information back to callers and a standardized syntax for handling those signals.

The try match syntax fits in well with the ethos of including more ways to use typing information within Python. For instance, something like typing.Union could have a corresponding __match__ method such that all of those types would be covariant under a single match block. In the opposite sense, the try match syntax is also useful for taking a parameter defined as a union and dispatching to individual functions that transform the variable into a standardized type.

Points I am unsure of:

I am not sure about using “(parameters)” as part of the match branch. In some ways it is very familiar to look at, but it also makes it look like these must be parameters used to construct the object, not parameters returned from __unpack__ (though they may be the same, they may not be).
I have toyed with the idea of something like “-> (a,b)” or using braces “{a,b}”, and they serve the purpose of being different, but then that also makes them look different, so I don't really have a strong opinion formed on this part of the syntax.

The implementation on my branch does work, but I am by no means an expert on all of Python, and there may be much better ways to do what I did. In particular, I implemented some of the logic in the compiler, but it may be better served as an opcode. The exact boundary of where best to put some of the logic is unclear.

This implementation currently stops at the first matching block it comes to, not the best match out of all blocks. This is meant to make it easier to understand the “flow” of the statement, but it might be preferable to execute the block associated with the best match, though this would complicate the implementation a good deal.

I am sure there is more that I have not considered with this proposal, and I appreciate any feedback you choose to provide. Thank you for your time in reading this.

--
Nate Lust, PhD.
Astrophysics Dept.
Princeton University