[Python-Dev] Re: PEP 642: Constraint Pattern Syntax for Structural Pattern Matching

Steven D'Aprano Sat, 31 Oct 2020 04:27:38 -0700

Thank you for the well-written PEP, although I don't agree with it. My 
response below is quite long. Here is my opinionated TL;DR:

(1) Just get over the use of `_` for the wildcard pattern. Just use
another identifier. Now that the parser will support soft keywords, we
should expect more cases where something that is an identifier in one
context will be a keyword in another.

(2) The most common uses of patterns should not require sigils.

(3) None is special, and we should insist on `is` comparisons by
default. True and False are a little more problematic.

(4) Using sigils to over-ride the default is okay. That includes turning
what would otherwise be a capture pattern into a comparison.

Details below.

On Sat, Oct 31, 2020 at 05:16:59PM +1000, Nick Coghlan wrote:

> The rendered version of the PEP can be found here:
> https://www.python.org/dev/peps/pep-0642/

Quoting from the PEP:

"Wildcard patterns change their syntactic marker from _ to ?"

Yuck. Sorry, I find `?` in that role very aesthetically and
visually unappealing :-(

I really don't get why so many people are hung up over this minuscule
issue of giving `_` special meaning inside match statements. IMO,
consistency with other languages' pattern matching is more useful than
the ability to capture using `_` as a variable name.

Now that the PEG parser makes it easy to have soft keywords, there will
probably be more cases in the future where something that is
syntactically an identifier is a regular name in one context and special
syntax in another. This has happened before (e.g. "as") and it will
happen again.

We have a very strong convention that `_` is used as a write-only "don't
care" variable. (The two exceptions are the magic underscore in the
REPL, and `_()` in i18n.) In idiomatic Python code, if we bind a value
to `_` and then use it later, we are Doing It Wrong.

Is there such a shortage of local variable names that the inability to
misuse `_` is a problem in practice? Just use another identifier.

But if we really *must* break that convention and bind to `_`, we can
still do it inside a match statement:

    case a:
        _ = a
        print(_)

The fact that you have to use a temporary variable to break the rules
is, in my opinion, a good thing -- it reminds you that what you are
doing is weird.

Quoting code from the PEP:

```
# Literal patterns
match number:
    case ?0:
        print("Nothing")
    case ?1:
        print("Just one")
```

I think this is an example of what Larry Wall talked about when he
discussed the mistakes of Perl's original regex syntax:

"Poor Huffman coding"

https://www.perl.com/pub/2002/06/04/apo5.html/

Wall regrets that many common patterns are longer and harder to write
than rarer patterns.

Why do we need a `?` sigil to match a literal? `case 1` cannot possibly
be interpreted as a capture pattern. It would be wrong to compare it
with `is`. What else could it mean other than equality comparison? The
question mark is pure noise.

So here's a counter suggestion:

(1) Literals still match by equality, because that is what we want 99% of
the time. No sigil required.

You mention this in the "Rejected ideas" section, but I reject your
rejection :-)

The PEP rejects this because:

"they have the same syntax sensitivity problem as value patterns do,
where attempting to move the literal pattern out to a local variable for
naming clarity would turn the value checking literal pattern into a name
binding capture pattern"

but that's based on a really simple-minded refactoring. Sure, the naive
user who knows little about pattern matching might try to refactor like
this:

    # Before.
    match record:
        case (42, x): ...

    # After.
    ANSWER_TO_LIFE = 42
    match record:
        # It's a Trap!
        case (ANSWER_TO_LIFE, x): ...

and I am sympathetic to your desire to avoid that.

But this is the sort of error that:

- only applies in comparatively unusual circumstances
(naively refactoring a literal in a case statement);

- is easily avoided by automated refactoring tools;

- linters will warn about (assignment to a CONSTANT);

- is easily spotted if you have unit tests;

- is obvious to those with more experience in pattern matching.

So I don't see this as a large problem. I expect few people will
be bitten by this more than once, if that. I think that your
preventative solution, forcing all literal patterns to require a
sigil, is worse than the problem it is solving.

Bottom line: let's not hamstring pattern matching with poor Huffman
coding right from day one.

(2) While literals usually compare by equality, the exception is three
special keywords, and one symbol, that compare by identity:

    case None | True | False | ... :
        # Compares by identity.

I can't think of any other literal where identity tests would be useful
and guaranteed by the language (no relying on implementation-specific
details, such as small int caching or string interning).

So these keywords (plus the ... symbol) match by identity by default,
because that's what we want 99% of the time. (Although, see below for
discussion about the two bools.)

Other special values, like NotImplemented and Ellipsis, aren't keywords,
they are just names, and don't get special treatment.

(3) Overriding the default comparison with an explicit sigil is
allowed:

    case ==True:
        print("True, or 1, or 1.0, or 1+0j, etc")

    case ==None:
        print("None, or something weird that equals None")

    case is 1943.63:
        print("if you see this, the interpreter is caching floats")

I don't think that there will be any ambiguity between the unary "=="
pattern modifier and the real `==` operator. But if I am wrong, then we
can change the spelling:

    case ?None:
        print("None, or something weird that equals None")

    case ?is 1943.63:
        print("if you see this, the interpreter is caching floats")

(I don't love the question mark here, but I don't hate it either.)

The important thing here is that the cases with no sigil are the common
operations; the sigil is only needed for the uncommon case.
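
For a sense of what "something weird that equals None" looks like, here
is a small sketch in ordinary Python, with no match statement involved.
The class and helper names are invented for illustration.

```python
class EqualsEverything:
    """Hypothetical class whose instances claim equality with any object."""

    def __eq__(self, other):
        return True


def matches_by_identity(obj):
    # What a bare `None` pattern would do under this proposal.
    return obj is None


def matches_by_equality(obj):
    # What an explicit `==None` pattern would do.
    return obj == None  # noqa: E711 (deliberate equality test)


weird = EqualsEverything()
```

`matches_by_equality(weird)` is true while `matches_by_identity(weird)` is
false, which is why identity is the safer default for `None`.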

(4) Patterns which could conceivably be interpreted as assignment
targets default to capture patterns, because that's what is normally
wanted in pattern matching:

    case [1, spam, eggs]:
        # captures spam and eggs

If you don't want to capture a named value, but just match on it,
override it with an explicit `==` or `is`:

    case [1, ==spam, eggs]:
        # matches `spam` by equality, captures eggs

Quoting the PEP:

"nobody litters their if-elif chains with x is True or x is False
expressions, they write x and not x, both of which compare by value, not
identity."

That's incorrect. `if x` doesn't *compare* at all, not by value and not
with equality, it duck-types truthiness:

```
>>> class Demo:
...     def __bool__(self):
...         return True
...     def __eq__(self, other):
...         return False
...
>>> x = Demo()
>>> x == True
False
>>> if x: print("truthy")
...
truthy
```

There's a reasonable argument to make that (unless overridden by an
explicit sigil) the `True` and `False` patterns should match by
truthiness, not equality or identity, but I'm not going to make that
argument.

Quote:

"Indeed, PEP 8 explicitly disallows the use if x is True"

This is true, but I think you have to understand the intention there. I
believe the intent is that APIs should not insist on *exactly* the True
or False singletons for boolean flags, but instead accept any truthy or
falsey objects. (Duck typing for the win.)

But if you need to distinguish *exactly* True from an arbitrary truthy
value like "spam and eggs" or 93.78, then identity, not equality, is the
correct way to do it.
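
A quick sketch of the three possible tests side by side, in plain Python.
The helper names and the list of sample values are mine, chosen only to
illustrate the point.

```python
def is_exactly_true(flag):
    # Identity: only the True singleton passes.
    return flag is True


def equals_true(flag):
    # Equality: anything that compares equal to 1 passes.
    return flag == True  # noqa: E712 (deliberate equality test)


def is_truthy(flag):
    # Duck-typed truthiness, as in `if flag:`.
    return bool(flag)


candidates = [True, 1, 1.0, 1 + 0j, "spam and eggs", 93.78]
```

Every candidate is truthy, the first four equal `True`, and only the
first one *is* `True`.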

Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/
